To obtain the bit mask needed for these operations, we can use a bit shift operator to shift the number 1 to the left by the appropriate number of places.
We can view a bit array as a subset of {1,2,...,n}, where a 1 bit indicates a number in the set and a 0 bit a number not in the set. This set data structure uses about n/w words of space, where w is the number of bits in each machine word. Whether the least significant bit or the most significant bit indicates the smallest-index number is largely irrelevant, but the former tends to be preferred.
Given two bit arrays of the same size representing sets, we can compute their union, intersection, and set-theoretic difference using n/w simple bit operations each (2n/w for difference), as well as the complement of either:
for i from 0 to n/w-1
complement_a[i] := not a[i]
union[i] := a[i] or b[i]
intersection[i] := a[i] and b[i]
difference[i] := a[i] and (not b[i])
If we wish to iterate through the bits of a bit array, we can do this efficiently using a doubly-nested loop which loops through each word, one at a time. Only n/w memory accesses are required:
for i from 0 to n/w-1
index := 0 // if needed
word := a[i]
for b from 0 to w-1
value := word and 1 ≠ 0
word := word shift right 1
// do something with value
index := index + 1 // if neededBoth of these code samples exhibit ideal locality of reference, and so get a large performance boost from a data cache. If a cache line is k words, only about n/wk cache misses will occur.
On machines that use two's complement arithmetic, such as all PCs, the find first one function can be performed quickly by anding a word with its two's complement, that is performing (w and -w) results in a word with only the righmost bit set of the bits that were set before the operation. For instance, if the original value were 6 (...110), after this operation the result would be 2 (...010).
However, bit arrays aren't the solution to everything. In particular:
We mentioned above that bit arrays are used for priority queues, where the bit at index k is set if and only if k is in the queue; this data structure is used, for example, by the Linux kernel, and benefits strongly from a find-first-zero operation in hardware.
Bit arrays can be used for the allocation of memory pages, inodes, disk sectors, etc. In such cases, the term bitmap may be used. However, this term is frequently used to refer to raster images, which may use multiple bits per pixel.
Another application of bit arrays is the Bloom filter, a probabilistic set data structure that can store large sets in a small space in exchange for a small probability of error. It is also possible to build probabilistic hash tables based on bit arrays that accept either false positives or false negatives.
Bit arrays and the operations on them are also important for constructing succinct data structures, which use close to the minimum possible space. In this context, operations like finding the nth 1 bit or counting the number of 1 bits up to a certain position become important.
Bit arrays are also a useful abstraction for examining streams of compressed data, which often contain elements that occupy portions of bytes or are not byte-aligned. For example, the compressed Huffman coding representation of a single 8-bit character can be anywhere from 1 to 255 bits long.
In information retrieval, bit arrays are a good representation for the posting lists of very frequent terms. If we compute the gaps between adjacent values in a list of strictly increasing integers and encode them using unary coding, the result is a bit array with a 1 bit in the nth position if and only if n is in the list. The implied probability of a gap of n is 1/2n. This is also the special case of Golomb coding where the parameter M is 1; this parameter is only normally selected when -log(2-p)/log(1-p) ≤ 1, or roughly the term occurs in at least 38% of documents.
In C++, although individual bools typically occupy the same space as a byte or an integer, the STL type vector is a partial specialization in which bits are packed as a space efficiency optimization. Since bytes (and not bits) are the smallest addressable unit in C++, the [] operator does not return a reference to an element, but instead returns a proxy reference. This might seem a minor point, but it means that vector is not a standard STL container, which is why the use of vector is generally discouraged. Another unique STL class, bitset, creates a vector of bits fixed at a particular size at compile-time, and in its interface and syntax more resembles the idiomatic use of words as bit sets by C programmers. It also has some additional power, such as the ability to efficiently count the number of bits that are set. The Boost C++ Libraries provides a dynamic_bitset class whose size is specified at run-time.
The D programming language provides bit arrays in both of its competing standard libraries. In phobos, they are provided in std.bitmanip, and in Tango, they are provided in tango.core.BitArray. As in C++, the [] operator does not return a reference, since individual bits are not directly addressable on most hardware, but instead returns a bool.
In Java, the class creates a bit array which is then manipulated with functions named after bitwise operators familiar to C programmers. Unlike the bitset in C++, the Java BitSet expands dynamically if a bit is set at an index beyond the current size of the bit vector. In addition, there is a class , which represents a Set of values of an enumerated type internally as a bit vector, as a safer alternative to bitfields.
The .NET Framework supplies a BitArray collection class. It stores boolean values, supports random access and bitwise operators, can be iterated over, and its Length property can be changed to grow or truncate it.
Although Standard ML has no support for bit arrays, Standard ML of New Jersey has an extension, the BitArray structure, in its SML/NJ Library. It is not fixed in size and supports set operations and bit operations, including, unusually, shift operations.