An associative array (also associative container, finite map, and in query processing an index or index file) is an abstract data type composed of a collection of unique keys and a collection of values, where each key is associated with one value. The operation of finding the value associated with a key is called a lookup or indexing, and this is the most important operation supported by an associative array. The relationship between a key and its value is sometimes called a mapping or binding. For example, if the value associated with a key k is v, we say that the array maps k to v. Associative arrays are very closely related to the mathematical concept of a function with a finite domain. As a consequence, a common and important use of associative arrays is in memoization.
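As an illustration of memoization, the following minimal sketch caches the results of a recursive function in an associative array. The function `fib` and the name `cache` are hypothetical and chosen only for this example.

```python
# Hypothetical example: memoizing a recursive Fibonacci function.
cache = {}  # associative array mapping an argument n to the result fib(n)


def fib(n):
    """Return the n-th Fibonacci number, reusing results stored in cache."""
    if n < 2:
        return n
    if n not in cache:                       # lookup: was this argument seen before?
        cache[n] = fib(n - 1) + fib(n - 2)   # add: bind the argument to its result
    return cache[n]


print(fib(30))  # 832040
```

Because each argument is computed once and then looked up, the number of recursive calls grows linearly with n rather than exponentially.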
From the perspective of a computer programmer, an associative array can be viewed as a generalization of an array. While a regular array maps integer indices to values of an arbitrary data type (integers, other primitive types, or even objects), an associative array allows its keys to be arbitrarily typed as well. Whether the values of an associative array must all have the same type depends on the programming language.
The operations that are usually defined for an associative array are:
- Add: Bind a new key to a new value
- Reassign: Bind an old key to a new value
- Remove: Unbind a key from a value and remove the key from the key set
- Lookup: Find the value (if any) that is bound to a key
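The four operations can be sketched with Python's built-in `dict`, used here as one possible associative array. The variable names and data are hypothetical.

```python
ages = {}                    # empty associative array (hypothetical example data)

ages["alice"] = 30           # Add: bind a new key to a new value
ages["bob"] = 25             # Add: a second binding
ages["alice"] = 31           # Reassign: bind an existing key to a new value

found = ages.get("alice")    # Lookup: the value bound to the key, here 31
missing = ages.get("carol")  # Lookup of an unbound key yields None

del ages["bob"]              # Remove: unbind the key and drop it from the key set

print(found, missing, list(ages))
```

Note that Python uses the same assignment syntax for Add and Reassign; other languages may distinguish the two.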
One can think of a telephone book as an example of an associative array, where names are the keys and phone numbers are the values. Using the usual array-like notation, we might write
telephone['ada']='01-1234-56'
telephone['charles']='02-4321-56'
and so on. These entries can be thought of as two records in a database table:
| name | telephone |
| ada | 01-1234-56 |
| charles | 02-4321-56 |
To retrieve an element from the associative array, we use a similar notation, i.e.
x = telephone['ada']
y = telephone['charles']
Another example would be a dictionary where words are the keys and definitions are the values.
dictionary['toad']='four legged amphibian'
dictionary['cow']='female four-legged domesticated mammalian ruminant'
Since the database equivalent is a table containing precisely two fields, key and value, an associative array can store any information that can be held in this form.
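The correspondence with a two-field table can be sketched as follows, assuming the rows are available as (key, value) pairs. The variable names are hypothetical.

```python
# Each row of the two-field table is a (key, value) pair.
rows = [("ada", "01-1234-56"), ("charles", "02-4321-56")]

telephone = dict(rows)             # build the associative array from the rows
x = telephone["ada"]               # lookup by key

back = sorted(telephone.items())   # recover the rows, sorted by key
print(x, back == rows)
```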
Data structures for associative arrays
Associative arrays are usually used when lookup is the most frequent operation. For this reason, implementations are usually designed to allow speedy lookup, at the expense of slower insertion and a larger storage footprint than other data structures (such as association lists).
There are two main efficient data structures used to represent associative arrays: the hash table and the self-balancing binary search tree (such as a red-black tree or an AVL tree). Skip lists are also an alternative, though relatively new and not as widely used. B-trees (and variants) can also be used, and are commonly used when the associative array is too large to reside entirely in memory, for instance in a simple database. Relative advantages and disadvantages include:
- Hash tables have faster average lookup and insertion time (O(1)) compared to a balanced binary search tree's O(log n).
- Hash tables have seen extensive use in real-time systems, but trees can be useful in high-security real-time systems where untrusted users may deliberately supply information that triggers worst-case performance in a hash table, although careful design can remove that issue. Hash tables shine in very large arrays, where O(1) performance is important. Skip lists have a worst-case operation time of O(n), but an average case of O(log n), with much less insertion and deletion overhead than balanced binary trees.
- Hash tables can have more compact storage for small value types, especially when the values are bits.
- There are simple persistent versions of balanced binary trees, which are especially prominent in functional languages.
- Building a hash table requires a reasonable hash function for the key type, which can be difficult to write well, while balanced binary trees and skip lists only require a total ordering on the keys. On the other hand, with hash tables the data may be cyclically or partially ordered without any problems.
- Balanced binary trees and skip lists preserve ordering, which allows one to efficiently iterate over the keys in order or to efficiently locate an association whose key is nearest to a given value. Hash tables do not preserve ordering and therefore cannot perform these operations as efficiently (they require the data to be sorted in a separate step).
- Balanced binary trees can be easily adapted to efficiently assign a single value to a large ordered range of keys, or to count the number of keys in an ordered range. (With n elements in the array and performing the operation on a contiguous range of m keys, a balanced binary tree will take O(log n + m) time while a hash table would need O(n) time as it needs to search the entire table.)
- In cases where the number of elements in the array fluctuates a lot, trees have other benefits over a hash table. A hash table of a given size can only hold some number of keys before it becomes inefficient, and if the size grows too much it must allocate a new backing store, rehash all the keys, and copy the data to the larger table; this takes O(n) time but can be amortised over multiple accesses, which retains the O(1) property. A balanced search tree will always take O(log n) time for insertion. Once a hash table has grown, if the number of elements decreases, the keys can again be rehashed and copied to a smaller space. In contrast, the memory demands of a tree grow and shrink in smaller steps with the number of elements currently in the tree, potentially increasing the chance of fragmentation of the program's heap.
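The ordering-related points in the list above can be illustrated with a small sketch. Python's standard library has no balanced search tree, so a sorted list searched with the `bisect` module stands in for an ordered structure here; this is an assumption made for brevity, and unlike a balanced tree, inserting into a sorted list takes O(n) time. The data and the helper names `count_in_range` and `nearest_key` are hypothetical.

```python
import bisect

# Hypothetical data: grades keyed by integer student ID, held in a hash table.
scores = {17: "B", 3: "A", 42: "C", 8: "A", 25: "B"}

# A hash table keeps no key order, so an ordered view needs a separate sort.
sorted_keys = sorted(scores)  # [3, 8, 17, 25, 42]


def count_in_range(keys, lo, hi):
    """Count keys k with lo <= k <= hi in a sorted list using binary search."""
    return bisect.bisect_right(keys, hi) - bisect.bisect_left(keys, lo)


def nearest_key(keys, target):
    """Return the key closest to target in a non-empty sorted list.

    Ties are resolved in favour of the smaller key.
    """
    i = bisect.bisect_left(keys, target)
    candidates = keys[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda k: abs(k - target))


print(count_in_range(sorted_keys, 5, 30))  # 3 (the keys 8, 17 and 25)
print(nearest_key(sorted_keys, 20))        # 17
```

With the keys in order, both queries need only O(log n) comparisons, whereas answering them directly from the unordered hash table would require examining every key.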
A simple but generally inefficient type of associative array is an association list, often called an alist for short, which simply stores a linked list of key-value pairs. Each lookup does a linear search through the list looking for a key match.
Advantages of association lists include:
- Only an equality test on keys is required, which is the minimum for maps supporting the four basic operations, whereas the alternatives above require a linear order comparison or a hash function.
- For small associative arrays, common in some applications, association lists can take less time and space than other data structures.
- Insertions are done in constant time by adding the new association to the head of the list.
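A minimal sketch of an association list follows, assuming cells are represented as nested tuples of the form (key, value, next). The class name `AList` and its method names are hypothetical. Insertion pushes onto the head without checking for an existing binding, so an older binding for the same key is shadowed rather than replaced.

```python
class AList:
    """Association list: a singly linked list of (key, value, next) cells."""

    def __init__(self):
        self.head = None  # the empty list

    def add(self, key, value):
        # Constant-time insertion at the head of the list.
        # An older binding for the same key stays in the list but is shadowed.
        self.head = (key, value, self.head)

    def lookup(self, key, default=None):
        # Linear search that uses only an equality test on keys.
        cell = self.head
        while cell is not None:
            k, v, rest = cell
            if k == key:
                return v
            cell = rest
        return default

    def remove(self, key):
        # Rebuild the list without any cell bound to key; this takes O(n) time.
        kept = []
        cell = self.head
        while cell is not None:
            k, v, cell = cell
            if k != key:
                kept.append((k, v))
        self.head = None
        for k, v in reversed(kept):
            self.head = (k, v, self.head)


alist = AList()
alist.add("x", 1)
alist.add("y", 2)
alist.add("x", 3)          # shadows the earlier binding of "x"
print(alist.lookup("x"))   # 3
alist.remove("x")          # removes both bindings of "x"
print(alist.lookup("x"))   # None
```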
If the keys have a specific type, one can often use specialized data structures to gain performance. For example, integer-keyed maps can be implemented using Patricia trees or Judy arrays, and are useful space-saving replacements for sparse arrays. Because these data structures can perform longest-prefix matching, they are particularly useful in applications where a single value is assigned to most of a large range of keys with a common prefix except for a few exceptions, such as in routing tables.
String-keyed maps can avoid extra comparisons during lookups by using tries.
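The idea can be sketched with a plain, uncompressed trie keyed by strings. This is not a Patricia tree or a Judy array, only a simplified illustration of lookup by successive characters and of longest-prefix matching. The class name `Trie`, its methods, and the bit-string keys are hypothetical.

```python
class Trie:
    """String-keyed map stored as a tree of characters (unoptimized sketch)."""

    _MISSING = object()  # sentinel, so that None can be stored as a value

    def __init__(self):
        self.children = {}          # next character -> child Trie node
        self.value = Trie._MISSING  # value bound to the key that ends here

    def insert(self, key, value):
        node = self
        for ch in key:
            node = node.children.setdefault(ch, Trie())
        node.value = value

    def lookup(self, key, default=None):
        # Each character is examined once; no whole-key comparisons are needed.
        node = self
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return default
        return default if node.value is Trie._MISSING else node.value

    def longest_prefix(self, key):
        """Return (prefix, value) for the longest stored prefix of key, or None."""
        node, best = self, None
        if node.value is not Trie._MISSING:
            best = ("", node.value)
        for i, ch in enumerate(key):
            node = node.children.get(ch)
            if node is None:
                break
            if node.value is not Trie._MISSING:
                best = (key[:i + 1], node.value)
        return best


routes = Trie()
routes.insert("10", "A")      # hypothetical bit-string prefixes
routes.insert("1011", "B")
print(routes.longest_prefix("101100"))  # ('1011', 'B')
print(routes.longest_prefix("100"))     # ('10', 'A')
```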
A variation of the map (associative array) is the multimap, which is like a map but allows a key to be mapped to more than one value. Formally, a multimap can be thought of as a regular associative array that maps unique keys to nonempty multisets of values, although actual implementations may vary. C++'s Standard Template Library provides a container for the sorted multimap, SGI's STL provides a container that implements a multimap using a hash table, and some varieties of LPC have built-in multimap support.
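Following the formal description above, a multimap can be sketched as an ordinary associative array whose values are nonempty collections. The sketch below uses `collections.defaultdict` with lists standing in for multisets; the helper names `add` and `remove` and the data are hypothetical.

```python
from collections import defaultdict

# Hypothetical data: one category key mapped to several item values.
items = defaultdict(list)


def add(multimap, key, value):
    """Bind one more value to key (duplicate values are allowed)."""
    multimap[key].append(value)


def remove(multimap, key, value):
    """Remove one occurrence of value under key; return True if it was found.

    The key is dropped once its collection becomes empty, which keeps
    every stored collection nonempty.
    """
    values = multimap.get(key)
    if not values or value not in values:
        return False
    values.remove(value)
    if not values:
        del multimap[key]
    return True


add(items, "fruit", "apple")
add(items, "fruit", "pear")
add(items, "tool", "saw")
print(items["fruit"])                 # ['apple', 'pear']
print(remove(items, "tool", "saw"))   # True
print("tool" in items)                # False
```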
Associative arrays can be implemented in any programming language as a package, and many language systems provide them as part of their standard library. In some languages, they are not only built into the standard system but also have special syntax, often using array-like subscripting.
In many more languages, they are available as library functions without special syntax.
Associative arrays have a variety of names. In Smalltalk, Objective-C, .NET, Python and REALbasic they are called dictionaries; in Perl and Ruby they are called hashes; in C++ and Java they are called maps (see std::map); and in Common Lisp and Windows PowerShell they are called hashtables (since both typically use this implementation). In PHP all arrays can be associative, except that the keys are limited to integers and strings and only a single level of subscripts is allowed.