In computer science, a trie, or prefix tree, is an ordered tree data structure that is used to store an associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree shows what key it is associated with. All the descendants of any one node have a common prefix of the string associated with that node, and the root is associated with the empty string. Values are normally not associated with every node, only with leaves and some inner nodes that happen to correspond to keys of interest.
The term trie comes from "retrieval." Following the etymology, the inventor, Edward Fredkin, pronounces it [tɹi] ("tree"). However, it is usually pronounced [tɹaɪ] ("treye").
In the example shown, keys are listed in the nodes and values below them. Each complete English word has an (arbitrary) integer value associated with it. A trie can be seen as a deterministic finite automaton, although the symbol on each edge is often implicit in the order of the branches.
It is not necessary for keys to be explicitly stored in nodes. (In the figure, words are shown only to illustrate how the trie works.)
Though it is most common, tries need not be keyed by character strings. The same algorithms can easily be adapted to serve similar functions of ordered lists of any construct, e.g., permutations on a list of digits, permutations on a list of shapes, etc.
Tries do have some drawbacks as well:
Tries are also well suited for implementing approximate matching algorithms, including those used in spell checking software.
We can describe trie lookup (and membership) easily. Given a recursive trie type, storing an optional value at each node, and a list of children tries, indexed by the next character, (here, represented as a Haskell data type):
data Trie a =
Trie { value :: Maybe a
, children :: [(Char,Trie a)] }
We can lookup a value in the trie as follows:
find :: String -> Trie a -> Maybe a
find [] t = value t
find (k:ks) t = case lookup k (children t) of
Nothing -> Nothing
Just t' -> find ks t'
In an imperative style, and assuming an appropriate data type in place, we can describe a the same algorithm in Python (here, specifically for testing membership). Note that children is map of a node's children; and we say that a "terminal" node is one which contains a valid word.
def isKeyInTrie(node, key):
if len(key) == 0: # base case: key is an empty string
return node.isTerminal() # we have a match if the node is terminal
c = key[0] # first character in key
if not node.children.hasKey(c): # no child node?
return False # key is not empty, so there is no match
child = node.children[c] # get child node
tail = key[1:] # the rest of the key after first character
return isKeyInTrie(child, tail) # recurse
Lexicographic sorting of a set of keys can be accomplished with a simple trie-based algorithm as follows:
This algorithm is a form of radix sort.
A trie forms the fundamental data structure of Burstsort; currently (2007) the fastest known, memory/cache-based, string sorting algorithm.
A parallel algorithm for sorting N keys based on tries is O(1) if there are N processors and the length of the keys have a constant upper bound. There is the potential that the keys might collide by having common prefixes or by being identical to one another, reducing or eliminating the speed advantage of having multiple processors operating in parallel.
A special kind of trie, called a suffix tree, can be used to index all suffixes in a text in order to carry out fast full text searches.
|
|
|