
# Directed acyclic graph

In computer science and mathematics, a directed acyclic graph, also called a DAG, is a directed graph with no directed cycles; that is, for any vertex v, there is no nonempty directed path that starts and ends on v. DAGs appear in models where it doesn't make sense for a vertex to have a path to itself; for example, if an edge u→v indicates that v is a part of u, such a path would indicate that u is a part of itself, which is impossible. Informally speaking, a DAG "flows" in a single direction.

Each directed acyclic graph gives rise to a partial order ≤ on its vertices, where u ≤ v exactly when there exists a directed path from u to v in the DAG. However, many different DAGs may give rise to this same reachability relation. Among all such DAGs, the one with the fewest edges is the transitive reduction of each of them and the one with the most edges is their transitive closure. In particular, the transitive closure is the reachability order ≤.
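The relationship between a DAG, its transitive closure, and its transitive reduction can be illustrated with a minimal Python sketch. The function names and the representation (a list of vertices plus a set of `(u, v)` edge pairs) are assumptions made for this example, and the closure computed here contains only pairs joined by a nonempty path, so the reflexive pairs u ≤ u of the partial order are left implicit.

```python
def transitive_closure(vertices, edges):
    """Return every pair (u, v) such that v is reachable from u by a nonempty path."""
    reach = set(edges)
    # Warshall-style update: allow k as an intermediate vertex, one k at a time.
    for k in vertices:
        for u in vertices:
            for v in vertices:
                if (u, k) in reach and (k, v) in reach:
                    reach.add((u, v))
    return reach


def transitive_reduction(vertices, edges):
    """Drop every edge (u, v) that is implied by a longer path from u to v.

    This simple rule is only valid for acyclic graphs.
    """
    reach = transitive_closure(vertices, edges)
    return {
        (u, v)
        for (u, v) in edges
        if not any((u, w) in reach and (w, v) in reach for w in vertices)
    }


vertices = ["a", "b", "c"]
with_shortcut = {("a", "b"), ("b", "c"), ("a", "c")}
without_shortcut = {("a", "b"), ("b", "c")}

# Two different DAGs with the same reachability relation.
same = transitive_closure(vertices, with_shortcut) == transitive_closure(vertices, without_shortcut)
```

Both example graphs reduce to the two-edge path a→b→c, which is the DAG with the fewest edges for this reachability relation.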

## Terminology

A source is a vertex with no incoming edges, while a sink is a vertex with no outgoing edges. A finite DAG has at least one source and at least one sink.

The depth of a vertex in a finite DAG is the length of the longest path from a source to that vertex, while its height is the length of the longest path from that vertex to a sink.

The length of a finite DAG is the length (number of edges) of a longest directed path. It is equal to the maximum height of all sources and equal to the maximum depth of all sinks.
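As an illustration of these definitions, the following sketch computes the depth and height of every vertex by memoized recursion. The function name `depths_and_heights` and the edge-list representation are hypothetical choices for this example, and the recursion is only suitable for small graphs.

```python
from functools import lru_cache


def depths_and_heights(vertices, edges):
    """Return (depth, height) dictionaries for a finite DAG given as (u, v) pairs."""
    succ = {v: [] for v in vertices}
    pred = {v: [] for v in vertices}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    @lru_cache(maxsize=None)
    def depth(v):
        # Longest path from any source to v; a source has depth 0.
        return max((depth(u) + 1 for u in pred[v]), default=0)

    @lru_cache(maxsize=None)
    def height(v):
        # Longest path from v to any sink; a sink has height 0.
        return max((height(w) + 1 for w in succ[v]), default=0)

    return {v: depth(v) for v in vertices}, {v: height(v) for v in vertices}


vertices = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("e", "d")]
depth, height = depths_and_heights(vertices, edges)

sources = [v for v in vertices if depth[v] == 0]
sinks = [v for v in vertices if height[v] == 0]
# The length of the DAG, computed in both ways described above.
length_from_sources = max(height[v] for v in sources)
length_from_sinks = max(depth[v] for v in sinks)
```

In this example the longest path is a→b→c→d, so both computations give a length of 3.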

## Properties

Every directed acyclic graph has a topological sort, an ordering of the vertices such that each vertex comes before all vertices it has edges to. In general, this ordering is not unique. Any two graphs representing the same partial order have the same set of topological sort orders.
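One common way to compute such an ordering is Kahn's algorithm, which repeatedly outputs a vertex that has no remaining incoming edges. The sketch below is one possible implementation; the function name and the input format are assumptions for this example.

```python
from collections import deque


def topological_sort(vertices, edges):
    """Return the vertices in an order where each vertex precedes its successors.

    Raises ValueError if the graph contains a directed cycle.
    """
    succ = {v: [] for v in vertices}
    indegree = {v: 0 for v in vertices}
    for u, v in edges:
        succ[u].append(v)
        indegree[v] += 1

    # Vertices with no incoming edges can be output immediately.
    ready = deque(v for v in vertices if indegree[v] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in succ[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)

    if len(order) != len(succ):
        raise ValueError("graph contains a directed cycle")
    return order


vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
order = topological_sort(vertices, edges)
```

For this diamond-shaped graph both a, b, c, d and a, c, b, d are valid results, which shows why the ordering is not unique in general.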

DAGs can be considered to be a generalization of trees in which certain subtrees can be shared by different parts of the tree. In a tree with many identical subtrees, this can lead to a drastic decrease in space requirements to store the structure. Conversely, a DAG can be expanded to a forest of rooted trees using this simple algorithm:

• While there is a vertex v with in-degree n > 1:
  • Make n copies of v, each with the same outgoing edges but no incoming edges.
  • Attach one of the incoming edges of v to each copy.
  • Delete v.

If we explore the graph without modifying it or comparing nodes for equality, this forest will appear identical to the original DAG.
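The expansion procedure described above can be sketched in Python as follows. The function name `expand_to_forest` and the convention of naming each node `(label, copy_id)` are assumptions made for this illustration. Isolated vertices are not represented, since the input is an edge list, and the output can be exponentially larger than the input.

```python
from itertools import count


def expand_to_forest(edges):
    """Expand a DAG, given as (parent, child) pairs, into a forest.

    Nodes in the result are (label, copy_id) pairs so that copies of the
    same original vertex remain distinguishable.
    """
    edges = [((u, 0), (v, 0)) for u, v in edges]
    next_copy = count(1)
    while True:
        incoming = {}
        for u, v in edges:
            incoming.setdefault(v, []).append(u)
        # Find a vertex v with in-degree n > 1, if any is left.
        shared = next((v for v, parents in incoming.items() if len(parents) > 1), None)
        if shared is None:
            return edges
        outgoing = [w for u, w in edges if u == shared]
        # Delete v together with all of its edges.
        edges = [(u, w) for u, w in edges if u != shared and w != shared]
        # Make one copy per incoming edge, each with the same outgoing edges.
        for parent in incoming[shared]:
            copy = (shared[0], next(next_copy))
            edges.append((parent, copy))
            edges.extend((copy, w) for w in outgoing)


dag = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
forest = expand_to_forest(dag)
```

Here the shared vertex d is duplicated first, which in turn gives e two parents, so e is duplicated as well; the result is a single tree rooted at a.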

Some algorithms become simpler when used on DAGs instead of general graphs. For example, search algorithms like depth-first search without iterative deepening normally must mark vertices they have already visited and not visit them again. If they fail to do this, they may never terminate because they follow a cycle of edges forever. Such cycles do not exist in DAGs, so the search terminates even without marking. Marking is still a good idea, however, as it reduces the worst-case running time from exponential (due to multiple paths reaching the same vertex) to linear.
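The effect of marking can be seen on a small family of DAGs. The sketch below counts how many vertex visits a depth-first search performs with and without marking on a chain of k "diamonds"; the helper names `diamond_chain` and `dfs_visits` are hypothetical and exist only for this illustration.

```python
def diamond_chain(k):
    """Build a DAG of k chained diamonds: j_i -> l_i, r_i -> j_(i+1)."""
    succ = {}
    for i in range(k):
        succ[("j", i)] = [("l", i), ("r", i)]
        succ[("l", i)] = [("j", i + 1)]
        succ[("r", i)] = [("j", i + 1)]
    succ[("j", k)] = []
    return succ


def dfs_visits(succ, start, mark_visited):
    """Run an iterative depth-first search and return the number of vertex visits."""
    visited = set()
    visits = 0
    stack = [start]
    while stack:
        v = stack.pop()
        if mark_visited:
            if v in visited:
                continue
            visited.add(v)
        visits += 1
        stack.extend(succ[v])
    return visits


graph = diamond_chain(10)  # 31 vertices
with_marking = dfs_visits(graph, ("j", 0), mark_visited=True)      # 31 visits
without_marking = dfs_visits(graph, ("j", 0), mark_visited=False)  # 4093 visits
```

Both searches terminate because the graph is acyclic, but without marking the number of visits doubles with every diamond, since each join vertex is reached once per path leading to it.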

A polytree is a DAG whose underlying undirected graph is a tree, so it retains many tree-like properties. These properties are exploited, for example, in the belief propagation algorithm for Bayesian networks, which runs efficiently on polytrees.

The number of labeled DAGs is characterized by Weisstein's conjecture, proved by McKay et al.: the number of labeled DAGs on n vertices is equal to the number of n×n matrices with entries from {0,1} whose eigenvalues are all positive real numbers.
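For very small n, the number of labeled DAGs can be found by brute force: enumerate every directed graph on n labeled vertices and keep the acyclic ones. The sketch below does only this graph-side count and does not examine matrix eigenvalues; the function names are assumptions for this example, and the approach is impractical beyond n = 4 or 5 because there are 2^(n(n−1)) graphs to check.

```python
from itertools import product


def is_acyclic(n, edges):
    """Check acyclicity by repeatedly removing vertices with no incoming edges."""
    remaining = set(range(n))
    live = set(edges)
    while remaining:
        targets = {v for _, v in live}
        sources = remaining - targets
        if not sources:
            return False  # every remaining vertex has an incoming edge, so a cycle exists
        remaining -= sources
        live = {(u, v) for u, v in live if u in remaining}
    return True


def count_labeled_dags(n):
    """Count the acyclic directed graphs on the labeled vertices 0..n-1."""
    pairs = [(u, v) for u in range(n) for v in range(n) if u != v]
    total = 0
    for mask in product([False, True], repeat=len(pairs)):
        edges = [pair for pair, keep in zip(pairs, mask) if keep]
        if is_acyclic(n, edges):
            total += 1
    return total


counts = [count_labeled_dags(n) for n in range(5)]  # [1, 1, 3, 25, 543]
```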

## Applications

Directed acyclic graphs have many important applications in computer science, including:

### Directed acyclic word graph

A directed acyclic word graph (DAWG) is a data structure in computer science similar to a trie but much more space efficient. It is used to represent a set of strings and supports a search operation whose running time is proportional to the length of the search string and independent of the number of strings stored, the same as for an equivalent trie.

A DAWG is defined as a trie where isomorphic subtrees are identified, thus producing an acyclic directed graph instead of a tree structure. Each node in the graph represents a unique substring. Each outgoing edge from one node to the next is labeled with a letter and represents appending that letter to the substring represented by the first node to get the substring represented by the second node. There is one node having zero incoming edges; it represents the empty string. Similarly, the nodes representing entire strings that are not a substring of any other string (this can always be guaranteed by appending an otherwise unused character such as \$ to the end of every string) have zero outgoing edges.

The primary difference between a DAWG and a trie is the elimination of suffix redundancy in storing strings. The trie eliminates prefix redundancy, since all common prefixes are shared between strings; for example, doctors and doctorate share the prefix doctor. In a DAWG common suffixes are also shared; for example, desertion and desertification share both the prefix deserti- and the suffix -tion. For dictionary sets of common English words, this translates into a major reduction in memory usage.
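The difference in size can be illustrated with a short sketch that builds a trie and then merges isomorphic subtrees, following the definition of a DAWG given above. All function names here are hypothetical, the words are assumed not to contain the end marker `$`, and the recursive construction is intended only for small word lists.

```python
def build_trie(words):
    """Build a trie as nested dictionaries; "$" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = {}
    return root


def count_trie_nodes(node):
    return 1 + sum(count_trie_nodes(child) for child in node.values())


def minimize(node, registry):
    """Return a canonical id for this subtree; isomorphic subtrees share one id."""
    signature = tuple(sorted((ch, minimize(child, registry)) for ch, child in node.items()))
    return registry.setdefault(signature, len(registry))


def build_dawg(words):
    """Return (nodes, root_id), where nodes maps each id to {letter: child id}."""
    registry = {}
    root_id = minimize(build_trie(words), registry)
    nodes = {node_id: dict(signature) for signature, node_id in registry.items()}
    return nodes, root_id


def contains(nodes, root_id, word):
    """Follow one edge per letter, so the lookup time is proportional to len(word)."""
    node_id = root_id
    for ch in word + "$":
        node_id = nodes[node_id].get(ch)
        if node_id is None:
            return False
    return True


words = ["tap", "taps", "top", "tops"]
trie_size = count_trie_nodes(build_trie(words))  # 12 nodes
nodes, root_id = build_dawg(words)
dawg_size = len(nodes)                           # 6 nodes
```

In this example the trie stores the shared endings p, ps separately under ta and to, while the merged graph stores them once.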

Acyclic deterministic finite automata (ADFA) are deterministic finite automata without cycles. As a consequence, they can only represent finite sets of strings. They can be used as a data structure for word storage with extremely fast search performance. Minimized ADFA can be very compact as well. The size of a minimized ADFA does not directly depend on the number of keys stored. In fact, after a certain point, as more words are stored in a minimized ADFA, its size can begin to decrease. Its size appears to be related to how complex the set of strings is rather than to how many strings it contains. A trie is a type of ADFA.

## References

• M. Crochemore and R. Verin, Direct Construction of Compact Directed Acyclic Word Graphs, 8th Annual Symposium, CPM 97, Aarhus, Denmark, 116-129, 1997.