Definitions

Substring

A subsequence, substring, prefix or suffix of a string is a subset of the symbols in a string, where the order of the elements is preserved. In this context, the terms string and sequence have the same meaning.

Subsequence

Main article subsequence

A subsequence of a string $T = t_1 t_2 dots t_n$ is a string $hat T = t_\left\{i_1\right\} dots t_\left\{i_m\right\}$ such that $i_1 < dots < i_m$, where $m leq n$. Subsequence is a generalisation of substring, suffix and prefix. Finding the longest string which is equal to a subsequence of two or more strings is known as the longest common subsequence problem.

Example: The string `anna` is equal to a subsequence of the string `banana`:

`banana`
` || ||`
` an na`

Substring

A substring (or factor) of a string $T = t_1 dots t_n$ is a string $hat T = t_\left\{1+i\right\} dots t_\left\{m+i\right\}$, where $0 leq i$ and $m + i leq n$. A substring of a string is a prefix of a suffix of the string, and equivalently a suffix of a prefix. If $hat T$ is a substring of $T$, it is also a subsequence, which is a more general concept. Given a pattern $P$, you can find its occurrences in a string $T$ with a string searching algorithm. Finding the longest string which is equal to a substring of two or more strings is known as the longest common substring problem.

Example: The string `ana` is equal to substrings (and subsequences) of `banana` at two different offsets:

`banana`
` |||||`
` ana||`
`   |||`
`   ana`

In the mathematical literature, substrings are also called subwords (in America) or factors (in Europe).

Prefix

A prefix of a string $T = t_1 dots t_n$ is a string $widehat T = t_1 dots t_\left\{m\right\}$, where $m leq n$. A proper prefix of a string is not equal to the string itself and not empty ($0 < m < n$). A prefix can be seen as a special case of a substring.

Example: The string `ban` is equal to a prefix (and substring and subsequence) of the string `banana`:

`banana`
|||
`ban`

The square subset symbol is sometimes used to indicate a prefix, so that $widehat T sqsubseteq T$ denotes that $widehat T$ is a prefix of $T$. This defines a binary relation on strings, called the prefix relation.

In formal language theory, the term prefix of a string is also commonly understood to be the set of all prefixes of a string, with respect to that language. See the article on string functions (mathematics) for more details.

Suffix

A suffix of a string $T = t_1 dots t_n$ is a string $hat T = t_\left\{n-m+1\right\} dots t_\left\{n\right\}$, where $m leq n$. A proper suffix of a string is not equal to the string itself and not empty ($0 < m < n$). A suffix can be seen as a special case of a substring.

Example: The string `nana` is equal to a suffix (and substring and subsequence) of the string `banana`:

`banana`
`  ||||`
`  nana`

Border

A border is suffix and prefix of the same string, e.g. "bab" is a border of "babab".

Superstring

Given a set of k strings $P = \left\{s_1,s_2,s_3,dots s_k\right\}$, a superstring of the set P is single string that contains every string in P as a substring. For example, a concatenation of the strings of P in any order gives a trivial superstring of P. For a more interesting example, let P = {abcc, efab, bccla}. Then bcclabccefab is a superstring of P, and efabccla is another, shorter superstring of P. Generally, we are interested in finding superstrings whose length is small.

References

• Gusfield, Dan (1999). Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. USA: Cambridge University Press. ISBN 0-521-58519-8.

Search another word or see Substringon Dictionary | Thesaurus |Spanish