Contact analysis is based on the fact that, in any sample of any written language, certain symbols appear adjacent to other symbols with varying frequencies. Moreover, these frequencies are roughly the same for almost all samples of that language, even when the distribution of the symbols themselves differs significantly from normal. This is true regardless of whether the symbols being used are words or letters.
In some ciphers, these properties of the natural language plaintext are preserved in the ciphertext, and have the potential to be exploited in a ciphertext-only attack.
Although in a sense contact analysis can be considered a type of frequency analysis, most discussions of frequency analysis concern themselves with the simple probabilities of the symbols in the text: or
Contact analysis is based on the conditional probability that certain letters will precede or succeed other letters: , or , or even , where and are subsets of the alphabet being used.
Where frequency analysis is based on first-order statistics, contact analysis is based on second or third-order statistics.