4c1j5b2p0cv4w1 8rx2y39umgw5q85s7ur qbjfdppa0q7nieieqe9noc4cvafzf
The first string admits a short English language description, namely "ab 32 times", which consists of 11 characters. The second one has no obvious simple description (using the same character set) other than writing down the string itself, which has 64 characters.
More formally, the complexity of a string is the length of the string's shortest description in some fixed universal description language. The sensitivity of complexity relative to the choice of description language is discussed below. It can be shown that the Kolmogorov complexity of any string cannot be too much larger than the length of the string itself. Strings whose Kolmogorov complexity is small relative to the string's size are not considered to be complex. The notion of Kolmogorov complexity is surprisingly deep and can be used to state and prove impossibility results akin to Gödel's incompleteness theorem and Turing's halting problem.
We could alternatively choose an encoding for Turing machines, where an encoding is a function which associates to each Turing Machine M a bitstring <M>. If M is a Turing Machine which on input w outputs string x, then the concatenated string <M> w is a description of x. For theoretical analysis, this approach is more suited for constructing detailed formal proofs and is generally preferred in the research literature. In this article we will use an informal approach.
Any string s has at least one description, namely the program
If a description of s, d(s), is of minimal length—i.e. it uses the fewest number of characters—it is called a minimal description of s. Then the length of d(s)—i.e. the number of characters in the description—is the Kolmogorov complexity of s, written K(s). Symbolically,
We now consider how the choice of description language affects the value of K and show that the effect of changing the description language is bounded.
Theorem. If K1 and K2 are the complexity functions relative to description languages L1 and L2, then there is a constant c (which depends only on the languages L1 and L2) such that
Proof. By symmetry, it suffices to prove that there is some constant c such that for all bitstrings s,
To see why this is so, there is a program in the language L1 which acts as an interpreter for L2:
function InterpretLanguage(string p)
where p is a program in L2. The interpreter is characterized by the following property:
Thus if P is a program in L2 which is a minimal description of s, then InterpretLanguage(P) returns the string s. The length of this description of s is the sum of
This proves the desired upper bound.
See also invariance theorem.
Kolmogorov Complexity was first invented by Ray Solomonoff in 1960, who described it in "A Preliminary Report on a General Theory of Inductive Inference" as a side product to his invention of Algorithmic Probability. He gave a more complete description in his 1964 publications, "A Formal Theory of Inductive Inference," Part I and Part II" in Information and Control.
Andrey Kolmogorov independently invented complexity as a measure of information content, first describing it in 1965, Problems Inform. Transmissions, 1, (1965), 1-7. Gregory Chaitin also invented it independently, submitting 2 reports on it in 1965, a preliminary investigation published in 1966 (J. ACM, 13(1966) and a more complete discussion in 1969 (J. ACM, 16(1969).
When Kolmogorov became aware of Solomonoff's work, he acknowledged Solomonoff's priority (IEEE Trans. Inform Theory, 14:5(1968), 662-664). For several years, Solomonoff's work was better known in the Soviet Union than in the Western World. The general consensus in the scientific community, however, was to associate this type of complexity with Kolmogorov, who was concerned with randomness of a sequence while Algorithmic Probability became associated with Solomonoff, who focussed on prediction using his invention of the universal a priori probability distribution.
There are several other variants of Kolmogorov complexity or algorithmic information. The most widely used one is based on self-delimiting programs and is mainly due to Leonid Levin (1974).
Axiomatic approach to Kolmogorov complexity based on Blum axioms (Blum 1967) was introduced by Mark Burgin in the paper presented for publication by Andrey Kolmogorov (Burgin 1982). Axiomatic approach to Kolmogorov complexity was further developed in the book (Burgin 2005) and applied to software metrics (Burgin and Debnath, 2003; Debnath and Burgin, 2003).
Naming this concept "Kolmogorov complexity" is an example of the Matthew effect.
It is not hard to see that the minimal description of a string cannot be too much larger than the string itself: the program GenerateFixedString above that outputs s is a fixed amount larger than s.
Theorem. There is a constant c such that
The first result is that there is no way to effectively compute K.
Theorem. K is not a computable function.
In other words, there is no program which takes a string s as input and produces the integer K(s) as output. We show this by contradiction by making a program that creates a string that should only be able to be created by a longer program. Suppose there is a program
function KolmogorovComplexity(string s)
that takes as input a string s and returns K(s). Now consider the program
function GenerateComplexString(int n)
for i = 1 to infinity:
for each string s of length exactly i
if KolmogorovComplexity(s) >= n
This program calls KolmogorovComplexity as a subroutine. This program tries every string, starting with the shortest, until it finds a string with complexity at least n, then returns that string. Therefore, given any positive integer n, it produces a string with Kolmogorov complexity at least as great as n. The program itself has a fixed length U. The input to the program GenerateComplexString is an integer n; here, the size of n is measured by the number of bits required to represent n which is log2(n). Now consider the following program:
This program calls GenerateComplexString as a subroutine and also has a free parameter n0. This program outputs a string s whose complexity is at least n0. By an auspicious choice of the parameter n0 we will arrive at a contradiction. To choose this value, note s is described by the program GenerateParadoxicalString whose length is at most
where C is the "overhead" added by the program GenerateParadoxicalString. Since n grows faster than log2(n), there exists a value n0 such that
This is proof by contradiction where the contradiction is similar to the Berry paradox: "Let n be the smallest positive integer that cannot be defined in fewer than twenty English words." It is also possible to show the uncomputability of K by reduction from the uncomputability of the halting problem H, since K and H are Turing-equivalent.
The chain rule for Kolmogorov complexity states that
It states that the shortest program that reproduces X and Y is no more than a logarithmic term larger than a program to reproduce X and a program to reproduce Y given X. Using this statement one can define an analogue of mutual information for Kolmogorov complexity.
A string s is compressible by a number c if it has a description whose length does not exceed |s| − c. This is equivalent to saying K(s) ≤ |s| − c. Otherwise s is incompressible by c. A string incompressible by 1 is said to be simply incompressible; by the pigeonhole principle, incompressible strings must exist, since there are 2n bit strings of length n but only 2n−2 shorter strings, that is strings of length n − 1 or less.
For the same reason, "most" strings are complex in the sense that they cannot be significantly compressed: K(s) is not much smaller than |s|, the length of s in bits. To make this precise, fix a value of n. There are 2n bitstrings of length n. The uniform probability distribution on the space of these bitstrings assigns to each string of length exactly n equal weight 2−n.
Theorem. With the uniform probability distribution on the space of bitstrings of length n, the probability that a string is incompressible by c is at least 1 − 2−c+1 + 2−n.
To prove the theorem, note that the number of descriptions of length not exceeding n − c is given by the geometric series:
There remain at least
many bitstrings of length n that are incompressible by c. To determine the probability divide by 2n.
This theorem is the justification for various challenges in comp.compression FAQ Despite this result, it is sometimes claimed by certain individuals (considered cranks) that they have produced algorithms which uniformly compress data without loss. See lossless data compression.
Theorem. There exists a constant L (which only depends on the particular axiomatic system and the choice of description language) such that there does not exist a string s for which the statement
Note that by the abundance of nearly incompressible strings, the vast majority of those statements must be true.
The proof of this result is modeled on a self-referential construction used in Berry's paradox. The proof is by contradiction. If the theorem were false, then
We can find an effective enumeration of all the formal proofs in S by some procedure
function NthProof(int n)which takes as input n and outputs some proof. This function enumerates all proofs. Some of these are proofs for formulas we do not care about here (examples of proofs which will be listed by the procedure NthProof are the various known proofs of the law of quadratic reciprocity, those of Fermat's little theorem or the proof of Fermat's last theorem all translated into the formal language of S). Some of these are complexity formulas of the form K(s) ≥ n where s and n are constants in the language of S. There is a program
function NthProofProvesComplexityFormula(int n)
which determines whether the nth proof actually proves
a complexity formula K(s) ≥ L. The strings s and
the integer L in turn are computable by programs:
function StringNthProof(int n)
function ComplexityLowerBoundNthProof(int n)
Consider the following program
function GenerateProvablyComplexString(int n)
for i = 1 to infinity:
if NthProofProvesComplexityFormula(i) and ComplexityLowerBoundNthProof(i) >= n
Given an n, this program tries every proof until it finds a string and a proof in the formal system S of the formula K(s) ≥ L for some L >= n. The program terminates by our Assumption (X). Now this program has a length U. There is an integer n0 such that U + log2(n0) + C < n0, where C is the overhead cost of
The program GenerateProvablyParadoxicalString outputs a string s for which there exists an L such that K(s) ≥ L can be formally proved in S with L >= n0. In particular K(s) ≥ n0 is true. However, s is also described by a program of length U+log2(n0)+C so its complexity is less than n0. This contradiction proves Assumption (X) cannot hold.
Similar ideas are used to prove the properties of Chaitin's constant.
The minimum message length principle of statistical and inductive inference and machine learning was developed by C.S. Wallace and D.M. Boulton in 1968. MML is Bayesian (it incorporates prior beliefs) and information-theoretic. It has the desirable properties of statistical invariance (the inference transforms with a re-parametrisation, such as from polar coordinates to Cartesian coordinates), statistical consistency (even for very hard problems, MML will converge to any underlying model) and efficiency (the MML model will converge to any true underlying model about as quickly as is possible). C.S. Wallace and D.L. Dowe showed a formal connection between MML and algorithmic information theory (or Kolmogorov complexity) in 1999.