The relationship between the complexity classes P and NP is an unsolved question in theoretical computer science. It is widely considered to be the most important open problem in the field; the Clay Mathematics Institute has offered a US$1 million prize for the first correct proof.
In essence, the question P = NP? asks: if 'yes'-answers to a 'yes'-or-'no'-question can be verified "quickly" (in polynomial time), can the answers themselves also be computed quickly?
Consider, for instance, the subset-sum problem, an example of a problem which is "easy" to verify, but whose answer is believed (but not proven) to be "difficult" to compute. Given a set of integers, does some nonempty subset of them sum to 0? For instance, does a subset of the set {−2, −3, 15, 14, 7, −10} add up to 0? The answer "yes, because {−2, −3, −10, 15} add up to zero" can be quickly verified with a few additions. However, finding such a subset in the first place could take much longer. The information needed to verify a positive answer is also called a certificate. Given the right certificates, "yes" answers to our problem can be verified in polynomial time, so this problem is in NP.
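The gap between verifying and finding can be made concrete with a minimal Python sketch. The function names `verify_certificate` and `find_zero_subset` are illustrative assumptions, not part of any standard library; the search shown is plain exhaustive enumeration, not a claim about the best known algorithm.

```python
from itertools import combinations


def verify_certificate(numbers, certificate):
    """Check a claimed 'yes' certificate: a nonempty list of distinct
    elements of `numbers` whose sum is 0. Needs only a few additions."""
    return (
        len(certificate) > 0
        and len(set(certificate)) == len(certificate)
        and set(certificate) <= set(numbers)
        and sum(certificate) == 0
    )


def find_zero_subset(numbers):
    """Exhaustive search over all 2^n - 1 nonempty subsets.
    Returns one subset summing to 0, or None if there is none."""
    items = sorted(set(numbers))
    for size in range(1, len(items) + 1):
        for subset in combinations(items, size):
            if sum(subset) == 0:
                return list(subset)
    return None


numbers = {-2, -3, 15, 14, 7, -10}
print(verify_certificate(numbers, [-2, -3, -10, 15]))  # cheap check
print(find_zero_subset(numbers))                       # expensive search
```

Checking a certificate takes time roughly linear in its length, whereas the search may inspect exponentially many subsets before it succeeds or gives up.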
An answer to the P = NP question would determine whether problems like the subset-sum problem are as "easy" to compute as to verify. If it turned out P does not equal NP, it would mean that some NP problems are substantially "harder" to compute than to verify.
The restriction to yes/no problems is unimportant; the resulting problem when more complicated answers are allowed (whether FP = FNP) is equivalent.
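One way to see why yes/no answers are enough is a search-to-decision reduction. The sketch below is an illustration for a single problem, not a proof of the general equivalence. It assumes the common target-sum variant of subset-sum (is there a sub-list, possibly empty, that sums to a given target?), and the names `decide_exhaustively` and `find_subset` are hypothetical. The decision procedure is treated as a black box and called only n + 1 times.

```python
def decide_exhaustively(items, target):
    """Stand-in decision oracle (exponential in the worst case):
    does some sub-list of `items`, possibly empty, sum to `target`?"""
    reachable = {0}
    for x in items:
        reachable |= {r + x for r in reachable}
    return target in reachable


def find_subset(items, target, decide=decide_exhaustively):
    """Recover an actual sub-list summing to `target` using only
    yes/no answers from `decide`."""
    if not decide(items, target):
        return None
    remaining = list(items)
    chosen = []
    while remaining:
        x = remaining.pop()
        if decide(remaining, target):
            continue            # a solution exists without x, so drop x
        chosen.append(x)        # every remaining solution uses x
        target -= x
    return chosen


print(find_subset([3, 34, 4, 12, 5, 2], 9))
```

If `decide` ran in polynomial time, `find_subset` would too, because it adds only linearly many oracle calls and some bookkeeping.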
The relation between the complexity classes P and NP is studied in computational complexity theory, the part of the theory of computation dealing with the resources required during computation to solve a given problem. The most common resources are time (how many steps it takes to solve a problem) and space (how much memory it takes to solve a problem).
Such analysis requires a model of the computer whose running time is to be analyzed. Typically, such models assume that the computer is deterministic (given the computer's present state and any inputs, there is only one possible action that the computer might take) and sequential (it performs actions one after the other). As of 2008, these assumptions are satisfied by all practical computers yet devised, even those featuring parallel computing.
In this theory, the class P consists of all those decision problems (defined below) that can be solved on a deterministic sequential machine in an amount of time that is polynomial in the size of the input; the class NP consists of all those decision problems whose positive solutions can be verified in polynomial time given the right information, or equivalently, whose solution can be found in polynomial time on a non-deterministic machine. Arguably, the biggest open question in theoretical computer science concerns the relationship between those two classes: is P equal to NP?
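Conceptually, a decision problem is a problem that takes as input some string, and outputs "yes" or "no". If there is an algorithm (say a Turing machine, or a computer program with unbounded memory) which is able to produce the correct answer for any input string of length $n$ in at most $c \cdot n^k$ steps, where $k$ and $c$ are constants independent of the input string, then we say that the problem can be solved in polynomial time and we place it in the class P. Formally, P is defined as the set of all languages which can be decided by a deterministic polynomial-time Turing machine. That is,
$\mathrm{P} = \{ L : L = L(M) \text{ for some deterministic polynomial-time Turing machine } M \}$
where
$L(M) = \{ w \in \Sigma^{*} : M \text{ accepts } w \}$
and a deterministic polynomial-time Turing machine is a deterministic Turing machine $M$ which satisfies the following two conditions:
1. $M$ halts on all inputs $w$; and
2. there exists $k \in \mathbb{N}$ such that $T_M(n) \in O(n^k)$, where $T_M(n) = \max\{ t_M(w) : w \in \Sigma^{*}, |w| = n \}$ and $t_M(w)$ is the number of steps $M$ takes to halt on input $w$.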
NP can be defined similarly using nondeterministic Turing machines (the traditional way). However, a modern approach to defining NP uses the concepts of certificate and verifier. Formally, NP is defined as the set of languages over a finite alphabet that have a verifier that runs in polynomial time, where the notion of "verifier" is defined as follows.
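Let $L$ be a language over a finite alphabet, $\Sigma$.
$L \in \mathrm{NP}$ if, and only if, there exists a binary relation $R \subseteq \Sigma^{*} \times \Sigma^{*}$ and a positive integer $k$ such that the following two conditions are satisfied:
1. for all $x \in \Sigma^{*}$, $x \in L$ if and only if there exists $y \in \Sigma^{*}$ such that $(x, y) \in R$ and $|y| \in O(|x|^k)$; and
2. the language $L_R = \{ x\#y : (x, y) \in R \}$ over $\Sigma \cup \{\#\}$ is decidable by a Turing machine in polynomial time.
A Turing machine that decides $L_R$ is called a verifier for $L$, and a $y$ such that $(x, y) \in R$ is called a certificate of membership of $x$ in $L$.
In general, a verifier does not have to be polynomial-time. However, for $L$ to be in NP, there must be a verifier that runs in polynomial time.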
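As a hedged illustration of the verifier idea, the following Python sketch takes the language of composite numbers and assumes the relation R = {(x, y) : 1 < y < x and y divides x}, so that a certificate for x is any nontrivial divisor y. The function name `verify_composite` is hypothetical. The certificate is no longer than the input, and the check is a single division, which is polynomial in the number of digits of x.

```python
def verify_composite(x: int, y: int) -> bool:
    """Return True when y certifies that x is composite,
    i.e. when (x, y) belongs to the assumed relation R."""
    return 1 < y < x and x % y == 0


# 7 is a certificate of membership for 91, since 91 = 7 * 13.
print(verify_composite(91, 7))

# A prime such as 97 has no certificate at all.
print(any(verify_composite(97, y) for y in range(2, 97)))
```

The verifier never has to find a divisor; it only checks one that is handed to it, which is exactly the asymmetry the definition captures.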
Let and ; and
Although it is unknown whether P = NP, problems outside of P are known. A number of succinct problems, that is, problems which operate not on normal input but on a computational description of the input, are known to be EXPTIME-complete. Because it can be shown that P ≠ EXPTIME, these problems are outside P, and so require more than polynomial time.
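The problem of deciding the truth of a statement in Presburger arithmetic requires even more time. Fischer and Rabin proved in 1974 that every algorithm which decides the truth of Presburger statements has a runtime of at least $2^{2^{cn}}$ for some constant $c$, where $n$ is the length of the Presburger statement. Hence, the problem is known to need more than exponential run time.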
All of the above discussion has assumed that P means "easy" and "not in P" means "hard". While this is a common and reasonably accurate assumption in complexity theory, it is not always true in practice. See Cobham's thesis for an in-depth discussion of this point, but the main arguments are:
Most computer scientists believe that P ≠ NP. A key reason for this belief is that after decades of studying these problems, no one has been able to find a polynomial-time algorithm for any of the more than 3000 NP-complete problems (see List of NP-complete problems). These algorithms were sought long before the concept of NP-completeness was even known (Karp's 21 NP-complete problems, among the first found, were all well-known existing problems at the time they were shown to be NP-complete). Furthermore, the result P = NP would imply many other startling results that are currently believed to be false, such as NP = co-NP and P = PH.
It is also intuitively argued that the existence of problems that are hard to solve but for which the solutions are easy to verify matches real-world experience.
On the other hand, some researchers believe that we are overconfident in P ≠ NP and should explore proofs of P = NP as well. For example, in 2002 these statements were made:
One of the reasons the problem attracts so much attention is the consequences of the answer.
A proof of P = NP could have stunning practical consequences, if the proof leads to efficient methods for solving some of the important problems in NP. Various NP-complete problems are fundamental in many fields. Enormous positive consequences would follow from rendering tractable many problems that are currently mathematically intractable. For instance, many problems in operations research are NP-complete, such as some types of integer programming and the travelling salesman problem, to name two of the most famous examples. Efficient solutions to these problems would have enormous implications for logistics. Many other important problems, such as some problems in protein structure prediction, are also NP-complete; if these problems were efficiently solvable, it could spur considerable advances in biology.
But such changes may pale in significance compared to the revolution an efficient method for solving NP-complete problems would cause in mathematics itself. According to Stephen Cook,
Research mathematicians spend their careers trying to prove theorems, and some proofs have taken decades or even centuries to find after problems have been stated; for instance, Fermat's Last Theorem took over three centuries to prove. A method that is guaranteed to find a proof of a theorem, should one of a "reasonable" size exist, would essentially end this struggle.
A proof that showed that P ≠ NP, while lacking the practical computational benefits of a proof that P = NP, would also represent a massive advance in computational complexity theory and provide guidance for future research. It would allow one to show in a formal way that many common problems cannot be solved efficiently, so that the attention of researchers can be focused on partial solutions or solutions to other problems. Due to widespread belief in P ≠ NP, much of this focusing of research has already taken place.
The Clay Mathematics Institute's million-dollar prize and a huge amount of dedicated research with no substantial results suggest that the problem is difficult. In fact, some of the most fruitful research related to the P = NP problem has been in showing that existing proof techniques are not powerful enough to answer the question, thus suggesting that novel technical approaches are probably required.
Essentially all known proof techniques in computational complexity theory fall into one of the following classifications, each of which is known to be insufficient to prove that P ≠ NP:
These barriers are another reason why NP-complete problems are useful: if a polynomial-time algorithm can be demonstrated for an NP-complete problem, this would solve the P = NP problem in a way which is not excluded by the above results.
No one knows whether polynomial-time algorithms exist for NP-complete languages. But if such algorithms do exist, some of them are already known. For example, the following algorithm (due to Levin) correctly accepts an NP-complete language, but as of 2008, it is unknown how long it takes in general.
// Algorithm that accepts the NP-complete language SUBSET-SUM.
//
// This is a polynomial-time algorithm if and only if P=NP.
//
// "Polynomial-time" means it returns "yes" in polynomial time when
// the answer should be "yes", and runs forever when it is "no".
//
// Input: S = a finite set of integers
// Output: "yes" if any nonempty subset of S adds up to 0.
//         Runs forever with no output otherwise.
// Note: "Program number P" is the program obtained by
//       writing the integer P in binary, then
//       considering that string of bits to be a
//       program. Every possible program can be
//       generated this way, though most do nothing
//       because of syntax errors.
FOR N = 1...infinity
    FOR P = 1...N
        Run program number P for N steps with input S
        IF the program outputs a nonempty list of distinct integers
          AND the integers are all in S
          AND the integers sum to 0
        THEN
            OUTPUT "yes" and HALT
If, and only if, P = NP, then this is a polynomial-time algorithm accepting an NP-complete language. "Accepting" means it gives "yes" answers in polynomial time, but is allowed to run forever when the answer is "no".
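The scheduling idea, running program 1 through N for N steps each and then increasing N, can be sketched in Python under a strong simplifying assumption. Instead of enumerating all programs of a universal machine, the hypothetical `toy_program` below treats the bits of the program number as a choice of elements from S, so the sketch is exhaustive search in disguise and says nothing about polynomial time. It only illustrates the dovetailed loop and the output check. The names `toy_program`, `run_for_steps`, `accepts` and the `max_rounds` cut-off are all assumptions made for the illustration.

```python
from itertools import count, islice


def toy_program(p, items):
    """Hypothetical stand-in for 'program number p': a generator that
    yields None once per simulated step, then yields its output list."""
    candidate = []
    for i, x in enumerate(items):
        yield None                       # one simulated step of work
        if (p >> i) & 1:
            candidate.append(x)
    yield candidate


def run_for_steps(program, steps):
    """Run a generator for at most `steps` steps; return its output,
    or None if it has not produced one within the budget."""
    for value in islice(program, steps):
        if value is not None:
            return value
    return None


def accepts(numbers, max_rounds=None):
    """Dovetailed search: in round n, run programs 1..n for n steps each.
    With max_rounds=None it runs forever on 'no' instances."""
    items = sorted(set(numbers))
    item_set = set(items)
    rounds = count(1) if max_rounds is None else range(1, max_rounds + 1)
    for n in rounds:
        for p in range(1, n + 1):
            out = run_for_steps(toy_program(p, items), n)
            if (out                                  # nonempty output
                    and len(set(out)) == len(out)    # distinct integers
                    and set(out) <= item_set         # all taken from S
                    and sum(out) == 0):              # and they sum to 0
                return "yes"
    return None  # reachable only when max_rounds is set


print(accepts({-2, -3, 15, 14, 7, -10}))
```

Each candidate program is restarted from scratch in every round, so the total work up to round N is on the order of N² simulated steps; this overhead is what keeps the schedule within a polynomial factor of the fastest program in the enumeration.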
Perhaps we want to "solve" the SUBSET-SUM problem, rather than just "accept" the SUBSET-SUM language. That means we want the algorithm to always halt and return a "yes" or "no" answer. As of 2008, it is unknown whether an algorithm exists that does this in polynomial time. But if there is an algorithm that provably does this in polynomial time, then so does the algorithm that is obtained by replacing the IF statement in the above algorithm with this:
IF the program outputs a complete mathematical proof
  AND each step of the proof is legal
  AND the conclusion is that S does (or does not) have a nonempty subset summing to 0
THEN
    OUTPUT "yes" (or "no") and HALT
The P = NP problem can be restated in terms of the expressibility of certain classes of logical statements, as a result of work in descriptive complexity. Consider languages of finite structures with a fixed signature that includes a linear order relation. All such languages in P can be expressed in first-order logic with the addition of a suitable least fixed point operator; in combination with the order, this operator effectively allows the definition of recursive functions. Indeed, as long as the signature contains at least one predicate or function in addition to the distinguished order relation (so that the amount of space taken to store such finite structures is actually polynomial in the number of elements in the structure), this logic precisely characterizes P. Similarly, NP is the set of languages expressible in existential second-order logic, that is, second-order logic restricted to exclude universal quantification over relations, functions, and subsets. The languages in the polynomial hierarchy, PH, correspond to all of second-order logic. Thus, the question "is P a proper subset of NP" can be reformulated as "is existential second-order logic able to describe languages (of finite linearly ordered structures with nontrivial signature) that first-order logic with least fixed point cannot?". The word "existential" can even be dropped from the previous characterization, since P = NP if and only if P = PH (as the former would establish that NP = co-NP, which in turn would imply that NP = PH).