The Turing test is a proposed test of a machine's ability to demonstrate intelligence. Described by Alan Turing in the 1950 paper "Computing Machinery and Intelligence," it proceeds as follows: a human judge engages in a natural language conversation with one human and one machine, each of which tries to appear human; if the judge cannot reliably tell which is which, then the machine is said to pass the test. In order to test the machine's intelligence rather than its ability to render words into audio, the conversation is limited to a text-only channel such as a computer keyboard and screen (Turing originally suggested a teletype machine, one of the few text-only communication systems available in 1950).
In 1936, philosopher A. J. Ayer considered the standard philosophical question of other minds: how do we know that other people have the same conscious experiences as we do? In his book Language, Truth and Logic, Ayer suggested a protocol to distinguish between a conscious man and an unconscious machine: "The only ground I can have for asserting that an object which appears to be conscious is not really a conscious being, but only a dummy or a machine, is that it fails to satisfy one of the empirical tests by which the presence or absence of consciousness is determined." This suggestion is very similar to the Turing test, although it is not certain that Turing was familiar with Ayer's popular philosophical classic.
Researchers in Britain had been exploring "machine intelligence" for up to ten years before the field of artificial intelligence research was founded in 1956. It was a common topic among the members of the Ratio Club, an informal group of British cybernetics and electronics researchers that included Alan Turing.
Turing in particular had been tackling the notion of machine intelligence since at least 1941, and one of the earliest known mentions of "computer intelligence" was made by Turing in 1947. In his report "Intelligent Machinery", Turing investigated "the question of whether or not it is possible for machinery to show intelligent behaviour", and as part of that investigation proposed what may be considered the forerunner of his later tests.
Thus by the time Turing published "Computing Machinery and Intelligence", he had been considering the possibility of machine intelligence for many years. This, however, was the first published paper by Turing to focus exclusively on the notion.
Turing began his 1950 paper with the statement: "I propose to consider the question, 'Can machines think?'" As Turing highlighted, the traditional approach to such a question is to start with definitions, defining both the terms "machine" and "think". Nevertheless, Turing chose not to do so. Instead he replaced the question with a new question, "which is closely related to it and is expressed in relatively unambiguous words". In essence, Turing proposed to change the question from "Do machines think?" into "Can machines do what we (as thinking entities) can do?" The advantage of the new question, Turing argued, was that it "drew a fairly sharp line between the physical and intellectual capacities of a man."
To demonstrate this approach, Turing proposed a test that was inspired by a party game known as the "Imitation Game", in which a man and a woman go into separate rooms, and guests try to tell them apart by writing a series of questions and reading the typewritten answers sent back. In this game, both the man and the woman aim to convince the guests that they are the other. Turing proposed recreating the imitation game with a machine taking the part of one of the players.
Later in the paper he suggested an "equivalent" alternative formulation involving a judge conversing only with a computer and a man.
While neither of these two formulations precisely matches the version of the Turing Test that is more generally known today, a third version was proposed by Turing in 1952. In this version, which Turing discussed in a BBC radio broadcast, a jury asks questions of a computer, and the role of the computer is to make a significant proportion of the jury believe that it is really a man.
Turing's paper considered nine common objections, which include all the major arguments against artificial intelligence that have been raised in the years since his paper was first published. (See Computing Machinery and Intelligence.)
Blay Whitby lists four major turning points in the history of the Turing Test: the publication of "Computing Machinery and Intelligence" in 1950; the announcement of Joseph Weizenbaum's ELIZA in 1966; Kenneth Colby's creation of PARRY, which was first described in 1972; and the Turing Colloquium in 1990.
ELIZA works by examining a user's typed comments for keywords. If a keyword is found, a rule is applied which transforms the user's comments, and the resulting sentence is returned. If a keyword is not found, ELIZA either gives a generic response or repeats one of the earlier comments. In addition, Weizenbaum developed ELIZA to replicate the behavior of a Rogerian psychotherapist, allowing ELIZA to be "free to assume the pose of knowing almost nothing of the real world." Due to these techniques, Weizenbaum's program was able to fool some people into believing that they were talking to a real person, with some subjects being "very hard to convince that ELIZA ... is not human." Thus ELIZA is claimed by many to be one of the programs (perhaps the first) able to pass the Turing Test.
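The keyword-and-rule mechanism described above can be illustrated with a minimal sketch. It is not Weizenbaum's program: the rule table, the response wording and the names RULES, GENERIC and respond are all hypothetical, chosen only to show a keyword match, a transformation of the user's comment, and the two fallbacks (recalling an earlier comment, or a generic reply).

```python
import re

# Hypothetical rule table (not Weizenbaum's script): each entry pairs a
# keyword pattern with a template that reuses part of the user's comment.
RULES = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
GENERIC = "Please go on."


def respond(comment, memory):
    """Return a reply to `comment`; `memory` holds earlier keyword-bearing comments."""
    text = comment.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Keyword found: remember the comment and transform it into a reply.
            memory.append(text)
            return template.format(match.group(1))
    if memory:
        # No keyword: repeat the oldest remembered comment back to the user.
        return f'Earlier you said "{memory.pop(0)}". Tell me more.'
    # No keyword and nothing remembered: fall back to a generic response.
    return GENERIC


memory = []
print(respond("I am unhappy.", memory))
print(respond("The weather is nice.", memory))
print(respond("Hello.", memory))
```

Even this toy version shows why such a program needs no model of the world: every reply is assembled from the user's own words or from a stock phrase.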
Colby's PARRY has been described as "ELIZA with attitude": it attempts to model the behavior of a paranoid schizophrenic, using a similar (if more advanced) approach to that employed by Weizenbaum. In order to help validate the work, PARRY was tested in the early 1970s using a variation of the Turing Test. A group of experienced psychiatrists analyzed a combination of real patients and computers running PARRY through teletype machines. Another group of 33 psychiatrists was shown transcripts of the conversations. The two groups were then asked to identify which of the "patients" were human, and which were computer programs. The psychiatrists were only able to make the correct identification 48% of the time, a figure consistent with random guessing.
While neither ELIZA nor PARRY was able to pass a strict Turing Test, they, and software like them, suggested that software might be written that was able to do so. More importantly, they suggested that such software might involve little more than databases and the application of simple rules. This led to John Searle's 1980 paper, "Minds, Brains, and Programs", in which he proposed an argument against the Turing Test. Searle described a thought experiment known as the Chinese room that highlighted what he saw as a fundamental misinterpretation of what the Turing Test could and could not prove: while software such as ELIZA might be able to pass the Turing Test, it might do so by simply manipulating symbols of which it has no understanding. And without understanding, it could not be described as "thinking" in the same sense people do. Searle concluded that the Turing Test cannot prove that a machine can think, contrary to Turing's original proposal.
Arguments such as that proposed by Searle and others working in the philosophy of mind sparked off a more intense debate about the nature of intelligence, the possibility of intelligent machines and the value of the Turing test that continued through the 1980s and 1990s.
The year 1990 marked the 40th anniversary of the first publication of Turing's "Computing Machinery and Intelligence" paper, and thus saw renewed interest in the test. Two significant events occurred in that year. The first was the Turing Colloquium, which was held at the University of Sussex in April, and brought together academics and researchers from a wide variety of disciplines to discuss the Turing Test in terms of its past, present and future. The second significant event was the formation of the annual Loebner prize competition.
The Loebner prize was instigated by Hugh Loebner under the auspices of the Cambridge Center for Behavioral Studies of Massachusetts, United States, with the first competition held in November 1991. As Loebner describes it, the competition was created to advance the state of AI research, at least in part because while the Turing Test had been discussed for many years, "no one had taken steps to implement it." The Loebner prize has three awards: the first prize of $100,000 and a gold medal, to be awarded to the first program that passes the "unrestricted" Turing test; the second prize of $25,000, to be awarded to the first program that passes the "restricted" version of the test; and a sum of $2000 (now $3000) to the "most human-like" program that was entered each year. As of 2007, neither the first nor the second prize has been awarded.
The running of the Loebner prize led to renewed discussion of both the viability of the Turing Test and the aim of developing artificial intelligences that could pass it. The Economist, in an article entitled "Artificial Stupidity", commented that the winning entry from the first Loebner prize won, at least in part, because it was able to "imitate human typing errors". (Turing had considered the possibility that computers could be identified by their lack of errors, and had suggested that the computers should be programmed to add errors into their output, so as to be better "players" of the game.) The issue that The Economist raised was one that was already well established in the literature: perhaps we don't really need the types of computers that could pass the Turing Test, and perhaps trying to pass the Turing Test is nothing more than a distraction from more fruitful lines of research. Equally, a second issue became apparent: when the rules restrict the questions that interrogators may ask, and comparatively "unsophisticated" interrogators are used, the Turing Test can be passed through the use of "trickery" rather than intelligence.
There are at least three primary versions of the Turing test: two offered by Turing in "Computing Machinery and Intelligence" and one which Saul Traiger describes as the "Standard Interpretation". While there is some debate as to whether the "Standard Interpretation" is described by Turing or is, instead, based on a misreading of his paper, these three versions are not regarded as being equivalent, and are seen as having different strengths and weaknesses.
Turing described a simple party game which involves three players. Player A is a man, player B is a woman, and player C (who plays the role of the interrogator) can be of either gender. In the imitation game, player C is unable to see either player A or player B, and can only communicate with them through written notes. By asking questions of player A and player B, player C tries to determine which of the two is the man, and which of the two is the woman. Player A's role is to trick the interrogator into making the wrong decision, while player B attempts to assist the interrogator.
In what Sterrett refers to as the "Original Imitation Game Test", Turing proposed that the role of player A be taken by a computer. The computer's task is therefore to pretend to be a woman and to attempt to trick the interrogator into making an incorrect evaluation. The success of the computer is determined by comparing the outcome of the game when player A is a computer against the outcome when player A is a man. If, as Turing puts it, "the interrogator decide[s] wrongly as often when the game is played [with the computer] as he does when the game is played between a man and a woman", then it can be argued that the computer is intelligent.
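The criterion in this version is a comparison of frequencies across many rounds rather than the result of a single round. The following sketch illustrates that comparison under stated assumptions: the function names, the representation of each round as a boolean (True when the interrogator decided wrongly) and the tolerance parameter are all hypothetical, since Turing's paper gives no numerical threshold for "as often".

```python
def wrong_rate(rounds):
    """Fraction of rounds in which the interrogator decided wrongly."""
    if not rounds:
        raise ValueError("at least one round is required")
    return sum(rounds) / len(rounds)


def passes_oig(machine_rounds, man_rounds, tolerance=0.05):
    """Compare the two conditions of the Original Imitation Game Test.

    machine_rounds: outcomes when player A is the computer.
    man_rounds: outcomes when player A is a man.
    tolerance: assumed slack for reading "as often"; not taken from Turing.
    """
    return wrong_rate(machine_rounds) >= wrong_rate(man_rounds) - tolerance


# Illustrative, made-up outcomes: True means the interrogator was fooled.
machine = [True, False, True, False]
man = [True, False, False, True]
print(passes_oig(machine, man))
```

A real evaluation would need far more rounds and a proper statistical comparison; the sketch only shows that the human player's success rate sets the bar the machine has to reach.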
The second version comes later in Turing's 1950 paper. As with the Original Imitation Game Test, the role of player A is performed by a computer. The difference is that now the role of player B is to be performed by a man, rather than by a woman.
In this version both player A (the computer) and player B are trying to trick the interrogator into making an incorrect decision.
A common understanding of the Turing test is that the purpose was not specifically to test if a computer is able to fool an interrogator into believing that it is a woman, but to test whether or not a computer could imitate a human. While there is some dispute as to whether or not this interpretation was intended by Turing (for example, Sterrett believes that it was, and thus conflates the second version with this one, while others, such as Traiger, do not), this has nevertheless led to what can be viewed as the "standard interpretation". In this version, player A is a computer, and player B is a person of either gender. The role of the interrogator is not to determine which is male and which is female, but to determine which is a computer and which is a human.
There has been some controversy over which of the alternative formulations of the test Turing intended. Sterrett argues that two distinct tests can be extracted from Turing's 1950 paper, and that, pace Turing's remark, they are not equivalent. The test that employs the party game and compares frequencies of success in the game is referred to as the "Original Imitation Game Test", whereas the test consisting of a human judge conversing with a human and a machine is referred to as the "Standard Turing Test"; Sterrett equates the latter with the "standard interpretation" rather than with the second version of the imitation game. Sterrett agrees that the Standard Turing Test (STT) has the problems its critics cite, but argues that, in contrast, the Original Imitation Game Test (OIG Test) so defined is immune to many of them, due to a crucial difference: the OIG Test, unlike the STT, does not make similarity to a human performance the criterion of the test, even though it employs a human performance in setting a criterion for machine intelligence. A man can fail the OIG Test, but it is argued that this is a virtue of a test of intelligence if failure indicates a lack of resourcefulness. It is argued that the OIG Test requires the resourcefulness associated with intelligence and not merely "simulation of human conversational behaviour". The general structure of the OIG Test could even be used with nonverbal versions of imitation games.
Still other writers have interpreted Turing to be proposing that the imitation game itself is the test, without specifying how to take into account Turing's statement that the test he proposed using the party version of the imitation game is based upon a criterion of comparative frequency of success in that imitation game, rather than a capacity to succeed at one round of the game.
Turing never makes clear whether the interrogator in his tests is aware that one of the participants is a computer. To return to the Original Imitation Game, Turing states only that player A is to be replaced with a machine, not that player C is to be made aware of this replacement. When Colby, Hilf, Weber and Kramer tested PARRY, they did so by assuming that the interrogators did not need to know that one or more of those being interviewed was a computer during the interrogation. But, as Saygin and others highlight, this makes a big difference to the implementation and outcome of the test.
In order to pass a well-designed Turing test, the machine would have to use natural language, to reason, to have knowledge and to learn. The test can be extended to include video input, as well as a "hatch" through which objects can be passed, and this would force the machine to demonstrate the skills of vision and robotics as well. Together these represent almost all the major problems of artificial intelligence.
The test is explicitly anthropomorphic. It only tests whether the subject resembles a human being, not whether the subject is generally "intelligent" or "sentient". The Turing test can therefore fail to test for general intelligence in two ways: some human behaviour is unintelligent, and some intelligent behaviour is not human.
Stuart J. Russell and Peter Norvig argue that the anthropomorphism of the test prevents it from being truly useful for the task of engineering intelligent machines. They write: "Aeronautical engineering texts do not define the goal of their field as 'making machines that fly so exactly like pigeons that they can fool other pigeons.'"
A machine passing the Turing test may be able to simulate human conversational behaviour but the machine might just follow some cleverly devised rules. Two famous examples of this line of argument against the Turing test are John Searle's Chinese room argument and Ned Block's Blockhead argument.
Even if the Turing test is a good operational definition of intelligence, it may not indicate that the machine has consciousness, or that it has intentionality. Perhaps intelligence and consciousness, for example, are such that neither one necessarily implies the other. In that case, the Turing test might fail to capture one of the key differences between intelligent machines and intelligent people.
By extrapolating an exponential growth of technology over several decades, futurist Ray Kurzweil predicted that Turing-test-capable computers would be manufactured around the year 2020, roughly speaking. See the Moore's Law article and the references therein for discussions of the plausibility of this argument.
As of 2008, no computer has passed the Turing test as such. Simple conversational programs such as ELIZA have fooled people into believing they are talking to another human being, such as in an informal experiment termed AOLiza. However, such "successes" are not the same as passing a Turing Test. Most obviously, the human party in the conversation has no reason to suspect they are talking to anything other than a human, whereas in a real Turing test the questioner is actively trying to determine the nature of the entity they are chatting with. Documented cases are usually in environments such as Internet Relay Chat, where conversation is sometimes stilted and meaningless, and in which no understanding of a conversation is necessary. Additionally, many Internet Relay Chat participants use English as a second or third language, making it even more likely that they will assume an unintelligent comment by the conversational program is simply something they have misunderstood, and will fail to recognize the very non-human errors the program makes. See ELIZA effect.
The Loebner prize is an annual competition to determine the best Turing test competitors. Although the competition awards an annual prize for the computer system that, in the judges' opinions, demonstrates the "most human" conversational behaviour (with learning AI Jabberwacky winning in 2005 and 2006, and A.L.I.C.E. before that), it has an additional prize for a system that in the judges' opinion passes a Turing test. This second prize has not yet been awarded. The creators of Jabberwacky have proposed a personal Turing Test: the ability to pass the imitation test while attempting to specifically imitate the human player, with whom the AI will have conversed at length before the test.
In 2008 the competition for the Loebner prize is being co-organised by Kevin Warwick and held at the University of Reading on October 12. The directive for the competition is to stay as close as possible to Turing's original statements made in his 1950 paper, such that it can be ascertained if any machines are presently close to 'passing the test'. An academic meeting discussing the Turing Test, organised by the Society for the Study of Artificial Intelligence and the Simulation of Behaviour, is being held in parallel at the same venue.
Trying to pass the Turing test in its full generality is not, as of 2005, an active focus of much mainstream academic or commercial effort. Current research in AI-related fields is aimed at more modest and specific goals.
The first bet of the Long Bet Project is a $10,000 one between Mitch Kapor (pessimist) and Ray Kurzweil (optimist) about whether a computer will pass a Turing Test by the year 2029. The bet specifies the conditions in some detail.
A modification of the Turing test, where the objective or one or more of the roles have been reversed between computers and humans, is termed a reverse Turing test.
Another variation of the Turing test is the subject matter expert Turing test, in which a computer's response cannot be distinguished from that of an expert in a given field.
As brain and body scanning techniques improve it may also be possible to replicate the essential data elements of a person to a computer system. The Immortality test variation of the Turing test would determine if a person's essential character is reproduced with enough fidelity to make it impossible to distinguish a reproduction of a person from the original person.
The Minimum Intelligent Signal Test, proposed by Chris McKinstry, is another variation of Turing's test in which only binary responses are permitted. It is typically used to gather statistical data against which the performance of artificial intelligence programs may be measured.
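Because every response is binary, a program's performance on such a test reduces to simple counting. The sketch below is an illustration only, not McKinstry's published procedure: it assumes each item has a reference yes/no answer (for example, a human consensus) and scores a program by the fraction of items on which it agrees. The function name and data layout are hypothetical.

```python
def agreement_score(program_answers, reference_answers):
    """Fraction of binary items on which the program matches the reference.

    Both arguments are equal-length sequences of booleans (True = yes).
    A program answering at random would be expected to score about 0.5.
    """
    if len(program_answers) != len(reference_answers):
        raise ValueError("answer lists must have the same length")
    if not program_answers:
        raise ValueError("at least one item is required")
    matches = sum(p == r for p, r in zip(program_answers, reference_answers))
    return matches / len(program_answers)


# Made-up example: four yes/no items, the program disagrees on the third.
print(agreement_score([True, False, True, True], [True, False, False, True]))
```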
Another variation of the reverse Turing test is implied in the work of psychoanalyst Wilfred Bion, who was particularly fascinated by the "storm" that resulted from the encounter of one mind with another. Carrying this idea forward, R. D. Hinshelwood described the mind as a "mind recognizing apparatus", noting that this might be some sort of "supplement" to the Turing test. To make this more explicit, the challenge would be for the computer to determine whether it is interacting with a human or another computer. This is an extension of the original question Turing was attempting to answer, but would, perhaps, be a high enough standard to define a machine that could "think" in a way we typically define as characteristically human.
Another variation is the Meta Turing test, in which the subject being tested (for example a computer) is classified as intelligent if it itself has created something that the subject itself wants to test for intelligence.
Real Turing tests, such as the Loebner prize, do not usually force programs to demonstrate the full range of intelligence and are reserved for testing chatterbot programs. Even in this limited form, these tests are still very rigorous. The 2008 Loebner prize, however, is sticking closely to Turing's original concepts; for example, conversations will last only 5 minutes.
CAPTCHA is a form of reverse Turing test. Before being allowed to perform some action on a website, the user is presented with alphanumerical characters in a distorted graphic image and asked to type them. This is intended to prevent automated systems from abusing the site. The rationale is that software sufficiently sophisticated to read the distorted image accurately does not exist (or is not available to the average user), so any system able to do so is likely to be a human being.
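The challenge-and-response logic behind such a check can be sketched in a few lines. This is a simplified illustration with hypothetical function names: it generates the random characters and verifies the user's answer, but omits the step that makes a CAPTCHA hard for software, namely rendering the characters as a distorted image.

```python
import secrets
import string

# Characters the challenge is drawn from (an arbitrary choice for this sketch).
ALPHABET = string.ascii_uppercase + string.digits


def new_challenge(length=6):
    """Return a random alphanumeric string to be shown to the user.

    A real CAPTCHA would render this string as a distorted image;
    that rendering step is deliberately left out here.
    """
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def verify(expected, answer):
    """Accept the answer if it matches the challenge, ignoring case and outer spaces."""
    return answer.strip().upper() == expected.upper()


challenge = new_challenge()
print(challenge)
print(verify(challenge, challenge.lower()))
```

In practice the server keeps the expected string and sends only the image to the browser, so that the answer cannot be read directly from the page.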