As with biological families, the evidence of relationship is observable shared characteristics. An accurately identified family is a phylogenetic unit; that is, all its members derive from a common ancestor, and all attested descendants of that ancestor are included in the family. Most of the world's languages are known to belong to language families. For the others, family relationships are not known or only tentatively proposed.
The concept of language families is based on the assumption that over time languages gradually diverge into dialects and then into new languages. However, linguistic ancestry is less clear-cut than biological ancestry, because languages can mix extensively through contact in conquest or trade, whereas biological species do not normally interbreed. In the formation of creole languages and other types of mixed languages, there may be no single ancestor of a given language. In addition, a number of sign languages have developed in isolation and may have no relatives at all. However, these cases are relatively rare, and most languages can be unambiguously classified.
The common ancestor of a language family is seldom known directly, since most languages have a relatively short recorded history. However, it is possible to recover many features of a proto-language by applying the comparative method, a reconstructive procedure worked out by the 19th-century linguist August Schleicher. This method can demonstrate the validity of many proposed language families. For example, the reconstructible common ancestor of the Indo-European language family is called Proto-Indo-European. Proto-Indo-European is not attested by written records, since it was spoken before the invention of writing.
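One step of the comparative method, finding sound correspondences that recur across cognate words, can be illustrated with a short Python sketch. It is a toy under stated assumptions: it compares only the first written letter of each word as a rough stand-in for the initial sound, whereas real comparative work aligns whole words phoneme by phoneme. The function names and the small word list are chosen here for illustration.

```python
from collections import Counter

# Well-known Latin/English cognate pairs (Latin first, English second).
# Spelling is used as a crude proxy for pronunciation in this sketch.
COGNATES = [
    ("pater", "father"),
    ("pes", "foot"),
    ("piscis", "fish"),
    ("cor", "heart"),
    ("centum", "hundred"),
    ("cornu", "horn"),
]


def initial_correspondences(pairs):
    """Count how often each pair of word-initial letters lines up."""
    return Counter((latin[0], english[0]) for latin, english in pairs)


def recurring(pairs, minimum=2):
    """Keep only correspondences seen at least `minimum` times.

    A correspondence that recurs across many words is hard to explain
    by chance, which is what makes it evidence of common descent.
    """
    counts = initial_correspondences(pairs)
    return {pair: n for pair, n in counts.items() if n >= minimum}


if __name__ == "__main__":
    for (latin, english), n in recurring(COGNATES).items():
        print(f"Latin {latin}- : English {english}-  ({n} words)")
```

In this small sample, Latin p- lines up with English f- and Latin c- with English h- in three words each. Regular patterns of this kind are what allow features of an unattested proto-language to be reconstructed.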
Sometimes, though, a proto-language can be identified with a historically known language. For instance, dialects of Old Norse are the proto-language of Norwegian, Swedish, Danish, Faroese and Icelandic. Likewise, the Appendix Probi depicts Proto-Romance, a language almost unattested due to the prestige of Classical Latin, a highly stylised literary dialect not representative of the speech of ordinary people.
Language families can be divided into smaller phylogenetic units, conventionally referred to as branches of the family because the history of a language family is often represented as a tree diagram. However, the term family is not restricted to any one level of this "tree". The Germanic family, for example, is a branch of the Indo-European family. Some taxonomists restrict the term family to a certain level, but there is little consensus on how to do so. Those who affix such labels also subdivide branches into groups, and groups into complexes. The terms superfamily, phylum, and stock are applied to proposed groupings of language families whose status as phylogenetic units is generally considered to be unsubstantiated by accepted historical linguistic methods.
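The tree representation described above can be sketched as a small recursive data structure in Python. The class and method names are assumptions made for this illustration, and the tree is deliberately partial: it contains only a handful of the languages mentioned in this article. The same node type serves for a family, a branch, or a single language, which mirrors the point that "family" is not tied to one level of the tree.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a family tree: a family, a branch, or a single language."""
    name: str
    children: list["Node"] = field(default_factory=list)

    def languages(self) -> list[str]:
        """Return the names of all leaf nodes below (or at) this node."""
        if not self.children:
            return [self.name]
        found = []
        for child in self.children:
            found.extend(child.languages())
        return found

    def path_to(self, target: str) -> list[str]:
        """Return node names from this node down to `target`, or [] if absent."""
        if self.name == target:
            return [self.name]
        for child in self.children:
            below = child.path_to(target)
            if below:
                return [self.name] + below
        return []


# A deliberately incomplete fragment of the Indo-European tree.
north_germanic = Node("North Germanic", [
    Node("Norwegian"), Node("Swedish"), Node("Danish"),
    Node("Faroese"), Node("Icelandic"),
])
germanic = Node("Germanic", [north_germanic, Node("West Germanic", [Node("English")])])
indo_european = Node("Indo-European", [germanic, Node("Greek")])

if __name__ == "__main__":
    print(" > ".join(indo_european.path_to("Icelandic")))
    print(germanic.languages())
```

Calling `languages()` on `germanic` or on `indo_european` works identically, so any subtree can be treated as a family in its own right.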
Languages that cannot be reliably classified into any family are known as isolates. A language isolated in its own branch within a family, such as Greek within Indo-European, is often also called an isolate; but the meaning of isolate in such cases is usually clarified. For instance, Greek might be referred to as an Indo-European isolate. The isolation of modern Greek, however, is not typical of its relationship to other languages at other times in its history. Several Greek dialects evolved out of the larger Indo-European language group; and later, Greek words influenced many other languages. By contrast, the Basque language is a living modern language and a near-perfect isolate. The history of its lexical, phonetic, and syntactic structures is not known, and these structures cannot easily be connected with those of other languages, though Basque has been influenced by Romance languages in the region, such as Castilian Spanish, Occitan, and French.
Connections within and between language families are often used by geneticists and archaeologists, in combination with DNA evidence and archaeological evidence, to help reconstruct prehistoric migrations and other prehistoric developments, such as the spread of the Neolithic complex of farming, herding, pottery, and polished stone utensils. For the scientists concerned, this is treacherous but necessary ground: the linguistic evidence is often vital to resolving the problems at hand, but it must be handled with caution, for two reasons. First, it is often a delicate matter to relate languages to archaeological cultures, on the one hand, and to genetic lineages, on the other. Second, many proposed language relationships are controversial, which often requires non-linguists to take a stand on linguistic issues, a professionally uncomfortable but often inevitable situation.
The Linguist List is now working on a National Science Foundation-funded project entitled Multitree, which aims to build a database of all hypothesized language relationships, with a full searchable bibliography for each.
Membership of languages in the same language family is determined by a genetic relationship. The languages involved present shared retentions, i.e., features of the proto-language (or reflexes of such features) that cannot be explained better by chance or borrowing (convergence). Membership in a branch/group/subgroup within a language family is determined by shared innovations which are presumed to have taken place in a common ancestor. For example, what makes Germanic languages "Germanic" is that large parts of the structures of all the languages so designated can be stated just once for all of them. In other words, they can be treated as an innovation that took place in Proto-Germanic, the source of all the Germanic languages.
Shared innovations acquired by borrowing or other means are not considered genetic and have no bearing on the language family concept. It has been asserted, for example, that many of the more striking features shared by Italic languages (Latin, Oscan, Umbrian, etc.) might well be "areal features". More certainly, very similar-looking alterations in the systems of long vowels in the West Germanic languages greatly postdate any possible notion of a proto-language innovation (and cannot readily be regarded as "areal", either, since English and continental West Germanic were not a linguistic area). In a similar vein, there are many similar unique innovations in Germanic and Baltic/Slavic that are far more likely to be areal features than traceable to a common proto-language. But legitimate uncertainty about whether shared innovations are areal features, coincidence, or inheritance from a common ancestor leads to disagreement over the proper subdivisions of any large language family.
A sprachbund is a geographic area having several languages that feature common linguistic structures. The similarities between those languages are caused by language contact, not by chance or common origin, and are not recognized as criteria that define a language family.
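The subgrouping logic described above, in which shared innovations define a branch while contact-induced similarities are set aside, can be sketched in Python. This is a simplified illustration, not an established classification procedure. The function name and data layout are assumptions; "areal feature X" is a purely hypothetical placeholder and not a claim about German or Latin. Grimm's law and the High German consonant shift are real sound changes, used here only as labels.

```python
def candidate_subgroups(innovations, contact_induced=frozenset()):
    """Group languages by the innovations they share.

    `innovations` maps each language to the set of innovations it shows.
    Features listed in `contact_induced` are skipped, since similarities
    that spread by borrowing do not count as evidence of common descent.
    Only innovations shared by two or more languages define a candidate
    subgroup.
    """
    groups = {}
    for language, features in innovations.items():
        for feature in features:
            if feature in contact_induced:
                continue
            groups.setdefault(feature, set()).add(language)
    return {f: langs for f, langs in groups.items() if len(langs) >= 2}


# "areal feature X" is hypothetical and included only to show the filter.
INNOVATIONS = {
    "English": {"Grimm's law"},
    "German": {"Grimm's law", "High German consonant shift", "areal feature X"},
    "Swedish": {"Grimm's law"},
    "Latin": {"areal feature X"},
    "Greek": set(),
}

if __name__ == "__main__":
    print(candidate_subgroups(INNOVATIONS, contact_induced={"areal feature X"}))
```

If the hypothetical areal feature is not excluded, German and Latin are wrongly proposed as a subgroup. In practice, deciding which features belong in `contact_induced` is exactly the judgment on which linguists disagree.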