In classical genetics, the genome of a diploid organism, including eukaryotes, refers to a full set of chromosomes or genes in a gamete; a regular somatic cell therefore contains two full sets of genomes. In a haploid organism, including bacteria, archaea, viruses, and mitochondria, a cell contains only a single set of the genome, usually in a single circular or contiguous linear DNA (or RNA for some viruses). In modern molecular biology, the genome of an organism is its hereditary information encoded in DNA (or, for some viruses, RNA). The genome includes both the genes and the non-coding sequences of the DNA. The term was coined in 1920 by Hans Winkler, Professor of Botany at the University of Hamburg, Germany. The Oxford English Dictionary suggests that the name is a portmanteau of the words gene and chromosome; however, many related -ome words already existed, such as biome and rhizome, forming a vocabulary into which genome fits systematically.
More precisely, the genome of an organism is a complete genetic sequence on one set of chromosomes; for example, one of the two sets that a diploid individual carries in every somatic cell. The term genome can be applied specifically to the information stored on a complete set of nuclear DNA (i.e., the "nuclear genome"), but it can also be applied to the information stored within organelles that contain their own DNA, as with the mitochondrial genome or the chloroplast genome. When people say that the genome of a sexually reproducing species has been "sequenced", they are typically referring to a determination of the sequences of one set of autosomes and one of each type of sex chromosome, which together represent both of the possible sexes. Even in species that exist in only one sex, what is described as "a genome sequence" may be a composite read from the chromosomes of various individuals. In general use, the phrase "genetic makeup" is sometimes used to mean the genome of a particular individual or organism. The study of the global properties of genomes of related organisms is usually referred to as genomics, which distinguishes it from genetics, which generally studies the properties of single genes or groups of genes.
Both the number of base pairs and the number of genes vary widely from one species to another, and there is little connection between the two (an observation known as the C-value paradox). At present, the highest known number of genes is around 60,000, for the protozoan causing trichomoniasis (see List of sequenced eukaryotic genomes), almost three times as many as in the human genome.
An analogy to the human genome stored on DNA is that of instructions stored in a book:
In eukaryotes such as plants, protozoa and animals, however, "genome" carries the typical connotation of only information on chromosomal DNA. So although these organisms contain mitochondria that have their own DNA, the genes in this mitochondrial DNA are not considered part of the genome. In fact, mitochondria are sometimes said to have their own genome, often referred to as the "mitochondrial genome".
Although this concept may seem counterintuitive, it is the same concept that says there is no particular shape that is the shape of a cheetah. Cheetahs vary, and so do the sequences of their genomes. Yet both the individual animals and their sequences share commonalities, so one can learn something about cheetahs and "cheetah-ness" from a single example of either.
The Human Genome Project was organized to map and to sequence the human genome. Other genome projects include those for the mouse, rice, the plant Arabidopsis thaliana, the puffer fish, and bacteria such as E. coli. In 1976, Walter Fiers at the University of Ghent (Belgium) was the first to establish the complete nucleotide sequence of a viral RNA genome (bacteriophage MS2). The first DNA genome project to be completed was that of phage Φ-X174, with only 5,386 base pairs, which was sequenced by Fred Sanger in 1977. The first bacterial genome to be completed was that of Haemophilus influenzae, completed by a team at The Institute for Genomic Research in 1995.
In May 2007, the New York Times reported that the full genome of DNA pioneer James D. Watson had been sequenced. The article noted that some scientists believe this to be the gateway to upcoming personalized genomic medicine.
Many genomes have been sequenced by various genome projects. The cost of sequencing continues to drop.
| Organism | Genome size (base pairs) | Note |
|---|---|---|
| Virus, Bacteriophage MS2 | 3,569 | First sequenced RNA-genome |
| Virus, SV40 | 5,224 | |
| Virus, Phage Φ-X174 | 5,386 | First sequenced DNA-genome |
| Virus, Phage λ | 50,000 | |
| Bacterium, Haemophilus influenzae | 1,830,000 | First genome of living organism, July 1995 |
| Bacterium, Carsonella ruddii | 160,000 | Smallest non-viral genome. |
| Bacterium, Buchnera aphidicola | 600,000 | |
| Bacterium, Wigglesworthia glossinidia | 700,000 | |
| Bacterium, Escherichia coli | 4,000,000 | |
| Amoeba, Amoeba dubia | 670,000,000,000 | Largest known genome. |
| Plant, Arabidopsis thaliana | 157,000,000 | First plant genome sequenced, Dec 2000. |
| Plant, Genlisea margaretae | 63,400,000 | Smallest recorded flowering plant genome, 2006. |
| Plant, Fritillaria assyriaca | 130,000,000,000 | |
| Plant, Populus trichocarpa | 480,000,000 | First tree genome, Sept 2006 |
| Yeast, Saccharomyces cerevisiae | 20,000,000 | |
| Fungus, Aspergillus nidulans | 30,000,000 | |
| Nematode, Caenorhabditis elegans | 98,000,000 | First multicellular animal genome, December 1998 |
| Insect, Drosophila melanogaster aka Fruit Fly | 130,000,000 | |
| Insect, Bombyx mori aka Silk Moth | 530,000,000 | |
| Insect, Apis mellifera aka Honey Bee | 1,770,000,000 | |
| Fish, Tetraodon nigroviridis, type of Puffer fish | 385,000,000 | Smallest vertebrate genome known |
| Mammal, Homo sapiens | 3,200,000,000 | |
| Fish, Protopterus aethiopicus aka Marbled lungfish | 130,000,000,000 | Largest vertebrate genome known |
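To make the spread of values in the table easier to grasp, the following is a minimal Python sketch that compares a few of the listed genome sizes with the human genome. The figures are copied from the table; the dictionary `GENOME_SIZES` and the helper functions `ratio_to_human` and `orders_of_magnitude` are hypothetical names introduced only for this illustration.

```python
import math

# Genome sizes in base pairs, copied from a few rows of the table above.
GENOME_SIZES = {
    "Bacteriophage MS2": 3_569,
    "Carsonella ruddii": 160_000,
    "Escherichia coli": 4_000_000,
    "Tetraodon nigroviridis": 385_000_000,
    "Homo sapiens": 3_200_000_000,
    "Protopterus aethiopicus": 130_000_000_000,
    "Amoeba dubia": 670_000_000_000,
}


def ratio_to_human(organism):
    """Return the organism's genome size as a multiple of the human genome size."""
    return GENOME_SIZES[organism] / GENOME_SIZES["Homo sapiens"]


def orders_of_magnitude(smaller, larger):
    """Return log10 of the size ratio between two listed genomes."""
    return math.log10(GENOME_SIZES[larger] / GENOME_SIZES[smaller])


if __name__ == "__main__":
    # Print the selected genomes from smallest to largest.
    for name, size in sorted(GENOME_SIZES.items(), key=lambda item: item[1]):
        print(f"{name:26s} {size:>16,d} bp  {ratio_to_human(name):12.6f} x human")
```

With these figures, the Amoeba dubia genome comes out at roughly 209 times the size of the human genome, and the listed sizes span more than eight orders of magnitude, which illustrates how weakly genome size tracks the apparent complexity of the organism.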
Since genomes and their organisms are very complex, one research strategy is to reduce the number of genes in a genome to the bare minimum and still have the organism in question survive. Experimental work is being done on minimal genomes for single-celled organisms as well as minimal genomes for multicellular organisms (see Developmental biology). The work is both in vivo and in silico.
Duplications play a major role in shaping the genome. Duplications may range from extension of short tandem repeats, to duplication of a cluster of genes, and all the way to duplications of entire chromosomes or even entire genomes. Such duplications are probably fundamental to the creation of genetic novelty.
Horizontal gene transfer is invoked to explain how there is often extreme similarity between small portions of the genomes of two organisms that are otherwise very distantly related. Horizontal gene transfer seems to be common among many microbes. Also, eukaryotic cells seem to have experienced a transfer of some genetic material from their chloroplast and mitochondrial genomes to their nuclear chromosomes.