The MLST workflow involves three steps: 1) data collection, 2) data analysis and 3) multilocus sequence analysis. In the data-collection step, variation is identified definitively by determining the nucleotide sequences of gene fragments. In the data-analysis step, each unique sequence is assigned an allele number, the allele numbers at all loci are combined into an allelic profile, and the profile is assigned a sequence type (ST). Newly found alleles and STs are stored in the database after verification. In the final step, the relatedness of isolates is determined by comparing their allelic profiles. Researchers carry out epidemiological and phylogenetic studies by comparing the STs of different clonal complexes. Because sequencing and identification produce a large volume of data, bioinformatic techniques are used to organize, manage, analyze and merge the biological data.
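The allele-numbering and ST-assignment logic of the data-analysis step can be sketched in Python. This is a minimal in-memory illustration, not the software used by the MLST databases: the class name `MlstScheme`, the locus names and the toy sequences are hypothetical, numbers are simply issued in order of discovery, and the verification step that real databases apply before storing new alleles and STs is omitted.

```python
"""Minimal sketch of MLST allele numbering and ST assignment (hypothetical)."""

from typing import Dict, List, Tuple


class MlstScheme:
    """In-memory stand-in for an MLST database (hypothetical, no curation step)."""

    def __init__(self, loci: List[str]) -> None:
        self.loci = loci
        # Per locus: exact sequence -> allele number.
        self.alleles: Dict[str, Dict[str, int]] = {locus: {} for locus in loci}
        # Allelic profile (allele numbers in locus order) -> sequence type.
        self.sts: Dict[Tuple[int, ...], int] = {}

    def allele_number(self, locus: str, sequence: str) -> int:
        """Return the allele number, issuing a new one for an unseen sequence.

        A single-nucleotide difference is enough for a new allele; the number
        of differences is not weighted.
        """
        table = self.alleles[locus]
        seq = sequence.upper()
        if seq not in table:
            table[seq] = len(table) + 1
        return table[seq]

    def profile(self, sequences: Dict[str, str]) -> Tuple[int, ...]:
        """Combine the allele numbers at every locus into an allelic profile."""
        return tuple(self.allele_number(locus, sequences[locus]) for locus in self.loci)

    def sequence_type(self, sequences: Dict[str, str]) -> int:
        """Return the ST for an isolate, issuing a new ST for a novel profile."""
        prof = self.profile(sequences)
        if prof not in self.sts:
            self.sts[prof] = len(self.sts) + 1
        return self.sts[prof]


# Usage with hypothetical loci and toy sequences.
scheme = MlstScheme(["locA", "locB", "locC"])
isolate1 = {"locA": "ACGT", "locB": "GGCC", "locC": "TTAA"}
isolate2 = {"locA": "ACGA", "locB": "GGCC", "locC": "TTAA"}  # one SNP at locA
print(scheme.sequence_type(isolate1))  # 1
print(scheme.sequence_type(isolate2))  # 2
print(scheme.profile(isolate2))        # (2, 1, 1)
```

Keying alleles on the exact sequence string mirrors the rule that any nucleotide difference defines a new allele, and keying STs on the profile tuple means two isolates share an ST only when they match at every locus.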
To balance acceptable discriminatory power against the time and cost of strain typing, laboratories commonly use about seven or eight housekeeping genes. For Staphylococcus aureus, for example, seven housekeeping genes are used in MLST typing: carbamate kinase (arcC), shikimate dehydrogenase (aroE), glycerol kinase (glpF), guanylate kinase (gmk), phosphate acetyltransferase (pta), triosephosphate isomerase (tpi) and acetyl coenzyme A acetyltransferase (yqiL), as specified by the MLST website. However, it is not uncommon for up to ten housekeeping genes to be used. For Vibrio vulnificus, for example, the scheme uses glucose-6-phosphate isomerase (glp), DNA gyrase subunit B (gyrB), malate-lactate dehydrogenase (mdh), methionyl-tRNA synthetase (metG), phosphoribosylaminoimidazole synthetase (purM), threonine dehydrogenase (dtdS), diaminopimelate decarboxylase (lysA), transhydrogenase alpha subunit (pntA), dihydroorotase (pyrC) and tryptophanase (tnaA). As these two examples show, both the number of housekeeping genes and the choice of genes used in MLST can differ from species to species.
For each housekeeping gene, the distinct sequences are assigned as alleles, and the alleles at all loci together form an allelic profile that serves as the identification marker for strain typing. Sequences that differ at even a single nucleotide are assigned as different alleles, and no weighting is given to the number of nucleotide differences between alleles, because it cannot be determined whether differences at multiple nucleotide sites result from multiple point mutations or from a single recombinational exchange. The large number of alleles at each of the seven loci makes it possible to distinguish billions of different allelic profiles, and a strain with the most common allele at each locus would be expected to occur by chance only about once in 10,000 isolates. Although MLST provides high discriminatory power, nucleotide changes accumulate in housekeeping genes relatively slowly, so the allelic profile of a bacterial isolate is sufficiently stable over time for the method to be ideal for global epidemiology.
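The figures quoted above follow from simple products. The sketch below multiplies allele counts to obtain the number of possible profiles and multiplies allele frequencies to estimate how often a given profile would arise by chance. It assumes the loci are independent (linkage equilibrium, which clonal bacterial populations often do not satisfy), and the allele counts and frequencies are made-up illustrative values, not data from any real scheme; the function names are hypothetical.

```python
"""Back-of-the-envelope arithmetic behind MLST's discriminatory power (hypothetical inputs)."""

from math import prod
from typing import Sequence


def distinct_profiles(alleles_per_locus: Sequence[int]) -> int:
    """Number of possible allelic profiles: the product of allele counts per locus."""
    return prod(alleles_per_locus)


def chance_profile_frequency(allele_freqs: Sequence[float]) -> float:
    """Expected frequency of one profile, assuming loci are independent."""
    return prod(allele_freqs)


if __name__ == "__main__":
    counts = [30] * 7   # hypothetical: 30 alleles at each of 7 loci
    freqs = [0.27] * 7  # hypothetical: commonest allele at 27% at every locus
    print(distinct_profiles(counts))  # 21870000000
    p = chance_profile_frequency(freqs)
    print(f"about 1 in {round(1 / p):,}")
```

With these hypothetical inputs, 30 alleles at each of seven loci give about 2.2 × 10^10 possible profiles, and a frequency of 0.27 for the commonest allele at every locus gives a chance frequency of roughly one in 10,000.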
The relatedness of isolates is displayed as a dendrogram constructed from the matrix of pairwise differences between their allelic profiles. The dendrogram is only a convenient way of displaying isolates that have identical or very similar allelic profiles and can therefore be assumed to derive from a common ancestor; the relationships between isolates that differ at more than three of the seven loci are likely to be unreliable and should not be used to infer their phylogeny.
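A dendrogram of this kind can be sketched by counting the loci at which each pair of profiles differs and clustering the resulting matrix. The text does not name a clustering method; UPGMA (average linkage) is assumed here, the function names are hypothetical, and the tree is returned as a Newick string without branch lengths. Joins made at distances above three of seven loci should still not be read as phylogeny.

```python
"""Pairwise allelic distances and a UPGMA dendrogram (sketch, hypothetical names)."""

from itertools import combinations
from typing import Dict, FrozenSet, Tuple

Profile = Tuple[int, ...]


def allelic_distance(a: Profile, b: Profile) -> int:
    """Number of loci at which two allelic profiles differ (unweighted)."""
    return sum(x != y for x, y in zip(a, b))


def upgma(profiles: Dict[str, Profile]) -> str:
    """Cluster isolates by UPGMA and return a Newick string without branch lengths."""
    # Active clusters: label -> (Newick subtree, number of isolates).
    clusters: Dict[str, Tuple[str, int]] = {name: (name, 1) for name in profiles}
    # Pairwise distance matrix stored as {frozenset({label1, label2}): distance}.
    dist: Dict[FrozenSet[str], float] = {
        frozenset((a, b)): float(allelic_distance(profiles[a], profiles[b]))
        for a, b in combinations(profiles, 2)
    }
    merges = 0
    while len(clusters) > 1:
        # Closest pair; ties broken by label so the output is deterministic.
        pair = min(dist, key=lambda p: (dist[p], sorted(p)))
        a, b = sorted(pair)
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        del dist[pair]
        merges += 1
        new = f"_node{merges}"
        # UPGMA: distance to the merged cluster is the size-weighted mean.
        for other in clusters:
            d_a = dist.pop(frozenset((a, other)))
            d_b = dist.pop(frozenset((b, other)))
            dist[frozenset((new, other))] = (d_a * size_a + d_b * size_b) / (size_a + size_b)
        clusters[new] = (f"({tree_a},{tree_b})", size_a + size_b)
    (tree, _), = clusters.values()
    return tree + ";"


# Usage with three hypothetical seven-locus profiles.
isolates = {
    "A": (1, 1, 1, 1, 1, 1, 1),
    "B": (1, 1, 1, 1, 1, 1, 2),  # differs from A at one locus
    "C": (2, 2, 2, 2, 1, 1, 1),  # differs from A at four loci
}
print(upgma(isolates))  # (C,(A,B));
```

A and B group together as near-identical profiles, while the join placing C (about 4.5 loci away on average) is exactly the kind of deep branch the text warns against interpreting.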
Serological typing was established earlier for differentiating bacterial isolates, but immunological typing has drawbacks such as reliance on a few antigenic loci and the unpredictable reactivity of antibodies with different antigenic variants. Several molecular typing schemes have been proposed to determine the relatedness of pathogens, such as pulsed-field gel electrophoresis (PFGE), ribotyping and PCR-based fingerprinting. However, these DNA banding-pattern methods do not support meaningful evolutionary analyses. Although many researchers consider PFGE the “gold standard”, many strains cannot be typed by this technique because the DNA degrades during the process (producing gel smears).
MLST is based on the principle of multilocus enzyme electrophoresis (MLEE), which characterizes isolates by the different electrophoretic mobilities of the enzymes encoded by multiple core metabolic genes. The combination of alleles at the loci defines an electrophoretic type, and the relatedness of isolates can be visualized in a dendrogram generated from the matrix of pairwise differences between electrophoretic types. MLST differs from MLEE in that alleles are assigned by nucleotide sequencing rather than by the electrophoretic mobility of the gene products. Its resolution is therefore much higher than that of MLEE, which is limited by the resolution of electrophoretic bands. Furthermore, the main drawback of MLEE is that it determines phenotypes rather than genotypes. The enzyme phenotype can easily be altered in response to environmental conditions, which impairs the reproducibility of MLEE results. As a result, MLEE data obtained by different laboratories are not comparable, whereas MLST provides portable, comparable DNA sequence data and has great potential for automation and standardization.
MLST should not be confused with DNA barcoding. The latter is a taxonomic method that uses a short genetic marker in mitochondrial DNA to identify particular species of eukaryotes. It relies on the relatively fast mutation rate of mitochondrial DNA (mtDNA), which produces significant variation in mtDNA sequences between species. MLST, by contrast, is not limited to eukaryotes: it was first established for prokaryotes and has since been extended to eukaryotes.
MLST results are unambiguous and portable. Materials required for ST determination can be exchanged between laboratories, and primer sequences and protocols can be accessed electronically. The method is reproducible and scalable. MLST can be automated, combining advances in high-throughput sequencing and bioinformatics with established population genetics techniques. MLST data can be used to investigate evolutionary relationships among bacteria, and the method provides good discriminatory power to differentiate isolates.
MLST has a wide range of applications and provides a resource for the scientific, public health and veterinary communities as well as the food industry. Some examples worth mentioning follow:
Campylobacter is a common causative agent of bacterial infectious intestinal disease, usually acquired from undercooked poultry or unpasteurised milk. However, its epidemiology is poorly understood because outbreaks are rarely detected, so the sources and transmission routes of infection are not easily traced. In addition, Campylobacter genomes are genetically diverse and unstable, with frequent inter- and intragenomic recombination and phase variation, which complicates the interpretation of data from many typing methods. Recently, with the application of MLST, Campylobacter typing has achieved great success, and the results have been added to the MLST database. As of 1 May 2008, the Campylobacter MLST database contained 3,516 isolates, and about 30 publications had used or mentioned MLST in research on Campylobacter (http://pubmlst.org/campylobacter/).
Neisseria meningitidis
MLST has provided a more richly textured picture of bacteria within human populations and has made it possible to examine in much greater depth the strain variants that may be pathogenic to humans, plants and animals. The MLST technique was first used by Maiden et al. (1) to characterize Neisseria meningitidis using six loci. The application of MLST has clearly resolved the major meningococcal lineages known to be responsible for invasive disease around the world today. To improve discrimination between the major invasive lineages, seven loci are now used, and this scheme has been accepted by many laboratories as the method of choice for characterizing meningococcal isolates. Recombinational exchanges are well known to occur frequently in N. meningitidis, leading to rapid diversification of meningococcal clones. MLST has also provided a reliable method for characterizing clones within other bacterial species, in which the rates of clonal diversification are generally lower.
Staphylococcus aureus
S. aureus causes a number of diseases. Of particular concern is methicillin-resistant S. aureus (MRSA), whose resistance to almost all antibiotics except vancomycin has attracted growing attention. However, most serious S. aureus infections in the community, and many in hospitals, are caused by methicillin-susceptible isolates (MSSA), and there have been few attempts to identify the hypervirulent MSSA clones associated with serious disease. MLST was therefore developed to provide an unambiguous method for characterizing MRSA clones and for identifying the MSSA clones associated with serious disease.
Streptococcus pyogenes
S. pyogenes causes diseases ranging from pharyngitis and impetigo to life-threatening infections, including necrotizing fasciitis. An MLST scheme for S. pyogenes has been developed in collaboration with the laboratory of Debra Bessen at Yale. At present, the database contains the allelic profiles of isolates that represent the worldwide diversity of the organism, as well as isolates from serious invasive disease.
Candida albicans
C. albicans is the main fungal pathogen of humans and is responsible for hospital-acquired bloodstream infections. A recent study introduced MLST for characterizing C. albicans isolates, based on sequencing six to eight selected housekeeping genes and identifying polymorphic nucleotide sites. Because C. albicans is diploid, combining the alleles at the different loci yields unique diploid sequence types that can be used to discriminate strains. MLST has been successfully applied to study the epidemiology of C. albicans in hospitals as well as the diversity of C. albicans isolates obtained from diverse ecological niches, including human and animal hosts.
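In a diploid organism a heterozygous position carries two bases. One common convention, assumed here, is to record such sites with the standard IUPAC ambiguity codes (for example R for A/G), so the two haplotypes of a locus collapse into a single genotype sequence; each distinct genotype then receives a number, and the combination across loci defines the diploid sequence type (abbreviated DST here). The helper names are hypothetical, and the sketch assumes the haplotypes are already aligned and contain only A, C, G and T.

```python
"""Diploid genotype and sequence-type sketch for C. albicans MLST (hypothetical helpers)."""

from typing import Dict, FrozenSet, List, Tuple

# Standard IUPAC codes for two-base ambiguities.
IUPAC_PAIRS: Dict[FrozenSet[str], str] = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def diploid_genotype(hap1: str, hap2: str) -> str:
    """Collapse two aligned haplotypes (A/C/G/T only) into one genotype string.

    Homozygous sites keep their base; heterozygous sites get the IUPAC code,
    so the result does not depend on which haplotype is listed first.
    """
    if len(hap1) != len(hap2):
        raise ValueError("haplotypes must be aligned to the same length")
    return "".join(
        x if x == y else IUPAC_PAIRS[frozenset((x, y))]
        for x, y in zip(hap1.upper(), hap2.upper())
    )


class DiploidTyper:
    """Numbers genotypes per locus and assigns diploid sequence types (DSTs)."""

    def __init__(self, loci: List[str]) -> None:
        self.loci = loci
        self.genotypes: Dict[str, Dict[str, int]] = {locus: {} for locus in loci}
        self.dsts: Dict[Tuple[int, ...], int] = {}

    def dst(self, haplotypes: Dict[str, Tuple[str, str]]) -> int:
        """Return the DST for an isolate given both haplotypes at every locus."""
        numbers = []
        for locus in self.loci:
            genotype = diploid_genotype(*haplotypes[locus])
            table = self.genotypes[locus]
            # setdefault issues the next number only for an unseen genotype.
            numbers.append(table.setdefault(genotype, len(table) + 1))
        return self.dsts.setdefault(tuple(numbers), len(self.dsts) + 1)


print(diploid_genotype("ACGT", "GCGC"))  # RCGY
```

Collapsing the two haplotypes discards phase, which matches sequencing data where the two copies of a locus are read together.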
MLST appears best suited to population genetic studies, but it is expensive. Because housekeeping genes are conserved in sequence, MLST sometimes lacks the discriminatory power to differentiate bacterial strains, which limits its use in epidemiological investigations. To improve the discriminatory power of MLST, a multi-virulence-locus sequence typing (MVLST) approach has been developed for Listeria monocytogenes. MVLST extends the benefits of MLST but targets virulence genes, which may be more polymorphic than housekeeping genes. Population genetics is not the only factor that matters in an epidemic outbreak. Virulence factors are also important in causing disease, and population genetic studies by their nature are not the best or only way to monitor them, because the genes involved often recombine at high rates and move between strains independently of the population genetic framework. In the case of Escherichia coli, for example, identifying strains that carry toxin genes is more important than a population genetics-based evaluation of prevalent strains.
The MLST databases contain the sequence types of isolates for each bacterial or pathogen species, together with epidemiological data and interrogation and analysis software, and are widely used as a tool by researchers and public health workers. The MLST databases are hosted on two web servers, currently located at Imperial College London () and at Oxford University ().