Several sets of BLOSUM exist using different alignment databases, named with numbers. BLOSUM with high numbers are designed for comparing closely related sequences, while BLOSUM with low numbers are designed for comparing distant related sequences. For example, BLOSUM80 is used for less divergent alignments, and BLOSUM45 is used for more divergent alignments. The matrices were created by merging (clustering) all sequences that were more similar than a given percentage into one single sequence and then comparing those sequences (that were all more divergent than the given percentage value) only; thus reducing the contribution of closely related sequences. The percentage used was appended to the name, giving BLOSUM80 for example where sequences that were more than 80% identical were clustered.
Scores within a BLOSUM are log-odds scores that measure, in an alignment, the logarithm for the ratio of the likelihood of two amino acids appearing with a biological sense and the likelihood of the same amino acids appearing by chance. The matrices are based on the minimum percentage identity of the aligned protein sequence used in calculating them. Every possible identity or substitution is assigned a score based on its observed frequences in the alignment of related proteins. A positive score is given to the more likely substitutions while a negative score is given to the less likely substitutions.
To calculate a matrix for BLOSUM, the following equation is used:
Here, is the probability of two amino acids and replacing each other in a homologous sequence, and and are the background probabilities of finding the amino acids and in any protein sequence at random. The factor is a scaling factor, set such that the matrix contains easily computable integer values.