# Diversity index

In ecology, a diversity index is a statistic which is intended to measure the biodiversity of an ecosystem. More generally, diversity indices can be used to assess the diversity of any population in which each member belongs to a unique species. Estimators for diversity indices are likely to be biased, so caution is advisable when comparing similar values.

## Species richness

The species richness $S$ is simply the number of species present in an ecosystem. This index makes no use of relative abundances.

## Species evenness

The species evenness measures how evenly individuals are distributed among the species, i.e. how close the relative abundances of the species are to one another.

## Simpson's diversity index

If $p_i$ is the fraction of all organisms which belong to the i-th species, then Simpson's diversity index is most commonly defined as the statistic

$D = \sum_{i=1}^S p_i^2.$

This quantity was introduced by Edward Hugh Simpson.

If $n_i$ is the number of individuals of species $i$ which are counted, and $N$ is the total number of all individuals counted, then

$\sum_{i=1}^S \frac{n_i (n_i - 1)}{N (N - 1)}$
is an estimator for Simpson's index for sampling without replacement.
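
Both quantities are straightforward to compute from raw counts; a minimal sketch in Python (the species counts are hypothetical, chosen only for illustration):

```python
def simpson_index(counts):
    """Simpson's index D = sum of p_i^2, from raw per-species counts."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)

def simpson_estimator(counts):
    """Estimator of D for sampling without replacement:
    sum of n_i (n_i - 1) / (N (N - 1))."""
    total = sum(counts)
    return sum(c * (c - 1) for c in counts) / (total * (total - 1))

# Hypothetical sample: 50, 30 and 20 individuals of three species.
counts = [50, 30, 20]
print(simpson_index(counts))      # 0.38
print(simpson_estimator(counts))  # ~0.3737, slightly below the plug-in value
```

For large $N$ the two values nearly coincide; the estimator's correction matters mainly for small samples.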

Note that $0 \leq D \leq 1$, with values near zero corresponding to highly diverse or heterogeneous ecosystems and values near one corresponding to more homogeneous ecosystems. Biologists who find this confusing sometimes use $1/D$ instead; confusingly, this reciprocal quantity is also called Simpson's index. Another response is to redefine Simpson's index as

$\tilde{D} = 1 - D = 1 - \sum_{i=1}^S p_i^2.$
Statisticians call this quantity the index of diversity.

In sociology, psychology and management studies the index is often known as Blau's Index, as it was introduced into the literature by the sociologist Peter Blau.

## Shannon's diversity index

Shannon's diversity index is simply the ecologist's name for the communication entropy introduced by Claude Shannon:
$H = -\sum_{i=1}^S p_i \ln p_i$
where $p_i$ is the fraction of individuals belonging to the i-th species. This is by far the most widely used diversity index. The intuitive significance of this index can be described as follows. Suppose we devise binary codewords for each species in our ecosystem, with short codewords used for the most abundant species, and longer codewords for rare species. As we walk around and observe individual organisms, we call out the corresponding codeword. This gives a binary sequence. If we have used an efficient code, we will be able to save some breath by calling out a shorter sequence than would otherwise be the case. If so, the average codeword length we call out as we wander around will be close to the Shannon diversity index.
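
A minimal sketch of the computation (the counts are hypothetical; $H$ attains its maximum $\ln S$ when all species are equally abundant):

```python
import math

def shannon_index(counts):
    """Shannon's diversity index H = -sum p_i ln p_i, from raw counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

# Four equally abundant species: H = ln 4.
print(shannon_index([25, 25, 25, 25]))
# Same richness, but one species dominates: H is much lower.
print(shannon_index([97, 1, 1, 1]))
```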

It is possible to write down estimators which attempt to correct for bias in finite sample sizes, but this would be misleading since communication entropy does not really fit expectations based upon parametric statistics. Differences arising from using two different estimators are likely to be overwhelmed by errors arising from other sources. Current best practice tends to use bootstrapping procedures to estimate communication entropy.
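
One such bootstrapping procedure resamples individuals with replacement from the observed sample and recomputes $H$ on each resample. The sketch below is illustrative, not a prescribed method; the number of replicates and the reporting of a mean and standard error are arbitrary choices:

```python
import math
import random
from collections import Counter

def shannon(counts):
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def bootstrap_shannon(counts, n_boot=500, seed=0):
    """Bootstrap the Shannon index: resample N individuals with replacement
    from the observed sample, recompute H on each resample, and report the
    mean and standard error of the resampled values."""
    rng = random.Random(seed)
    # Expand counts into a list of species labels, one per individual.
    individuals = [s for s, c in enumerate(counts) for _ in range(c)]
    n = len(individuals)
    hs = []
    for _ in range(n_boot):
        resample = Counter(rng.choice(individuals) for _ in range(n))
        hs.append(shannon(list(resample.values())))
    mean = sum(hs) / n_boot
    se = (sum((h - mean) ** 2 for h in hs) / (n_boot - 1)) ** 0.5
    return mean, se
```

The spread of the resampled values gives a rough uncertainty for the plug-in estimate of $H$ without assuming a parametric sampling model.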

Shannon himself showed that his communication entropy enjoys some powerful formal properties, and furthermore, it is the unique quantity which does so. These observations are the foundation of its interpretation as a measure of statistical diversity (or "surprise", in the arena of communications). The applications of this quantity go far beyond the one discussed here; see the textbook cited below for an elementary survey of the extraordinary richness of modern information theory.

## Berger-Parker index

The Berger-Parker diversity index is simply
$\max_{1 \leq i \leq S} \, p_i$
This is an example of an index which uses only partial information about the relative abundances of the various species in its definition.
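
The computation is a one-liner; as before, the example counts are hypothetical:

```python
def berger_parker(counts):
    """Berger-Parker index: the proportional abundance of the
    most abundant species."""
    return max(counts) / sum(counts)

print(berger_parker([50, 30, 20]))  # 0.5
```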

## Rényi entropy

The species richness, the Shannon index, Simpson's index, and the Berger-Parker index can all be identified as particular examples of quantities bearing a simple relation to the Rényi entropy,
$H_\alpha = \frac{1}{1-\alpha} \, \log \sum_{i=1}^S p_i^\alpha$
for $\alpha$ approaching $0, \, 1, \, 2, \, \infty$ respectively.
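
A sketch of this unification, handling the two limiting cases ($\alpha \to 1$ recovers Shannon's index, $\alpha \to \infty$ gives $-\log$ of the Berger-Parker index) explicitly:

```python
import math

def renyi_entropy(props, alpha):
    """Renyi entropy H_alpha = log(sum p_i^alpha) / (1 - alpha),
    for a list of relative abundances p_i summing to 1."""
    p = [x for x in props if x > 0]
    if alpha == 1:                 # limit alpha -> 1: Shannon entropy
        return -sum(x * math.log(x) for x in p)
    if math.isinf(alpha):          # limit alpha -> inf: -log(max p_i)
        return -math.log(max(p))
    return math.log(sum(x ** alpha for x in p)) / (1 - alpha)

p = [0.5, 0.3, 0.2]
print(renyi_entropy(p, 0))             # log S, the log of species richness
print(renyi_entropy(p, 2))             # -log D, Simpson's index transformed
print(renyi_entropy(p, float("inf")))  # -log of the Berger-Parker index
```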

Unfortunately, the powerful formal properties of communication entropy do not generalize to Rényi's entropy, which largely explains the greater power and popularity of Shannon's index compared with its competitors.