Definitions

Empirical probability

Empirical probability, also known as relative frequency or experimental probability, is the ratio of the number of favourable outcomes to the total number of trials, counted not in a theoretical sample space but in an actual sequence of experiments. In a more general sense, empirical probability estimates probabilities from experience and observation. The phrase a posteriori probability has also been used as an alternative to empirical probability or relative frequency. This unusual usage is not directly related to Bayesian inference, and it should not be confused with the equally occasional use of the phrase to refer to posterior probability, which is a different concept.

In statistical terms, the empirical probability is an estimate of a probability. If modelling using a binomial distribution is appropriate, it is the maximum likelihood estimate. It is also the Bayesian estimate for the same case if certain assumptions are made for the prior distribution of the probability.
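
As a worked sketch of these two statements, suppose k favourable outcomes are observed in n independent trials, each with unknown success probability p. The symbols k, n, p, α and β are notation introduced here for illustration. Maximising the binomial likelihood recovers the relative frequency. One example of a prior assumption under which a Bayesian estimate coincides with it is a uniform prior, Beta(1, 1), with the posterior mode taken as the estimate:

```latex
\begin{align*}
L(p) &= \binom{n}{k} p^{k} (1-p)^{n-k}, \\
\frac{d}{dp} \log L(p) &= \frac{k}{p} - \frac{n-k}{1-p} = 0
  \quad\Longrightarrow\quad \hat{p}_{\mathrm{ML}} = \frac{k}{n}, \\
p \sim \mathrm{Beta}(\alpha, \beta)
  &\quad\Longrightarrow\quad
  p \mid k \sim \mathrm{Beta}(\alpha + k,\; \beta + n - k), \\
\hat{p}_{\mathrm{mode}} &= \frac{\alpha + k - 1}{\alpha + \beta + n - 2}
  = \frac{k}{n} \quad \text{when } \alpha = \beta = 1 .
\end{align*}
```

Other choices generally give a different value. For instance, the posterior mean under the same uniform prior is (k + 1)/(n + 2), which equals k/n only when k = n/2.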

An advantage of estimating probabilities using empirical probabilities is that this procedure is relatively free of assumptions. For example, consider estimating the probability among a population of men that they satisfy two conditions: (i) that they are over 6 feet in height; (ii) that they prefer strawberry jam to raspberry jam. A direct estimate could be found by counting the number of men who satisfy both conditions and dividing by the total number of men, giving the empirical probability of the combined condition. An alternative estimate could be found by multiplying the proportion of men who are over 6 feet in height by the proportion of men who prefer strawberry jam to raspberry jam, but this estimate relies on the assumption that the two conditions are statistically independent.
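
The two estimates described above can be compared in a minimal Python sketch. The survey records are invented purely for illustration and are deliberately constructed so that the two conditions are correlated. The names `survey` and `empirical_probability` are hypothetical.

```python
# Hypothetical survey records: (is_over_6_feet, prefers_strawberry_jam).
# The data are invented for illustration and deliberately correlated.
survey = (
    [(True, True)] * 4
    + [(True, False)] * 1
    + [(False, True)] * 1
    + [(False, False)] * 4
)


def empirical_probability(records, condition):
    """Return the fraction of records for which condition(record) is true."""
    return sum(1 for r in records if condition(r)) / len(records)


# Direct estimate: count the men who satisfy both conditions.
direct = empirical_probability(survey, lambda r: r[0] and r[1])

# Alternative estimate: multiply the two marginal proportions.
# This is only justified if the conditions are statistically independent.
p_tall = empirical_probability(survey, lambda r: r[0])
p_jam = empirical_probability(survey, lambda r: r[1])
product = p_tall * p_jam

print(f"direct estimate:  {direct:.2f}")
print(f"product estimate: {product:.2f}")
```

In this made-up sample the direct estimate is 0.40 and the product estimate is 0.25. The gap arises because the independence assumption fails for these records, not because of any error in the counting.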

A disadvantage of using empirical probabilities arises in estimating probabilities which are either very close to zero or very close to one. In these cases very large sample sizes would be needed in order to estimate such probabilities to a good standard of relative accuracy. Here statistical models can help, depending on the context, and in general one can hope that such models would provide improvements in accuracy compared to empirical probabilities, provided that the assumptions involved actually do hold. For example, consider estimating the probability that the lowest of the daily-maximum temperatures at a site in February in any one year is less than zero degrees Celsius. A record of such temperatures in past years could be used to estimate this probability. A model-based alternative would be to select a family of probability distributions and fit it to the dataset containing past yearly values: the fitted distribution would provide an alternative estimate of the required probability. This alternative method can provide a non-zero estimate of the probability even if all values in the record are greater than zero, a case in which the empirical probability is exactly zero.
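
The temperature example can be sketched in Python as follows. The record of yearly values is invented for illustration. The normal family is chosen here only for simplicity, and it is an assumption of this sketch rather than a recommendation: an extreme-value family may well be more natural for yearly minima. The fit uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

# Hypothetical record: for each past year, the lowest daily-maximum
# temperature observed in February, in degrees Celsius (invented values).
yearly_lowest_max = [1.2, 3.5, 0.8, 2.1, 4.0, 1.5, 2.9, 0.4, 3.3, 2.2]

# Empirical probability: fraction of years with a value below zero.
below_zero = sum(1 for t in yearly_lowest_max if t < 0.0)
empirical = below_zero / len(yearly_lowest_max)

# Model-based estimate: fit a normal distribution to the record (an
# assumed family) and evaluate its cumulative distribution at zero.
fitted = NormalDist.from_samples(yearly_lowest_max)
model_based = fitted.cdf(0.0)

print(f"empirical estimate:   {empirical:.4f}")
print(f"model-based estimate: {model_based:.4f}")
```

Because every value in this record is above zero, the empirical estimate is exactly zero, while the fitted distribution assigns a small positive probability to sub-zero values. How trustworthy that positive value is depends entirely on whether the assumed family describes the data well.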