Correlation ratio

In statistics, the correlation ratio is a measure of the relationship between the statistical dispersion within individual categories and the dispersion across the whole population or sample.

Suppose each observation is y_{xi}, where x indicates the category that observation is in and i is the label of the particular observation. Let n_x be the number of observations in category x and

\overline{y}_x = \frac{\sum_i y_{xi}}{n_x} \quad\text{and}\quad \overline{y} = \frac{\sum_x n_x \overline{y}_x}{\sum_x n_x},

where ȳ_x is the mean of category x and ȳ is the mean of the whole population. The correlation ratio η (eta) is defined so as to satisfy

\eta^2 = \frac{\sum_x n_x (\overline{y}_x-\overline{y})^2}{\sum_{x,i} (y_{xi}-\overline{y})^2}.
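The definition can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `correlation_ratio_squared` and its input format (two parallel sequences of category labels and values) are assumptions made for the example. It raises an error in the degenerate case where all values are equal, since η is then undefined.

```python
from collections import defaultdict
from math import fsum


def correlation_ratio_squared(categories, values):
    """Return eta^2 for paired observations (category x, value y_xi).

    eta^2 = sum_x n_x (mean_x - mean)^2 / sum_{x,i} (y_xi - mean)^2
    """
    if len(categories) != len(values) or not values:
        raise ValueError("need two non-empty sequences of equal length")
    # Group the observations by category.
    groups = defaultdict(list)
    for x, y in zip(categories, values):
        groups[x].append(y)
    grand_mean = fsum(values) / len(values)
    # Numerator: spread of the category means, weighted by n_x.
    between = fsum(len(ys) * (fsum(ys) / len(ys) - grand_mean) ** 2
                   for ys in groups.values())
    # Denominator: spread of all observations around the overall mean.
    total = fsum((y - grand_mean) ** 2 for y in values)
    if total == 0:
        raise ValueError("eta is undefined when all values are equal")
    return between / total


print(correlation_ratio_squared(["a", "a", "b", "b"], [1, 3, 5, 7]))  # 0.8
```

In the usage line the category means 2 and 6 each lie 2 units from the overall mean 4, so the numerator is 2·4 + 2·4 = 16, against a total sum of squares of 9 + 1 + 1 + 9 = 20.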

It is worth noting that if the relationship between values of x and values of ȳ_x is linear (which is certainly true when there are only two possibilities for x), this will give the same result as the square of the (Pearson) correlation coefficient; otherwise the correlation ratio will be larger in magnitude. It can therefore be used for judging non-linear relationships.
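To illustrate the comparison with the correlation coefficient, the following sketch computes both η² and the squared Pearson coefficient r² for two small data sets. The numbers are hypothetical, chosen only for illustration, and because r needs numeric x the categories are coded as numbers, which is an assumption of the example. With two categories the two quantities agree; with three categories whose means rise and then fall, r² is zero while η² is not.

```python
from collections import defaultdict
from math import fsum


def eta_squared(xs, ys):
    # Between-category sum of squares divided by the total sum of squares.
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    mean = fsum(ys) / len(ys)
    between = fsum(len(g) * (fsum(g) / len(g) - mean) ** 2 for g in groups.values())
    total = fsum((y - mean) ** 2 for y in ys)
    return between / total


def r_squared(xs, ys):
    # Square of the Pearson correlation coefficient between numeric x and y.
    mx, my = fsum(xs) / len(xs), fsum(ys) / len(ys)
    sxy = fsum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = fsum((x - mx) ** 2 for x in xs)
    syy = fsum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Two categories coded 0 and 1: the two category means always lie on a line.
two_x, two_y = [0, 0, 0, 1, 1], [1, 2, 6, 4, 8]
print(eta_squared(two_x, two_y), r_squared(two_x, two_y))  # equal up to rounding

# Three categories whose means go 1, 5, 1: a non-linear pattern.
vx, vy = [1, 1, 2, 2, 3, 3], [0, 2, 4, 6, 0, 2]
print(eta_squared(vx, vy), r_squared(vx, vy))  # eta^2 is about 0.78, r^2 is 0
```

The second data set is symmetric around x = 2, so the linear fit is flat and r² vanishes, even though most of the dispersion is explained by the category.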


The correlation ratio η takes values between 0 and 1. The limit η = 0 represents the special case of no dispersion among the means of the different categories, while η = 1 refers to no dispersion within the respective categories. Note, further, that η is undefined when all data points of the complete population take the same value.
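The range follows from splitting each deviation as y_{xi} − ȳ = (y_{xi} − ȳ_x) + (ȳ_x − ȳ). When the squares are summed, the cross term drops out because the deviations within each category sum to zero, which gives the standard decomposition of the total sum of squares:

```latex
\sum_{x,i} (y_{xi}-\overline{y})^2
  = \sum_{x,i} (y_{xi}-\overline{y}_x)^2 + \sum_x n_x (\overline{y}_x-\overline{y})^2,
\qquad \text{since } \sum_i (y_{xi}-\overline{y}_x) = 0 \text{ for every } x,

\eta^2 = 1 - \frac{\sum_{x,i} (y_{xi}-\overline{y}_x)^2}{\sum_{x,i} (y_{xi}-\overline{y})^2}.
```

Both sums are non-negative and the within-category sum cannot exceed the total, so 0 ≤ η² ≤ 1. The value 1 is reached exactly when every y_{xi} equals its category mean, and the ratio is undefined when the denominator is zero, i.e. when all values coincide.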


Suppose there is a distribution of test scores in three topics (categories):

  • Algebra: 45, 70, 29, 15 and 21 (5 scores)
  • Geometry: 40, 20, 30 and 42 (4 scores)
  • Statistics: 65, 95, 80, 70, 85 and 73 (6 scores).

Then the subject averages are 36, 33 and 78, with an overall average of 52.
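A short Python sketch that reproduces these averages from the scores listed above. The dictionary name `scores` is illustrative, and `fractions.Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

# Test scores from the example, grouped by topic.
scores = {
    "Algebra": [45, 70, 29, 15, 21],
    "Geometry": [40, 20, 30, 42],
    "Statistics": [65, 95, 80, 70, 85, 73],
}

# Mean of each category.
subject_means = {topic: Fraction(sum(ys), len(ys)) for topic, ys in scores.items()}

# Overall mean: the n_x-weighted average of the category means,
# which equals the plain mean of all fifteen scores.
n_total = sum(len(ys) for ys in scores.values())
overall_mean = sum(len(scores[t]) * m for t, m in subject_means.items()) / n_total

print({t: int(m) for t, m in subject_means.items()}, overall_mean)
```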

The sums of squares of the differences from the subject averages are 1952 for Algebra, 308 for Geometry and 600 for Statistics, adding to 2860, while the overall sum of squares of the differences from the overall average is 9640. The difference between these, 6780, is also the weighted sum of the squares of the differences between the subject averages and the overall average:

5(36-52)^2 + 4(33-52)^2 + 6(78-52)^2 = 6780.

This gives

\eta^2 = \frac{6780}{9640} = 0.7033\ldots,

suggesting that most of the overall dispersion is a result of differences between topics, rather than within topics. Taking the square root,

\eta = \sqrt{\frac{6780}{9640}} = 0.8386\ldots
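The arithmetic in this example can be checked with a few lines of Python. The sketch below, whose variable names are illustrative, recomputes the within-topic, between-topic and total sums of squares from the raw scores, confirms that the first two add up to the third, and then forms η² and η.

```python
from fractions import Fraction
from math import sqrt

# Test scores from the example, grouped by topic.
scores = {
    "Algebra": [45, 70, 29, 15, 21],
    "Geometry": [40, 20, 30, 42],
    "Statistics": [65, 95, 80, 70, 85, 73],
}

all_scores = [y for ys in scores.values() for y in ys]
grand_mean = Fraction(sum(all_scores), len(all_scores))
means = {t: Fraction(sum(ys), len(ys)) for t, ys in scores.items()}

# Sum of squared deviations from each topic's own average.
within = {t: sum((y - means[t]) ** 2 for y in ys) for t, ys in scores.items()}
# Sum of squared deviations of every score from the overall average.
total = sum((y - grand_mean) ** 2 for y in all_scores)
# Squared deviations of the topic averages from the overall average, weighted by n_x.
between = sum(len(ys) * (means[t] - grand_mean) ** 2 for t, ys in scores.items())

# Within- and between-topic parts add up to the total sum of squares.
assert sum(within.values()) + between == total

eta_squared = between / total
eta = sqrt(eta_squared)  # float square root of the exact ratio
print(f"eta^2 = {float(eta_squared):.4f}, eta = {eta:.4f}")  # eta^2 = 0.7033, eta = 0.8386
```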
Observe that for η = 1 the overall sample dispersion is purely due to dispersion among the categories and not at all due to dispersion within the individual categories. For a quick illustration, imagine that all Algebra, Geometry and Statistics scores were equal to their respective subject averages, i.e. five scores of 36, four of 33 and six of 78.

The limit η = 0 refers to the case in which dispersion among the categories contributes nothing to the overall dispersion. This extreme occurs when all category means are the same.
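Both extremes can be checked with a minimal Python sketch. The helper `eta_squared` (taking one list of observations per category) is illustrative, and the second data set is invented for the purpose: its three categories vary internally but share the mean 40.

```python
def eta_squared(groups):
    """eta^2 for a list of categories, each given as a list of observations."""
    all_y = [y for g in groups for y in g]
    mean = sum(all_y) / len(all_y)
    # Between-category sum of squares: weighted spread of the category means.
    between = sum(len(g) * (sum(g) / len(g) - mean) ** 2 for g in groups)
    # Total sum of squares about the overall mean.
    total = sum((y - mean) ** 2 for y in all_y)
    return between / total


# eta = 1: each score replaced by its subject average (5 x 36, 4 x 33, 6 x 78),
# so there is no dispersion within any topic.
no_within = [[36] * 5, [33] * 4, [78] * 6]

# eta = 0: hypothetical scores whose category means are all 40,
# so the categories contribute nothing to the overall dispersion.
equal_means = [[30, 40, 50], [35, 45], [20, 60, 40]]

print(eta_squared(no_within), eta_squared(equal_means))  # 1.0 0.0
```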
