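If there are no tied ranks, i.e. no two observations share the same value of either variable, then ρ is given by:

$$\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$

where $d_i$ is the difference between the two ranks of observation $i$ and $n$ is the number of observations.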
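If tied ranks exist, this formula does not apply and the classic Pearson correlation coefficient between the ranks has to be used instead:

$$\rho = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}$$

where $x_i$ and $y_i$ are the ranks of observation $i$ on the two variables and $\bar{x}$, $\bar{y}$ are the mean ranks.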
Each of the equal values has to be assigned the same rank, namely the average of their positions in the ascending order of the values:
An example of averaging ranks
In the table below, notice how the rank of values that are the same is the mean of what their ranks would otherwise be.
|Variable||Position in the descending order||Rank|
In this case we cannot use the shortcut formula (because of the tied ranks in the data) and must use the second, product-moment form.
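The tie-handling rule and the product-moment form can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (`average_ranks`, `pearson`, `spearman_with_ties`) and the sample values are assumptions made for this sketch, and ranks are taken in ascending order.

```python
from math import sqrt

def average_ranks(values):
    """Rank values in ascending order; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_position = (start + 1 + end + 1) / 2  # positions are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_position
        start = end + 1
    return ranks

def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman_with_ties(xs, ys):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Illustrative values with one tie (the two 1.2 entries).
sample = [0.8, 1.2, 1.2, 2.3, 18]
print(average_ranks(sample))  # [1.0, 2.5, 2.5, 4.0, 5.0]
```

The two tied values would otherwise occupy positions 2 and 3, so each receives the rank 2.5.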
|IQ, $X_i$||Hours of TV per week, $Y_i$|
The first step is to sort this data by the second column ($Y_i$). Next, two more columns are created ($x_i$ and $y_i$). The last of these columns ($y_i$) is assigned 1, 2, 3, ..., n, and then the data is sorted by the first original column ($X_i$). The first of the newly created columns ($x_i$) is then assigned 1, 2, 3, ..., n. Next, a column $d_i$ is created to hold the differences between the two rank columns ($x_i$ and $y_i$). Finally, a column $d_i^2$ is created, which is just column $d_i$ squared.
After doing this process with the example data you should end up with something like:
|IQ, $X_i$||Hours of TV per week, $Y_i$||rank $x_i$||rank $y_i$||$d_i$||$d_i^2$|
The values in the $d_i^2$ column can now be added to find $\sum d_i^2$. The value of n is 10. These values can now be substituted back into the equation

$$\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \sum d_i^2}{10(10^2 - 1)},$$

which evaluates to a value close to zero, showing that the correlation between IQ and hours spent watching TV is very low (barely any correlation). In the case of ties in the original values, this formula should not be used. Instead, the Pearson correlation coefficient should be calculated on the ranks (where tied values are given averaged ranks, as described above).
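The column-by-column procedure above can be sketched in Python as follows. The ten data pairs are illustrative stand-ins chosen for this sketch (no ties, n = 10), and the function name `spearman_no_ties` is hypothetical. Instead of sorting the table twice, the sketch looks up each value's position in the sorted column, which yields the same ranks.

```python
def spearman_no_ties(xs, ys):
    """Return (sum of d_i^2, rho) using the shortcut formula; requires no ties."""
    n = len(xs)
    if len(set(xs)) != n or len(set(ys)) != n:
        raise ValueError("tied values present: use Pearson on averaged ranks instead")
    # Rank = 1-based position of each value in the ascending order of its column.
    x_rank = {v: r for r, v in enumerate(sorted(xs), start=1)}
    y_rank = {v: r for r, v in enumerate(sorted(ys), start=1)}
    # d_i^2 column: squared difference between the two ranks of each row.
    d_squared = [(x_rank[x] - y_rank[y]) ** 2 for x, y in zip(xs, ys)]
    sum_d2 = sum(d_squared)
    rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
    return sum_d2, rho

# Illustrative data (assumed for this sketch): IQ and hours of TV per week.
iq = [106, 86, 100, 101, 99, 103, 97, 113, 112, 110]
tv = [7, 0, 27, 50, 28, 29, 20, 12, 6, 17]

sum_d2, rho = spearman_no_ties(iq, tv)
print(sum_d2, round(rho, 4))  # 194 -0.1758
```

For these assumed values the coefficient is slightly negative and close to zero, which is the kind of result the text describes as barely any correlation.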
Although the permutation test is often trivial to perform for anyone with computing resources and programming experience, traditional methods for determining significance are still widely used. The most basic approach is to compare the observed ρ with published tables for various levels of significance. This is a simple solution if the significance only needs to be known within a certain range or less than a certain value, as long as tables are available that specify the desired ranges. A reference to such a table is given below. However, generating these tables is computationally intensive and complicated mathematical tricks have been used over the years to generate tables for larger and larger sample sizes, so it is not practical for most people to extend existing tables.
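For comparison with the table-based approach, a permutation test can be sketched in a few lines. This is a minimal Monte Carlo version under stated assumptions: the ranks contain no ties (so the shortcut formula applies), the test is two-sided, and the function names, the number of trials and the seed are all choices made for this illustration.

```python
import random

def rho_from_ranks(x_ranks, y_ranks):
    """Shortcut formula for rho; valid only when neither rank list has ties."""
    n = len(x_ranks)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

def permutation_p_value(x_ranks, y_ranks, trials=10_000, seed=0):
    """Estimate a two-sided p-value by randomly re-pairing the ranks."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    observed = abs(rho_from_ranks(x_ranks, y_ranks))
    shuffled = list(y_ranks)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        # Count permutations at least as extreme as the observed coefficient.
        if abs(rho_from_ranks(x_ranks, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (trials + 1)

# Illustrative ranks for ten observations (assumed for this sketch).
x = list(range(1, 11))
y = [1, 6, 8, 7, 10, 9, 3, 5, 2, 4]
print(permutation_p_value(x, y))
```

A large p-value here means that a coefficient as far from zero as the observed one arises often when the pairing of ranks is random.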
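An alternative approach available for sufficiently large sample sizes is an approximation based on the Student's t-distribution with N-2 degrees of freedom. For sample sizes above about 20, the variable

$$t = \rho \sqrt{\frac{N - 2}{1 - \rho^2}}$$

approximately follows a Student's t-distribution under the null hypothesis of zero correlation.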
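The t approximation is usually written as $t = \rho \sqrt{(N-2)/(1-\rho^2)}$, compared against a Student's t-distribution with N-2 degrees of freedom. A minimal sketch of that computation follows; the function name `spearman_t_statistic` is hypothetical, and the sketch stops at the statistic itself, leaving the p-value to a t-table or a statistics library.

```python
from math import sqrt

def spearman_t_statistic(rho, n):
    """Return (t, degrees of freedom) for an observed rho and sample size n."""
    if n <= 2:
        raise ValueError("need more than two observations")
    if abs(rho) >= 1:
        raise ValueError("t is unbounded when |rho| = 1")
    t = rho * sqrt((n - 2) / (1 - rho ** 2))
    return t, n - 2

# Example with assumed inputs: rho = 0.5 observed on 27 pairs.
t, df = spearman_t_statistic(0.5, 27)
print(round(t, 3), df)  # 2.887 25
```

The approximation is only suggested for sample sizes above about 20, so a caller working with smaller samples would fall back on tables or a permutation test.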
A generalization of the Spearman coefficient is useful in the situation where there are three or more conditions, a number of subjects are all observed in each of them, and we predict that the observations will have a particular order. For example, a number of subjects might each be given three trials at the same task, and we predict that performance will improve from trial to trial. A test of the significance of the trend between conditions in this situation was developed by E. B. Page and is usually referred to as Page's trend test for ordered alternatives.