Although the street definition of correlation applies to any two items that are related (such as gender and political affiliation), statisticians use this term only in the context of two numerical variables. The formal term for correlation is the correlation coefficient. Many different correlation measures have been created; the one used in this case is called the Pearson correlation coefficient.
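The formula for the correlation (r) is

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$

where n is the number of pairs of data; $\bar{x}$ and $\bar{y}$ are the sample means of all the x-values and all the y-values, respectively; and sx and sy are the sample standard deviations of all the x- and y-values, respectively.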
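The formula translates almost term by term into code: standardize each x-value and each y-value, multiply the two standardized values for every pair, add the products, and divide by n – 1. The following is a minimal Python sketch, not from the original text; the function name `pearson_r` is hypothetical, and the sketch assumes neither variable is constant (if sx or sy is 0, r is undefined).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: sum of products of standardized scores, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations use the n - 1 divisor, matching the formula.
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when all x-values or all y-values are equal")
    # Each term is ((x - x_bar) / s_x) * ((y - y_bar) / s_y).
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

print(round(pearson_r([3, 3, 6], [2, 3, 4]), 2))  # 0.87
```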
You can use the following steps to calculate the correlation, r, from a data set:
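1. Find the mean of all the x-values ($\bar{x}$) and the mean of all the y-values ($\bar{y}$).
2. Find the standard deviation of all the x-values (call it sx) and the standard deviation of all the y-values (call it sy).
   For example, to find sx, you would use the following equation:

   $$s_x = \sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}}$$

3. For each of the n pairs (x, y) in the data set, subtract $\bar{x}$ from the x-value, subtract $\bar{y}$ from the y-value, and multiply the two differences: $(x-\bar{x})(y-\bar{y})$.
4. Add up the n results from Step 3.
5. Divide the sum by sx ∗ sy.
6. Divide the result by n – 1, where n is the number of (x, y) pairs. (It's the same as multiplying by 1 over n – 1.) This gives you the correlation, r.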
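To make each step concrete, here is a minimal Python sketch that follows Steps 1 through 6 literally and keeps every intermediate result so you can check your hand calculations against it. The function name `correlation_by_steps` and the dictionary keys are hypothetical choices for illustration, not part of the original text.

```python
import math

def correlation_by_steps(pairs):
    """Compute Pearson's r by following Steps 1-6; return every intermediate value."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two (x, y) pairs")
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]

    # Step 1: the mean of the x-values and the mean of the y-values.
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Step 2: sample standard deviations (squared deviations summed, divided by n - 1, square-rooted).
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when all x-values or all y-values are equal")

    # Step 3: multiply the two differences for each pair.
    products = [(x - x_bar) * (y - y_bar) for x, y in pairs]

    # Step 4: add up the n products.
    total = sum(products)

    # Step 5: divide the sum by s_x * s_y.
    scaled = total / (s_x * s_y)

    # Step 6: divide by n - 1 to get the correlation r.
    r = scaled / (n - 1)

    return {"x_bar": x_bar, "y_bar": y_bar, "s_x": s_x, "s_y": s_y,
            "products": products, "total": total, "scaled": scaled, "r": r}


steps = correlation_by_steps([(3, 2), (3, 3), (6, 4)])
print(round(steps["total"], 2), round(steps["r"], 2))  # 3.0 0.87
```

Returning the intermediate values, rather than only r, mirrors the step-by-step procedure and makes it easy to spot which step a hand calculation went wrong in.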
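For example, suppose the data set consists of the n = 3 pairs (3, 2), (3, 3), and (6, 4).

1. Calculating the mean of the x-values and the mean of the y-values, you get $\bar{x} = (3 + 3 + 6)/3 = 4$ and $\bar{y} = (2 + 3 + 4)/3 = 3$.
2. The standard deviations are sx = 1.73 and sy = 1.00. (For example, sx = √[((–1)² + (–1)² + 2²)/2] = √3 ≈ 1.73.)
3. For each of the n = 3 pairs, multiplying the two differences together gives: (3 – 4)(2 – 3) = (–1)(–1) = +1; (3 – 4)(3 – 3) = (–1)(0) = 0; (6 – 4)(4 – 3) = (2)(1) = +2.
4. Adding the n = 3 Step 3 results, you get 1 + 0 + 2 = 3.
5. Dividing by sx ∗ sy gives you 3 / (1.73 ∗ 1.00) = 3 / 1.73 = 1.73. (It's just a coincidence that the result from Step 5 is also 1.73, the same value as sx.)
6. Now divide the Step 5 result by n – 1 = 3 – 1 = 2, and you get the correlation r = 1.73 / 2 ≈ 0.87.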