Of all the misunderstood statistical issues, perhaps the most problematic is the confusion of correlation with causation. Correlation, as a statistical term, is the extent to which two numerical variables have a linear relationship (that is, as one variable increases, the other increases or decreases at a roughly constant rate). Following are three examples of correlated variables:
The number of times a cricket chirps per second is strongly related to temperature; when it’s cold outside, crickets chirp less frequently, and as the temperature rises, they chirp at a steadily increasing rate. In statistical terms, you say the number of cricket chirps and temperature have a strong positive correlation.
The number of crimes (per capita) has often been found to be related to the number of police officers in a given area. When more police officers patrol the area, crime tends to be lower, and when fewer police officers are present in the same area, crime tends to be higher. In statistical terms, you say the number of police officers and the number of crimes have a strong negative correlation.
The consumption of ice cream (pints per person) and the number of murders in New York are positively correlated. That is, as the amount of ice cream sold per person increases, the number of murders increases. Strange but true!
But correlation as a statistic isn’t able to explain why or how the relationship between two variables, x and y, exists; only that it does exist.
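To make the idea of positive and negative correlation concrete, here is a minimal sketch that computes the Pearson correlation coefficient by hand. The function name `pearson_r` and all of the numbers are assumptions made up for illustration; they are not measurements from any study mentioned here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Invented illustrative values (not real data).
temperature_f = [55, 60, 65, 70, 75, 80, 85]
chirps_per_sec = [1.0, 1.3, 1.7, 2.0, 2.4, 2.6, 3.0]

officers = [10, 20, 30, 40, 50, 60, 70]
crimes_per_1000 = [48, 45, 39, 36, 30, 27, 22]

r_chirps = pearson_r(temperature_f, chirps_per_sec)
r_crime = pearson_r(officers, crimes_per_1000)

print(f"temperature vs. chirps: r = {r_chirps:.3f}")
print(f"officers vs. crimes:    r = {r_crime:.3f}")
```

A value of r near +1 indicates a strong positive linear relationship, a value near -1 a strong negative one, and a value near 0 little or no linear relationship. Notice that nothing in the calculation says why the two columns move together.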
Causation goes a step further than correlation, stating that a change in the value of the x variable will cause a change in the value of the y variable. Too many times in research, in the media, or in the public consumption of statistical results, that leap is made when it shouldn’t be. For instance, you can’t claim that consumption of ice cream causes an increase in murder rates just because they are correlated. In fact, the study showed that temperature was positively correlated with both ice cream sales and murders, which suggests that hot weather, rather than ice cream, may account for the link. When can you make the causation leap? The most compelling case is when a well-designed experiment is conducted that rules out other factors that could be related to the outcomes.
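The ice cream example can be sketched with simulated data. In the sketch below, temperature drives both ice cream consumption and murders, and neither of those two affects the other. Everything here is an assumption made for illustration: the coefficients, the noise levels, and the helper names `pearson_r` and `residuals` are invented, and the data are randomly generated rather than taken from the study.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def residuals(ys, xs):
    """What is left of ys after removing a least-squares line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 2000

# Simulated world: temperature influences both outcomes;
# ice cream has no direct effect on murders.
temperature = [rng.uniform(30, 95) for _ in range(n)]
ice_cream = [0.05 * t + rng.gauss(0, 0.5) for t in temperature]
murders = [0.2 * t + rng.gauss(0, 3.0) for t in temperature]

# Raw correlation between the two outcomes.
r_raw = pearson_r(ice_cream, murders)

# Correlation after removing the linear effect of temperature from both
# (the partial correlation, given temperature).
r_partial = pearson_r(residuals(ice_cream, temperature),
                      residuals(murders, temperature))

print(f"ice cream vs. murders:            r = {r_raw:.3f}")
print(f"same, controlling for temperature: r = {r_partial:.3f}")
```

In this simulated setup, the raw correlation comes out clearly positive, while the correlation that remains after accounting for temperature is close to zero. Adjusting for a known third variable like this is weaker than a well-designed experiment, because it can only handle the factors you thought to measure.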
You may find yourself wanting to jump to a cause-and-effect relationship when a correlation is found; researchers, the media, and the general public do it all the time. However, before drawing any conclusions, look at how the data were collected and/or wait to see whether other researchers are able to replicate the results (the first thing they try to do after someone else’s “groundbreaking result” hits the airwaves).