Changes: Correlation does not imply causation


(Examples)
 
Line 1: Line 1:
 
{{StatsPsy}}
 
'''Correlation does not imply causation''' is a phrase used in the [[sciences]] and [[statistics]] to indicate that [[correlation]] between two variables does not imply there is a [[Causality|cause-and-effect]] relationship between the two. Its converse '''correlation implies causation''' is a [[logical fallacy]] by which two events that occur together are prematurely claimed to have a cause-and-effect relationship. It is also known as '''cum hoc ergo propter hoc''' (Latin for "with this, therefore because of this") and '''false cause'''.
+
"'''Correlation does not imply causation'''" (related to "ignoring a common cause" and [[questionable cause]]) is a phrase used in [[science]] and [[statistics]] to emphasize that a [[correlation]] between two variables does not automatically imply that one [[causality|causes]] the other (though correlation is ''necessary'' for linear causation in the absence of any third and countervailing causative variable, it can indicate possible causes or areas for further investigation; in other words, correlation is a hint).<ref name="Tufte 2006 5">{{Cite document
 
== Usage ==
 
 
In the most literal sense, to say a "''Correlation does not '''imply''' causation''" may sometimes be incorrect. In [[logic]], "imply" means
 
 
:* ''To involve as a '''necessary''' circumstance''. - which may make the above phrase correct in some cases.
 
This is the meaning intended by statisticians when they use the phrase. Indeed, '''p implies q''' has the technical meaning of [[logical implication]]: '''if p then q''' symbolized as '''p&nbsp;⇒&nbsp;q'''.
 
 
However, in everyday English, "imply" often means
 
 
:* ''To indicate or suggest''.
 
 
To say a "''Correlation does not '''suggest''' causation''" is not necessarily true: A demonstrably consistent correlation often ''suggests'' or ''increases the probability'' of some causal relationship (or ''implies'' it, in the latter sense of the term).
 
 
What the correlation does not do is ''prove'' causation, as arguments that use the '''cum hoc ergo propter hoc''' logical fallacy as a pattern of reasoning assert. <ref>Karl L. Wuensch, Department of Psychology, East Carolina University [http://72.14.235.104/search?q=cache:PAUfJBK8lg8J:core.ecu.edu/psyc/wuenschk/StatHelp/Correlation-Causation.htm+%22correlation+does+imply+causation%22&hl=en&gl=au&ct=clnk&cd=1&client=firefox-a When does correlation imply causation?]</ref>
 
 
[[Edward Tufte]], in a criticism of the brevity of [[Microsoft PowerPoint]] presentations, deprecates the use of ''is'' to relate correlation and causation (as in "''Correlation is not causation''"), citing its inaccuracy as incomplete.<ref>{{cite book
 
 
| last = Tufte
 
 
| first = Edward R.
 
Line 9: Line 9:
 
| date = 2006
 
 
| pages = 5
 
| id = ISBN 0-9613921-5-0
+
| isbn = 0-9613921-5-0
| url = http://www.edwardtufte.com/tufte/powerpoint }}</ref> While it is not the case that correlation is causation, simply stating their nonequivalence omits information about their relationship. Tufte suggests that the shortest true statement that can be made about causality and correlation must be at least expanded to either
+
| url = http://www.edwardtufte.com/tufte/powerpoint
:''Empirically observed covariation is a necessary but not sufficient condition for causality''.
+
| postscript = <!--None--> }}</ref><ref>[http://www.economics.soton.ac.uk/staff/aldrich/spurious.pdf] {{Cite journal | last=Aldrich | first=John | journal=Statistical Science | volume=10 | year=1995 | pages=364–376 | title=Correlations Genuine and Spurious in Pearson and Yule | jstor=2246135 | doi= 10.1214/ss/1177009870| issue=4 | postscript=<!--None-->}}</ref>
or
 
:''Correlation is not causation but it sure is a hint''.
 
   
== General pattern ==
+
The opposite belief, '''correlation proves causation''', is a [[logical fallacy]] in which two events that occur ''together'' are assumed to have a cause-and-effect relationship, so that one must be the cause of the other. The fallacy is also known as '''''cum hoc ergo propter hoc''''' ([[Latin]] for "with this, therefore because of this") and ''false cause''. The related fallacy ''[[post hoc ergo propter hoc]]'' differs in requiring that one event occur ''after'' the other.
  +
  +
In a widely studied example, numerous [[epidemiological study|epidemiological studies]] showed that women who were taking combined [[Hormone replacement therapy (menopause)|hormone replacement therapy]] (HRT) also had a lower-than-average incidence of [[coronary heart disease]] (CHD), leading doctors to propose that HRT was protective against CHD. But [[randomized controlled trials]] showed that HRT caused a small but statistically significant ''increase'' in risk of CHD. Re-analysis of the data from the epidemiological studies showed that women undertaking HRT were more likely to be from higher [[socio-economic group]]s ([[NRS social grade|ABC1]]), with better than average diet and exercise regimens. The use of HRT and decreased incidence of coronary heart disease were coincident effects of a common cause (i.e. the benefits associated with a higher socioeconomic status), rather than cause and effect, as had been supposed.<ref>{{cite journal |author=Lawlor DA, Davey Smith G, Ebrahim S |title=Commentary: the hormone replacement-coronary heart disease conundrum: is this the death of observational epidemiology? |journal=Int J Epidemiol |volume=33 |issue=3 |pages=464–7 |year=2004 |month=June |pmid=15166201 |doi=10.1093/ije/dyh124}}</ref>
  +
  +
==Usage==
  +
In [[logic]], the technical use of the word "implies" means "to be a ''[[Sufficient condition|sufficient]]'' circumstance." This is the meaning intended by statisticians when they say causation is not certain. Indeed, ''p implies q'' has the technical meaning of [[logical implication]]: ''if p then q'', symbolized as ''p&nbsp;→&nbsp;q''. That is, "if circumstance ''p'' is true, then ''q'' necessarily follows." In this sense, it is always correct to say "Correlation does not ''imply'' causation."
  +
  +
However, in casual use, the word "imply" loosely means ''suggests'' rather than ''requires''. The idea that correlation and causation are connected is certainly true; where there is causation, there is likely to be correlation. Indeed, correlation is used when inferring causation; the important point is that such inferences are not always correct because there are other possibilities, as explained later in this article.
  +
  +
[[Edward Tufte]], in a criticism of the brevity of "correlation does not imply causation," deprecates the use of "is" to relate correlation and causation (as in "Correlation is not causation"), arguing that the statement is incomplete and therefore inaccurate.<ref name="Tufte 2006 5"/> While it is not the case that correlation is causation, simply stating their nonequivalence omits information about their relationship. Tufte suggests that the shortest true statement that can be made about causality and correlation is one of the following:<ref>{{cite book|last=Tufte|first=Edward R.|authorlink=Edward Tufte|title=The Cognitive Style of PowerPoint|url=http://books.google.com/?id=3oNRAAAAMAAJ&q=%22Empirically+observed+covariation%22+necessary&dq=%22Empirically+observed+covariation%22+necessary|year=2003|publisher=Graphics Press|location=Cheshire, Connecticut|isbn=0-9613921-5-0|page=4}}</ref>
  +
*"Empirically observed covariation is a necessary but not sufficient condition for causality."
  +
*"Correlation is not causation but it sure is a hint."
  +
  +
==General pattern==
   
 
The ''cum hoc ergo propter hoc'' logical fallacy can be expressed as follows:
 
* A occurs in correlation with B.
+
# ''A'' occurs in correlation with ''B''.
* Therefore, A causes B.
+
# Therefore, ''A'' causes ''B''.
   
In this type of logical fallacy, one makes a premature conclusion about [[causality]] after observing only a [[correlation]] between two or more factors. Generally, if one factor (A) is observed to only be correlated with another factor (B), it is sometimes ''taken for granted'' that A is causing B ''even when no evidence supports this''. This is a logical fallacy because there are at least four other possibilities:
+
In this type of logical fallacy, one makes a premature conclusion about [[causality]] after observing only a [[correlation]] between two or more factors. Generally, if one factor (''A'') is observed only to be correlated with another factor (''B''), it is sometimes taken for granted that ''A'' is causing ''B'', even when no evidence supports it. This is a logical fallacy because there are at least five possibilities:
   
# B may be the cause of A, or
+
# ''A'' may be the cause of ''B''.
# some unknown third factor is actually the cause of the relationship between A and B, or
+
# ''B'' may be the cause of ''A''.
# the "relationship" is so complex it can be labelled [[coincidence|coincidental]] (i.e., two events occurring at the same time that have no simple relationship to each other besides the fact that they are occurring at the same time).
+
# Some unknown third factor ''C'' may actually be the cause of both ''A'' and ''B''.
# B may be the cause of A ''at the same time'' as A is the cause of B (contradicting that the only relationship between A and B is that A causes B). This describes a ''[[self-reinforcement|self-reinforcing]]'' system.
+
# There may be a combination of the above three relationships. For example, ''B'' may be the cause of ''A'' at the same time as ''A'' is the cause of ''B'' (contradicting that the only relationship between ''A'' and ''B'' is that ''A'' causes ''B''). This describes a [[self-reinforcement|self-reinforcing]] system.
  +
# The "relationship" is a [[coincidence]], or is so complex or indirect that it is more effectively called a coincidence (i.e. two events occurring at the same time that have no direct relationship to each other besides the fact that they are occurring at the same time). A larger [[sample size]] helps to reduce the chance of a coincidence, unless there is a [[systematic error]] in the experiment.
   
In other words, '''there can be no conclusion made regarding the ''existence'' or the ''direction'' of a cause and effect relationship ''only'' from the fact that A is correlated with B'''. Determining whether there is an actual cause and effect relationship requires further investigation, even when the relationship between A and B is [[statistically significant]], a large [[effect size]] is observed, or a large part of the [[Coefficient of determination|variance is explained]].
+
In other words, no conclusion can be drawn regarding the ''existence'' or the ''direction'' of a cause-and-effect relationship only from the fact that ''A'' and ''B'' are correlated. Determining whether there is an actual cause-and-effect relationship requires further investigation, even when the relationship between ''A'' and ''B'' is [[statistical significance|statistically significant]], a large [[effect size]] is observed, or a large part of the [[Coefficient of determination|variance is explained]].
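
The common-cause possibility in the list above can be illustrated with a small simulation. The following is a minimal sketch in Python, not a method taken from the sources cited here: the function names, the sample size, the seed, and the choice of normally distributed noise are all assumptions made for illustration. ''A'' and ''B'' are each generated from a hidden factor ''C'' plus independent noise, and neither variable enters the other's formula.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_common_cause(n=5000, seed=0):
    """Draw A and B so that each depends only on a hidden factor C."""
    rng = random.Random(seed)
    c = [rng.gauss(0, 1) for _ in range(n)]
    a = [ci + rng.gauss(0, 1) for ci in c]  # A = C + noise; B never enters
    b = [ci + rng.gauss(0, 1) for ci in c]  # B = C + noise; A never enters
    return a, b, c


a, b, c = simulate_common_cause()
r_ab = pearson(a, b)

# Subtract C's contribution; what remains of A and B is independent noise.
res_a = [ai - ci for ai, ci in zip(a, c)]
res_b = [bi - ci for bi, ci in zip(b, c)]
r_residual = pearson(res_a, res_b)

print(f"corr(A, B)                  = {r_ab:.2f}")
print(f"corr(A, B) after removing C = {r_residual:.2f}")
```

Under these assumptions ''A'' and ''B'' come out clearly correlated even though neither influences the other, and the correlation largely disappears once the contribution of ''C'' is subtracted. In real data ''C'' is usually unobserved, which is why the correlation alone cannot distinguish this case from the others in the list.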
   
== Examples ==
+
==Examples of illogically inferring causation from correlation==
  +
{{inline|section|date=July 2012}}
  +
===B causes A (reverse causation)===
  +
:The more firemen fighting a fire, the bigger the fire is observed to be.
  +
:Therefore firemen cause an increase in the size of a fire.
  +
In this example, the correlation between the number of firemen at a scene and the size of the fire does not imply that the firemen cause the fire. Firemen are dispatched according to the severity of the fire, so a large fire draws a greater number of firemen; it is rather the fire that causes firemen to arrive at the scene. So the above conclusion is false.
   
:''Sleeping with one's shoes on is strongly correlated with waking up with a headache''.
+
===A causes B and B causes A (bidirectional causation)===
:''Therefore, sleeping with one's shoes on causes headache''.
+
:Increased pressure is associated with increased temperature.
  +
:Therefore pressure causes temperature.
   
The above example commits the correlation-implies-causation fallacy, as it prematurely concludes that sleeping with one's shoes on causes headache. A more plausible explanation is that ''both are caused by a third factor'', in this case alcohol intoxication, which thereby gives rise to a correlation. Thus, this is a case of possibility (2) above.
+
The [[ideal gas law]], <math>PV=nRT</math>, relates pressure and temperature (along with other factors), so the two properties are directly correlated. For a fixed volume and mass of gas, an increase in temperature will cause an increase in pressure; likewise, increased pressure will cause an increase in temperature. This demonstrates bidirectional causation. The conclusion that pressure causes temperature is true, but it is not logically guaranteed by the premise.
   
A recent scientific example:
+
===Third factor C (the common-causal variable) causes both A and B===
:''Young children who sleep with the light on are much more likely to develop [[myopia]] in later life.''
+
{{Main|Spurious relationship}}
   
This result of a study at University of Pennsylvania Medical Center was
+
All these examples deal with a [[lurking variable]], which is simply a hidden third variable that affects both of the correlated variables; for example, the fact that it is summer in Example 3. A difficulty often also arises where the third factor, though fundamentally different from ''A'' and ''B'', is so closely related to ''A'' and/or ''B'' as to be confused with them or very difficult to scientifically disentangle from them (see Example 4).
published in the May 13, 1999, issue of [[Nature]] and received much coverage at the time in the popular press <ref> [[CNN]], May 13, 1999. [http://www.cnn.com/HEALTH/9905/12/children.lights/index.html Night-light may lead to nearsightedness].</ref>. However a later study at
 
[[Ohio State University]] did not find any link between [[infant]]s sleeping with the light on and developing
 
myopia but did find a strong link between parental myopia and the development of child myopia and also noted that
 
myopic parents were more likely to leave a light on in their children's bedroom
 
<ref>[[Ohio State University]] Research News, March 9, 2000. [http://researchnews.osu.edu/archive/nitelite.htm Night lights don't lead to nearsightedness, study suggests].</ref>. This is a case of (2).
 
   
Another example:
+
;Example 1
:''Since the 1950s, both the atmospheric CO<sub>2</sub> level and crime levels have increased sharply''.
+
:[[Sleeping]] with one's [[shoes]] on is strongly correlated with waking up with a [[headache]].
:''Hence, atmospheric CO<sub>2</sub> causes crime''.
+
:Therefore, sleeping with one's shoes on causes headache.
   
The above example arguably makes the mistake of prematurely concluding a causal relationship where the relationship between the variables, if any, is so complex it may be labelled coincidental. The two events have no simple relationship to each other beside the fact that they are occurring at the same time. This is a case of possibility (3) above.
+
The above example commits the correlation-implies-causation fallacy, as it prematurely concludes that sleeping with one's shoes on causes headache. A more plausible explanation is that both are caused by a third factor, in this case going to bed [[Drunkenness|drunk]], which thereby gives rise to a correlation. So the conclusion is false.
   
Another example:
+
;Example 2
:''Not eating causes [[anorexia nervosa]]''.
+
:Young children who sleep with the light on are much more likely to develop [[myopia]] in later life.
  +
:Therefore, sleeping with the light on causes myopia.
   
Depending on the evidence used to support this statement, it can be shown that this is a correlation implies causation error of either type (1) or (4) described above. Having the disease Anorexia Nervosa may be the cause of not eating. This could, however, also be an example of case (4): It is correct that not eating does cause anorexia nervosa, but it can also be claimed that having developed anorexia nervosa causes one not to eat. Empirical evidence would be necessary to make a causative statement.
+
This is a scientific example that resulted from a study at the [[University of Pennsylvania]] [[Penn Presbyterian Medical Center|Medical Center]]. Published in the May 13, 1999 issue of ''[[Nature (journal)|Nature]]'',<ref name="QuinnMyopiaNature">{{cite journal |author=Quinn GE, Shin CH, Maguire MG, Stone RA |title=Myopia and ambient lighting at night |journal=Nature |volume=399 |issue=6732 |pages=113–4 |year=1999 |month=May |pmid=10335839 |doi=10.1038/20094}}</ref> the study received much coverage at the time in the popular press.<ref>[[CNN]], May 13, 1999. [http://www.cnn.com/HEALTH/9905/12/children.lights/index.html Night-light may lead to nearsightedness]</ref> However, a later study at [[Ohio State University]] did not find that [[infant]]s sleeping with the light on caused the development of myopia. It did find a strong link between parental myopia and the development of child myopia, also noting that myopic parents were more likely to leave a light on in their children's bedroom.<ref>[[Ohio State University]] Research News, March 9, 2000. 
[http://researchnews.osu.edu/archive/nitelite.htm Night lights don't lead to nearsightedness, study suggests]</ref><ref>{{cite journal |author=Zadnik K, Jones LA, Irvin BC, ''et al.'' |title=Myopia and ambient night-time lighting |journal=Nature |volume=404 |issue=6774 |pages=143–4 |year=2000 |month=March |pmid=10724157 |doi=10.1038/35004661}}</ref><ref>{{cite journal |author=Gwiazda J, Ong E, Held R, Thorn F |title=Myopia and ambient night-time lighting |journal=Nature |volume=404 |issue=6774 |pages=144 |year=2000 |month=March |pmid=10724158 |doi=10.1038/35004663}}</ref><ref>{{Cite journal|journal=Nature|year=2000|volume=404|doi=10.1038/35004665|last2=et al.|last=Stone|title=Myopia and ambient night-time lighting|pages=144 |issue=6774 |month=March|pmid=<!--none-->|first1=J|first2=E|last3=Held|first3=R|last4=Thorn|first4=F|postscript=<!--None-->}}</ref> In this case, the cause of both conditions is parental myopia, and the above-stated conclusion is false.
   
A more complex example:
+
;Example 3
:''Scientific research finds that people who use cannabis (A) have a higher prevalence of psychiatric disorders compared to those who do not (B).''
+
:As [[ice cream]] sales increase, the rate of [[drowning]] deaths increases sharply.
  +
:Therefore, ice cream consumption causes drowning.
   
This particular correlation is sometimes used to support the theory that the use of cannabis ''causes'' a psychiatric disorder (A is the cause of B). Although this may be possible, we cannot automatically discern a cause and effect relationship from research that has only determined people who use cannabis are more likely to develop a psychiatric disorder. From the same research, it can also be the case that (1.) having the predisposition for a psychiatric disorder causes these individuals to use cannabis (B causes A), or (2.) it may be the case that in the above study some unknown third factor (e.g., poverty) is the actual cause for there being found a higher number of people (compared to the general public) who both use cannabis and who have been diagnosed as having a psychiatric disorder. Alternatively, it may be that the effects of cannabis are found more pleasurable by persons with certain psychiatric disorders. To assume that A causes B is tempting, but further scientific investigation of the type that can isolate extraneous variables is needed when research has only determined a statistical correlation.
+
The aforementioned example fails to recognize the importance of time and temperature in relation to ice cream sales. Ice cream is sold during the hot [[summer]] months at a much greater rate than during colder times, and it is during these hot summer months that people are more likely to engage in activities involving water, such as [[Human swimming|swimming]]. The increased drowning deaths are simply caused by more exposure to water-based activities, not ice cream. The stated conclusion is false.
   
== Determining causation ==
+
;Example 4
{{unreferenced|date=October 2006}}
 
   
[[David Hume]] argued{{fact}} that causality cannot be perceived (and therefore cannot be known or proven), and instead we can only perceive correlation. However, he argued{{fact}} that we can use the [[scientific method]] to rule out false causes. <!-- What are the sources of these statements? Where did David Hume make these arguments? -->
+
:A hypothetical study shows a relationship between test anxiety scores and shyness scores, with a statistical ''r'' value (strength of correlation) of +.59.<ref>The Psychology of Personality: Viewpoints, Research, and Applications. Carducci, Bernard J. 2nd Edition. Wiley-Blackwell: UK, 2009.</ref>
  +
:Therefore, it may be simply concluded that shyness, in some part, causally influences test anxiety.
   
In modern science, causation is defined by a counterfactual.{{fact}} Suppose that a student performed poorly on a test and guesses that the cause was not studying. To prove this, we think of the counterfactual - the same student writing the same test under the same circumstances but having studied the night before. If we could rewind history, and change only one small thing (making the student study for the exam), then causation could be observed (by comparing version 1 to version 2). Because we cannot rewind history and replay events after making small controlled changes, causation can only be inferred, never exactly known. This is referred to as the Fundamental Problem of Causal Inference{{fact}} - it is impossible to directly observe causal effects.{{fact}} <!-- Please cite who "refers" to the "Fundamental Problem of Causal Inference" and a source that defines it. -->
+
However, as encountered in many psychological studies, another variable, a "self-consciousness score," is discovered which has a sharper correlation (+.73) with shyness. This suggests a possible "third variable" problem; however, when three such closely related measures are found, it further suggests that each may have bidirectional tendencies (see "[[Correlation_does_not_imply_causation#A_causes_B_and_B_causes_A_.28bidirectional_causation.29|bidirectional variable]]," above), the three forming a cluster of correlated values that each influence one another to some extent. Therefore, the simple conclusion above may be false.
   
The central goal of scientific [[experiments]] and statistical methods is to approximate as best as possible the counterfactual state of the world.{{fact}} <!-- Provide source that supports this statement. --> For example, one could run an experiment on identical twins who were known to consistently get the same grades on their tests. One twin is sent to study for six hours while the other is sent to the amusement park. If their test scores suddenly diverged by a large degree, this would be strong evidence that studying (or going to the amusement park) had a causal effect on test scores. In this case, correlation between studying and test scores would almost certainly imply causation.{{fact}} <!-- Provide source that supports this statement. -->
+
;Example 5
   
Well-designed statistical studies replace equality of individuals, as in the previous example, by equality of groups.{{fact}} This is achieved by randomization of the subjects to two or more groups. Although not a perfect system, placing the subjects randomly in the treatment/[[placebo]] groups ensures that it is highly likely that the groups are reasonably equal in all relevant aspects.{{fact}} If the treatment has a significantly different effect than the placebo, one can conclude that the treatment is likely to have a causal effect on the disease. This likelihood can be quantified in statistical terms by the [[P-value]].{{fact}}
+
:Since the 1950s, both the atmospheric [[carbon dioxide|CO<sub>2</sub>]] level and [[obesity]] levels have increased sharply.
  +
:Hence, atmospheric CO<sub>2</sub> causes obesity.
   
==See also==
+
Richer populations tend to eat more food and consume more energy, so growing wealth is a plausible common cause of both trends.
   
* [[Post hoc ergo propter hoc]] (coincidental correlation)
+
;Example 6
* [[Spurious relationship]]
 
   
{{Informal_Fallacy}}
+
:[[High-density lipoprotein|HDL]] ("good") [[cholesterol]] is negatively correlated with incidence of heart attack.
  +
:Therefore, taking medication to raise HDL will decrease the chance of having a heart attack.
   
==References and notes==
+
Further research<ref>Ornish, Dean. "Cholesterol: The good, the bad, and the truth" [http://www.huffingtonpost.com/dr-dean-ornish/cholesterol-the-good-the-_b_870655.html] (retrieved 3 June 2011)</ref> has called this conclusion into question. Instead, it may be that other underlying factors, like genes, diet and exercise, affect both HDL levels and the likelihood of having a heart attack; it is possible that medicines may affect the directly measurable factor, HDL levels, without affecting the chance of heart attack.
<div class="references-small">
+
<references/>
+
===Coincidence===
</div>
+
:With a decrease in the number of [[pirates]], there has been an increase in [[global warming]] over the same period.
  +
:Therefore, global warming is caused by a lack of pirates.
  +
  +
This example is used by the [[religion]] [[Flying Spaghetti Monster|Pastafarianism]] to illustrate the logical fallacy of assuming that correlation equals causation.
  +
  +
==Relation to the Ecological fallacy==
  +
There is a relation between this subject matter and the [[Ecological fallacy]], described in a 1950 paper by William S. Robinson.<ref>{{cite journal|author=Robinson, W.S.|year=1950|title=Ecological Correlations and the Behavior of Individuals|journal=American Sociological Review|volume=15|pages=351–357|doi=10.2307/2087176|jstor=2087176|issue=3|publisher=American Sociological Review, Vol. 15, No. 3}}</ref> Robinson shows that ecological correlations, where the statistical object is a group of persons (e.g. an ethnic group), do not show the same behaviour as individual correlations, where the objects of inquiry are individuals: "The relation between ecological and individual correlations which is discussed in this paper provides a definite answer as to whether ecological correlations can validly be used as substitutes for individual correlations. They cannot." (...) "(a)n ecological correlation is almost certainly not equal to its corresponding individual correlation."
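
The gap between ecological and individual correlations can be shown with a small constructed data set. The sketch below is only an illustration: the three groups and their (x, y) values are hypothetical numbers chosen to make the effect visible, and the helper names are not taken from Robinson's paper. It computes the correlation between group means, the correlation across all individuals pooled together, and the correlation inside each group.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


# Hypothetical data: three groups of five individuals, each an (x, y) pair.
groups = {
    "g1": [(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)],
    "g2": [(6, 10), (7, 9), (8, 8), (9, 7), (10, 6)],
    "g3": [(11, 15), (12, 14), (13, 13), (14, 12), (15, 11)],
}

# Ecological correlation: one data point per group (the group means).
mean_x = [sum(x for x, _ in g) / len(g) for g in groups.values()]
mean_y = [sum(y for _, y in g) / len(g) for g in groups.values()]
r_ecological = pearson(mean_x, mean_y)

# Individual correlation: every person is a data point, groups pooled.
pooled = [pair for g in groups.values() for pair in g]
r_individual = pearson([x for x, _ in pooled], [y for _, y in pooled])

# Correlation among individuals inside each single group.
r_within = {
    name: pearson([x for x, _ in g], [y for _, y in g])
    for name, g in groups.items()
}

print(f"ecological (group means): {r_ecological:.2f}")
print(f"individual (pooled):      {r_individual:.2f}")
for name, r in r_within.items():
    print(f"within {name}:                {r:.2f}")
```

In this constructed case the group means are perfectly positively correlated, the pooled individual correlation is weaker, and within every group the individual correlation is perfectly negative, so the group-level figure cannot stand in for the individual-level one.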
  +
  +
==Determining causation==
  +
  +
[[David Hume]] argued that causality is based on experience, and that experience is similarly based on the assumption that the future models the past, which in turn can only be based on experience&nbsp;– leading to [[circular logic]]. In conclusion, he asserted that [[problem of induction|causality is not based on actual reasoning]]: only correlation can actually be perceived.<ref>[http://plato.stanford.edu/entries/hume/#CausationN David Hume (Stanford Encyclopedia of Philosophy)]</ref>
  +
  +
In order for a correlation to be established as causal, the cause and the effect must be connected through an impact mechanism in accordance with known [[laws of nature]].
  +
  +
Intuitively, causation seems to require not just a correlation, but a counterfactual dependence. Suppose that a student performed poorly on a test and guesses that the cause was his not studying. To prove this, one thinks of the counterfactual&nbsp;– the same student writing the same test under the same circumstances but having studied the night before. If one could rewind history, and change only one small thing (making the student study for the exam), then causation could be observed (by comparing version 1 to version 2). Because one cannot rewind history and replay events after making small controlled changes, causation can only be inferred, never exactly known. This is referred to as the Fundamental Problem of Causal Inference&nbsp;– it is impossible to directly observe causal effects.<ref>Paul W. Holland. 1986. "Statistics and Causal Inference" Journal of the American Statistical Association, Vol. 81, No. 396. (Dec., 1986), pp. 945-960.</ref>
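
The student example can be written compactly in potential-outcomes notation. The symbols below are introduced here for illustration and do not appear in the passage above: <math>Y_i(1)</math> stands for the score student ''i'' would obtain having studied, and <math>Y_i(0)</math> for the score the same student would obtain without studying. The causal effect of studying on that student is then

:<math>\tau_i = Y_i(1) - Y_i(0)</math>

Only one of the two terms is ever observed for a given student, so <math>\tau_i</math> cannot be computed directly. What can be estimated, under suitable assumptions, is an average over many students, <math>\operatorname{E}[Y(1)] - \operatorname{E}[Y(0)]</math>.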
  +
  +
A major goal of scientific [[experiments]] and statistical methods is to approximate as best as possible the counterfactual state of the world.<ref>Judea Pearl. 2000. ''Causality: Models, Reasoning, and Inference,'' Cambridge University Press.</ref> For example, one could run an [[Twin study|experiment on identical twins]] who were known to consistently get the same grades on their tests. One twin is sent to study for six hours while the other is sent to the amusement park. If their test scores suddenly diverged by a large degree, this would be strong evidence that studying (or going to the amusement park) had a causal effect on test scores. In this case, correlation between studying and test scores would almost certainly imply causation.
  +
  +
Well-designed [[experiment|experimental studies]] replace equality of individuals, as in the previous example, by equality of groups. This is achieved by randomization of the subjects to two or more groups. Although not a perfect system, the likelihood that the groups are equal in all aspects rises with the number of subjects placed randomly in the treatment/[[placebo]] groups. From the significance of the difference between the effect of the treatment and that of the placebo, one can judge how likely it is that the treatment has a causal effect on the disease. This likelihood can be quantified in statistical terms by the [[P-value]] {{dubious|date=July 2012}}.
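
Why randomization helps can be sketched with a simulation. Everything in the block below is an assumption made for illustration (the hidden "health" factor, the effect sizes, the sample size and seed, and the function names); it is not drawn from any study cited in this article. The same outcome model is run twice: once with subjects choosing treatment according to the hidden factor, and once with treatment assigned by a coin flip.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def difference_in_means(assign, n=20000, true_effect=1.0, seed=1):
    """Treated-minus-control outcome gap under a given assignment rule."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        health = rng.gauss(0, 1)  # hidden factor that also drives the outcome
        t = assign(health, rng)   # 1 = treated, 0 = control
        outcome = 2.0 * health + true_effect * t + rng.gauss(0, 1)
        (treated if t else control).append(outcome)
    return mean(treated) - mean(control)


def self_selected(health, rng):
    """Observational setting: healthier subjects opt into treatment."""
    return 1 if health > 0 else 0


def randomized(health, rng):
    """Experimental setting: a coin flip decides, ignoring health."""
    return 1 if rng.random() < 0.5 else 0


naive_diff = difference_in_means(self_selected)
rct_diff = difference_in_means(randomized)

print(f"true effect:             1.00")
print(f"self-selected estimate:  {naive_diff:.2f}")
print(f"randomized estimate:     {rct_diff:.2f}")
```

With self-selection the gap between the groups mixes the treatment effect with the effect of the hidden factor and greatly overstates it, while the randomized comparison lands close to the effect built into the simulation, because the coin flip leaves the hidden factor balanced between the groups on average.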
  +
  +
When experimental studies are impossible and only pre-existing data are available, as is usually the case in economics, for example, [[regression analysis]] can be used. Factors other than the potential causative variable of interest are controlled for by including them as regressors in addition to the regressor representing the variable of interest. False inferences of causation due to reverse causation (or wrong estimates of the magnitude of causation due to the presence of bidirectional causation) can be avoided by using explanators ([[Dependent and independent variables#Use in statistics|regressors]]) that are necessarily [[exogenous]], such as physical explanators like rainfall amount (as a determinant of, say, futures prices), lagged variables whose values were determined before the dependent variable's value was determined, [[instrumental variables]] for the explanators (chosen based on their known exogeneity), etc. See [[Causality#Economics]]. Spurious correlation due to mutual influence from a third, common, causative variable is harder to avoid: the model must be specified such that there is a theoretical reason to believe that no such underlying causative variable has been omitted from the model; in particular, underlying time trends of both the dependent variable and the independent (potentially causative) variable must be controlled for by including time as another independent variable.
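
The idea of controlling for a third variable by adding it as a regressor can be sketched as follows. This is an illustrative simulation under stated assumptions, not an econometric recipe: the data are synthetic, the variable names ''x'', ''y'' and ''z'' and the helper functions are hypothetical, and the common cause ''z'' is assumed to be observed, which is often not true in practice. By construction ''x'' has no effect on ''y''; both depend on ''z''.

```python
import random


def centered_sum(u, v):
    """Sum of products of deviations from the means of u and v."""
    mu = sum(u) / len(u)
    mv = sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def slope_simple(x, y):
    """OLS slope on x in the one-regressor model y ~ 1 + x."""
    return centered_sum(x, y) / centered_sum(x, x)


def slope_controlling(x, z, y):
    """OLS coefficient on x in y ~ 1 + x + z (two-regressor closed form)."""
    sxx, szz, sxz = centered_sum(x, x), centered_sum(z, z), centered_sum(x, z)
    sxy, szy = centered_sum(x, y), centered_sum(z, y)
    return (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)


rng = random.Random(2)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]       # common cause
x = [zi + rng.gauss(0, 1) for zi in z]        # x depends on z only
y = [2.0 * zi + rng.gauss(0, 1) for zi in z]  # y depends on z only, not on x

slope_naive = slope_simple(x, y)
slope_adjusted = slope_controlling(x, z, y)

print(f"coefficient on x, z omitted:  {slope_naive:.2f}")
print(f"coefficient on x, z included: {slope_adjusted:.2f}")
```

When ''z'' is omitted, the regression attributes a sizeable coefficient to ''x''; once ''z'' is included as a regressor, the coefficient on ''x'' falls to roughly zero, matching how the data were generated. The adjustment only works for common causes that are actually measured and included, which is the difficulty the paragraph above describes.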
  +
  +
==See also==
  +
* [[Affirming the consequent]]
  +
* [[Chain reaction]]
  +
* [[Confirmation bias]]
  +
* [[Confounding]]
  +
* [[Design of experiments]]
  +
* [[Domino effect]]
  +
* [[Ecological fallacy]]
  +
* [[Four causes]]
  +
* [[Mierscheid Law]]
  +
* [[Normally distributed and uncorrelated does not imply independent]]
  +
* [[Observational study]]
  +
  +
==References==
{{Reflist}}

==External links==
* [http://www.fallacyfiles.org/cumhocfa.html Cum Hoc, Ergo Propter Hoc] in the ''Fallacy Files'' by Gary N. Curtis
* [http://singapore.cs.ucla.edu/LECTURE/lecture_sec1.htm "The Art and Science of cause and effect"]: a slide show and tutorial lecture by Judea Pearl
* [http://www.fallacyfiles.org/noncause.html Non Causa Pro Causa] in the ''Fallacy Files'' by Gary N. Curtis
* [http://ftp.cs.ucla.edu/pub/stat_ser/r350.pdf Causal inference in statistics: An overview], by Judea Pearl (September 2009)
* [http://www.obereed.net/hh/correlation.html New Poll Shows Correlation is Causation] A satirical article regarding correlation and causation.

{{Informal Fallacy}}

{{DEFAULTSORT:Correlation Does Not Imply Causation}}
[[Category:Causal fallacies]]
[[Category:Causal inference]]
[[Category:Causation]]
[[Category:Covariance and correlation]]
[[Category:Misuse of statistics]]

<!--
[[bg:Cum hoc ergo propter hoc]]
[[ca:Cum hoc ergo propter hoc]]
[[de:Cum hoc ergo propter hoc]]
[[el:Cum hoc ergo propter hoc]]
[[es:Cum hoc ergo propter hoc]]
[[eu:Cum hoc ergo propter hoc]]
[[fa:مغالطه علت شمردن همبستگی]]
[[fr:Cum hoc ergo propter hoc]]
[[is:Fylgnivilla]]
[[he:קום הוק ארגו פרופטר הוק]]
[[hu:Cum hoc]]
[[nl:Cum hoc ergo propter hoc]]
[[ja:相関関係と因果関係]]
[[no:Korrelasjon medfører kausalitet]]
[[ru:Причинно-следственный круг]]
[[fi:Cum hoc ergo propter hoc]]
-->
 
{{enWP|Correlation does not imply causation}}

Latest revision as of 14:41, August 17, 2012



"Correlation does not imply causation" (related to "ignoring a common cause" and questionable cause) is a phrase used in science and statistics to emphasize that a correlation between two variables does not automatically imply that one causes the other. Correlation is nonetheless necessary for linear causation in the absence of any third, countervailing causative variable, and it can indicate possible causes or areas for further investigation; in other words, correlation is a hint.[1][2]

The opposite belief, correlation proves causation, is a logical fallacy by which two events that occur together are claimed to have a cause-and-effect relationship. The fallacy is also known as cum hoc ergo propter hoc (Latin for "with this, therefore because of this") and false cause. It is a common fallacy in which it is assumed that, because two things or events occur together, one must be the cause of the other. By contrast, the fallacy, post hoc ergo propter hoc, requires that one event occur after the other, and so may be considered a related fallacy.

In a widely studied example, numerous epidemiological studies showed that women who were taking combined hormone replacement therapy (HRT) also had a lower-than-average incidence of coronary heart disease (CHD), leading doctors to propose that HRT was protective against CHD. But randomized controlled trials showed that HRT caused a small but statistically significant increase in risk of CHD. Re-analysis of the data from the epidemiological studies showed that women undertaking HRT were more likely to be from higher socio-economic groups (ABC1), with better than average diet and exercise regimens. The use of HRT and decreased incidence of coronary heart disease were coincident effects of a common cause (i.e. the benefits associated with a higher socioeconomic status), rather than cause and effect, as had been supposed.[3]

Usage

In logic, the technical use of the word "implies" means "to be a sufficient circumstance." This is the meaning intended by statisticians when they say causation is not certain. Indeed, p implies q has the technical meaning of logical implication: if p then q symbolized as p → q. That is "if circumstance p is true, then q necessarily follows." In this sense, it is always correct to say "Correlation does not imply causation."

However, in casual use, the word "imply" loosely means suggests rather than requires. The idea that correlation and causation are connected is certainly true; where there is causation, there is likely to be correlation. Indeed, correlation is used when inferring causation; the important point is that such inferences are not always correct because there are other possibilities, as explained later in this article.

Edward Tufte, criticizing the brevity of "correlation does not imply causation," deprecates the use of "is" to relate correlation and causation (as in "Correlation is not causation"), arguing that such a statement is inaccurate because it is incomplete.[1] While it is not the case that correlation is causation, simply stating their nonequivalence omits information about their relationship. Tufte suggests that the shortest true statement that can be made about causality and correlation is one of the following:[4]

  • "Empirically observed covariation is a necessary but not sufficient condition for causality."
  • "Correlation is not causation but it sure is a hint."

General pattern

The cum hoc ergo propter hoc logical fallacy can be expressed as follows:

  1. A occurs in correlation with B.
  2. Therefore, A causes B.

In this type of logical fallacy, one draws a premature conclusion about causality after observing only a correlation between two or more factors. Generally, if one factor (A) is observed only to be correlated with another factor (B), it is sometimes taken for granted that A is causing B, even when no evidence supports it. This is a logical fallacy because there are at least five possibilities:

  1. A may be the cause of B.
  2. B may be the cause of A.
  3. Some unknown third factor C may actually be the cause of both A and B.
  4. There may be a combination of the above three relationships. For example, B may be the cause of A at the same time as A is the cause of B (contradicting the claim that the only relationship between A and B is that A causes B). This describes a self-reinforcing system.
  5. The "relationship" is a coincidence, or is so complex or indirect that it is more effectively called a coincidence (i.e. two events occurring at the same time that have no direct relationship to each other besides the fact that they are occurring at the same time). A larger sample size helps to reduce the chance of a coincidence, unless there is a systematic error in the experiment.

In other words, no conclusion can be drawn about the existence or the direction of a cause-and-effect relationship solely from the fact that A and B are correlated. Determining whether there is an actual cause-and-effect relationship requires further investigation, even when the relationship between A and B is statistically significant, a large effect size is observed, or a large part of the variance is explained.
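
The claim that a correlation says nothing about direction can be checked numerically. The following is a minimal sketch in Python; the values for A and B are made up for illustration and the helper name `pearson` is an assumption, not something taken from a cited study. It shows that the Pearson coefficient is identical whichever factor is listed first.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical observations of two factors A and B.
a = [1, 2, 3, 4, 5, 6]
b = [2, 3, 5, 4, 8, 9]

r_ab = pearson(a, b)  # "A against B"
r_ba = pearson(b, a)  # "B against A"
print(round(r_ab, 3), round(r_ba, 3))
```

Because the coefficient is symmetric in its arguments, a strong value is equally compatible with possibilities 1 and 2 in the list above, and it does not rule out possibilities 3 to 5 either.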

Examples of illogically inferring causation from correlation

B causes A (reverse causation)

The more firemen fighting a fire, the bigger the fire is observed to be.
Therefore firemen cause an increase in the size of a fire.

In this example, the correlation between the number of firemen at a scene and the size of the fire does not imply that the firemen cause the fire. Firemen are sent according to the severity of the fire and if there is a large fire, a greater number of firemen are sent; therefore, it is rather that fire causes firemen to arrive at the scene. So the above conclusion is false.

A causes B and B causes A (bidirectional causation)

Increased pressure is associated with increased temperature.
Therefore pressure causes temperature.

The ideal gas law, PV=nRT, describes the direct relationship between pressure and temperature (along with other factors) to show that there is a direct correlation between the two properties. For a fixed volume and mass of gas, an increase in temperature will cause an increase in pressure; likewise, increased pressure will cause an increase in temperature. This demonstrates bidirectional causation. The conclusion that pressure causes temperature is true but is not logically guaranteed by the premise.
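
As a worked illustration of the relationship, the sketch below evaluates PV=nRT for a fixed amount of gas in a fixed volume. The amount (1 mol), the volume (22.4 L) and the function names are assumptions chosen for the example. The same equation can be solved for either pressure or temperature, which is why the formula by itself does not single out one of them as the cause.

```python
R = 8.314462618  # molar gas constant in J/(mol*K)

def pressure(n_mol, temperature_k, volume_m3):
    """Pressure in pascals from PV = nRT, solved for P."""
    return n_mol * R * temperature_k / volume_m3

def temperature(n_mol, pressure_pa, volume_m3):
    """Temperature in kelvin from PV = nRT, solved for T."""
    return pressure_pa * volume_m3 / (n_mol * R)

n, v = 1.0, 0.0224  # hypothetical: 1 mol of gas in 22.4 litres

p_300 = pressure(n, 300.0, v)
p_330 = pressure(n, 330.0, v)     # temperature raised by 10 %
t_back = temperature(n, p_330, v)  # recover the temperature from the pressure

print(p_330 / p_300)  # ratio of the two pressures
print(t_back)         # temperature implied by the higher pressure
```

At fixed volume and amount, the pressure ratio equals the temperature ratio, and the calculation runs equally well in either direction.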

Third factor C (the common-causal variable) causes both A and B

Main article: Spurious relationship

All these examples deal with a lurking variable, which is simply a hidden third variable that affects both of the correlated variables; for example, the fact that it is summer in Example 3. A difficulty often also arises where the third factor, though fundamentally different from A and B, is so closely related to A and/or B as to be confused with them or very difficult to scientifically disentangle from them (see Example 4).

Example 1
Sleeping with one's shoes on is strongly correlated with waking up with a headache.
Therefore, sleeping with one's shoes on causes headache.

The above example commits the correlation-implies-causation fallacy, as it prematurely concludes that sleeping with one's shoes on causes headache. A more plausible explanation is that both are caused by a third factor, in this case going to bed drunk, which thereby gives rise to a correlation. So the conclusion is false.

Example 2
Young children who sleep with the light on are much more likely to develop myopia in later life.
Therefore, sleeping with the light on causes myopia.

This is a scientific example that resulted from a study at the University of Pennsylvania Medical Center. Published in the May 13, 1999 issue of Nature,[5] the study received much coverage at the time in the popular press.[6] However, a later study at Ohio State University did not find that infants sleeping with the light on caused the development of myopia. It did find a strong link between parental myopia and the development of child myopia, also noting that myopic parents were more likely to leave a light on in their children's bedroom.[7][8][9][10] In this case, the cause of both conditions is parental myopia, and the above-stated conclusion is false.

Example 3
As ice cream sales increase, the rate of drowning deaths increases sharply.
Therefore, ice cream consumption causes drowning.

The aforementioned example fails to recognize the importance of time and temperature in relationship to ice cream sales. Ice cream is sold during the hot summer months at a much greater rate than during colder times, and it is during these hot summer months that people are more likely to engage in activities involving water, such as swimming. The increased drowning deaths are simply caused by more exposure to water-based activities, not ice cream. The stated conclusion is false.
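
One way to see the role of the season is to compare the correlation in the pooled data with the correlation within each season. The sketch below uses invented figures (the records, the two-season split and the helper names are all assumptions for illustration): pooled over the year the two quantities move together, but within a single season they do not.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical records: (season, ice cream sales, drowning deaths).
records = [
    ("summer", 90, 10), ("summer", 100, 8), ("summer", 110, 10), ("summer", 100, 12),
    ("winter", 20, 2), ("winter", 30, 1), ("winter", 40, 2), ("winter", 30, 3),
]

def corr_for(rows):
    """Correlation between sales and drownings for the given rows."""
    return pearson([r[1] for r in rows], [r[2] for r in rows])

r_pooled = corr_for(records)
r_summer = corr_for([r for r in records if r[0] == "summer"])
r_winter = corr_for([r for r in records if r[0] == "winter"])

print(round(r_pooled, 3), r_summer, r_winter)
```

In this constructed data set the pooled correlation is strong while the within-season correlations are zero, which is the pattern one would expect if the season drives both quantities and ice cream has no effect on drowning.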

Example 4
A hypothetical study shows a relationship between test anxiety scores and shyness scores, with a statistical r value (strength of correlation) of +.59.[11]
Therefore, it may be simply concluded that shyness, in some part, causally influences test anxiety.

However, as encountered in many psychological studies, another variable, a "self-consciousness score," is discovered which has a stronger correlation (+.73) with shyness. This suggests a possible "third variable" problem. However, when three such closely related measures are found, it further suggests that each may have bidirectional tendencies (see "bidirectional causation," above), forming a cluster of correlated values that each influence one another to some extent. Therefore, the simple conclusion above may be false.

Example 5
Since the 1950s, both the atmospheric CO2 level and obesity levels have increased sharply.
Hence, atmospheric CO2 causes obesity.

Richer populations tend to eat more food and consume more energy, which offers a common cause for both trends.

Example 6
HDL ("good") cholesterol is negatively correlated with incidence of heart attack.
Therefore, taking medication to raise HDL will decrease the chance of having a heart attack.

Further research[12] has called this conclusion into question. Instead, it may be that other underlying factors, like genes, diet and exercise, affect both HDL levels and the likelihood of having a heart attack; it is possible that medicines may affect the directly measurable factor, HDL levels, without affecting the chance of heart attack.

Coincidence

With a decrease in the number of pirates, there has been an increase in global warming over the same period.
Therefore, global warming is caused by a lack of pirates.

This example is used by the religion Pastafarianism to illustrate the logical fallacy of assuming that correlation equals causation.

Relation to the Ecological fallacy

There is a relation between this subject matter and the Ecological fallacy, described in a 1950 paper by William S. Robinson.[13] Robinson shows that ecological correlations, where the statistical object is a group of persons (e.g. an ethnic group), do not show the same behaviour as individual correlations, where the objects of inquiry are individuals: "The relation between ecological and individual correlations which is discussed in this paper provides a definite answer as to whether ecological correlations can validly be used as substitutes for individual correlations. They cannot." (...) "(a)n ecological correlation is almost certainly not equal to its corresponding individual correlation."
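
The difference between the two kinds of correlation can be made concrete with a small constructed data set. The sketch below is not Robinson's data; the three groups, the scores and the helper names are invented for illustration. Within every group the two measures move in opposite directions, yet the group averages rise together.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical (x, y) scores for three individuals in each of three groups.
groups = {
    g: [(10 + g + 3 * i, 10 + g - 3 * i) for i in (-1, 0, 1)]
    for g in (0, 1, 2)
}

# Individual correlation: every person is one data point.
people = [pair for members in groups.values() for pair in members]
r_individual = pearson([x for x, _ in people], [y for _, y in people])

# Ecological correlation: every group mean is one data point.
mean_x = [sum(x for x, _ in m) / len(m) for m in groups.values()]
mean_y = [sum(y for _, y in m) / len(m) for m in groups.values()]
r_ecological = pearson(mean_x, mean_y)

print(round(r_individual, 3), round(r_ecological, 3))
```

Here the individual correlation is negative while the ecological correlation is positive, so substituting one for the other would reverse the conclusion.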

Determining causation

David Hume argued that causality is based on experience, and experience similarly based on the assumption that the future models the past, which in turn can only be based on experience – leading to circular logic. In conclusion, he asserted that causality is not based on actual reasoning: only correlation can actually be perceived.[14]

In order for a correlation to be established as causal, the cause and the effect must be connected through an impact mechanism in accordance with known laws of nature.

Intuitively, causation seems to require not just a correlation, but a counterfactual dependence. Suppose that a student performed poorly on a test and guesses that the cause was his not studying. To prove this, one thinks of the counterfactual – the same student writing the same test under the same circumstances but having studied the night before. If one could rewind history, and change only one small thing (making the student study for the exam), then causation could be observed (by comparing version 1 to version 2). Because one cannot rewind history and replay events after making small controlled changes, causation can only be inferred, never exactly known. This is referred to as the Fundamental Problem of Causal Inference – it is impossible to directly observe causal effects.[15]

A major goal of scientific experiments and statistical methods is to approximate as best as possible the counterfactual state of the world.[16] For example, one could run an experiment on identical twins who were known to consistently get the same grades on their tests. One twin is sent to study for six hours while the other is sent to the amusement park. If their test scores suddenly diverged by a large degree, this would be strong evidence that studying (or going to the amusement park) had a causal effect on test scores. In this case, correlation between studying and test scores would almost certainly imply causation.

Well-designed experimental studies replace the equality of individuals in the previous example with equality of groups. This is achieved by randomly assigning the subjects to two or more groups. Although the system is not perfect, the likelihood that the groups are equal in all respects rises with the number of subjects placed randomly in the treatment and placebo groups. From the significance of the difference between the effect of the treatment and the effect of the placebo, one can assess the likelihood that the treatment has a causal effect on the disease. This likelihood can be quantified in statistical terms by the P-value [dubious].
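
To illustrate how random assignment connects to a P-value, the sketch below runs an exact permutation test on invented outcome scores (the numbers, group sizes and function name are assumptions for illustration, and a permutation test is only one of several ways to obtain a P-value). It enumerates every way the ten outcomes could have been split into two groups of five and asks how often a split produces a difference in means at least as large as the one observed.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(treated, control):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = list(treated) + list(control)
    observed = abs(mean(treated) - mean(control))
    extreme = total = 0
    # Enumerate every way random assignment could have picked the treated group.
    for chosen in combinations(range(len(pooled)), len(treated)):
        chosen = set(chosen)
        group_a = [v for i, v in enumerate(pooled) if i in chosen]
        group_b = [v for i, v in enumerate(pooled) if i not in chosen]
        # Small tolerance guards against floating-point rounding.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total

# Hypothetical improvement scores for five treated and five placebo subjects.
treated = [12, 14, 15, 16, 18]
control = [1, 2, 3, 4, 5]

p_value = permutation_p_value(treated, control)
print(p_value)
```

The resulting value is the probability of seeing a difference this large if the treatment had no effect and only the random assignment varied; it is not, by itself, the probability that the treatment is causal.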

When experimental studies are impossible and only pre-existing data are available, as is usually the case in economics, for example, regression analysis can be used. Factors other than the potential causative variable of interest are controlled for by including them as regressors in addition to the regressor representing the variable of interest. False inferences of causation due to reverse causation (or wrong estimates of the magnitude of causation due to the presence of bidirectional causation) can be avoided by using explanators (regressors) that are necessarily exogenous, such as physical explanators like rainfall amount (as a determinant of, say, futures prices), lagged variables whose values were determined before the dependent variable's value was determined, instrumental variables for the explanators (chosen based on their known exogeneity), etc. See Causality#Economics. Spurious correlation due to mutual influence from a third, common, causative variable is harder to avoid: the model must be specified such that there is a theoretical reason to believe that no such underlying causative variable has been omitted from the model; in particular, underlying time trends of both the dependent variable and the independent (potentially causative) variable must be controlled for by including time as another independent variable.
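
The effect of including a control variable can be sketched with constructed data in which a third variable z drives both x and y, and x has no effect on y. The variable names, the data and the helper functions are assumptions for illustration. The controlled slope is obtained by "partialling out" z: both x and y are first regressed on z, and the leftover parts are then regressed on each other, which by the Frisch–Waugh–Lovell result gives the same coefficient on x as a regression of y on x and z together.

```python
def slope(xs, ys):
    """Least-squares slope of ys regressed on xs (with an intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def residuals(ys, zs):
    """Part of ys that is left over after a linear fit on zs."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    b = slope(zs, ys)
    return [(y - my) - b * (z - mz) for y, z in zip(ys, zs)]

# Hypothetical data: z is the common cause (for example, a time trend).
z = [0, 0, 1, 1, 2, 2, 3, 3]
noise = [1, -1, 1, -1, 1, -1, 1, -1]
x = [zi + e for zi, e in zip(z, noise)]  # x depends on z plus noise
y = [3 * zi for zi in z]                 # y depends on z only, not on x

naive = slope(x, y)                                    # y regressed on x alone
controlled = slope(residuals(x, z), residuals(y, z))   # x's slope once z is controlled for

print(round(naive, 3), round(controlled, 3))
```

Regressing y on x alone suggests a sizeable effect, but the slope on x drops to zero once z is controlled for. The approach only helps when the common cause has actually been measured and included, which is the difficulty the paragraph above describes.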

See also

References

  1. Tufte, Edward R. (2006). The Cognitive Style of PowerPoint: Pitching Out Corrupts Within: 5.
  2. Aldrich, John (1995). Correlations Genuine and Spurious in Pearson and Yule. Statistical Science 10 (4): 364–376.
  3. Lawlor DA, Davey Smith G, Ebrahim S (June 2004). Commentary: the hormone replacement-coronary heart disease conundrum: is this the death of observational epidemiology?. Int J Epidemiol 33 (3): 464–7.
  4. Tufte, Edward R. (2003). The Cognitive Style of PowerPoint, Cheshire, Connecticut: Graphics Press.
  5. Quinn GE, Shin CH, Maguire MG, Stone RA (May 1999). Myopia and ambient lighting at night. Nature 399 (6732): 113–4.
  6. CNN, May 13, 1999. Night-light may lead to nearsightedness
  7. Ohio State University Research News, March 9, 2000. Night lights don't lead to nearsightedness, study suggests
  8. Zadnik K, Jones LA, Irvin BC, et al. (March 2000). Myopia and ambient night-time lighting. Nature 404 (6774): 143–4.
  9. Gwiazda J, Ong E, Held R, Thorn F (March 2000). Myopia and ambient night-time lighting. Nature 404 (6774): 144.
  10. Stone (March 2000). Myopia and ambient night-time lighting. Nature 404 (6774): 144.
  11. The Psychology of Personality: Viewpoints, Research, and Applications. Carducci, Bernard J. 2nd Edition. Wiley-Blackwell: UK, 2009.
  12. Ornish, Dean. "Cholesterol: The good, the bad, and the truth" (retrieved 3 June 2011)
  13. Robinson, W.S. (1950). Ecological Correlations and the Behavior of Individuals. American Sociological Review 15 (3): 351–357.
  14. David Hume (Stanford Encyclopedia of Philosophy)
  15. Paul W. Holland. 1986. "Statistics and Causal Inference" Journal of the American Statistical Association, Vol. 81, No. 396. (Dec., 1986), pp. 945-960.
  16. Judea Pearl. 2000. Causality: Models, Reasoning, and Inference, Cambridge University Press.

External links

This page uses Creative Commons Licensed content from Wikipedia (view authors).
