Regression toward the mean

{{StatsPsy}}
[[Image:Regression toward the mean.svg|200px|thumb|A [[scatterplot]] demonstrating regression toward the mean, distinguished by a football-shaped cloud of points]]
'''Regression toward the mean'''<ref name="Raiffa">Howard Raiffa and Robert Schlaifer, <cite>Applied Statistical Decision Theory</cite>, Wiley-Interscience (2000) ISBN: 978-0471383499</ref><ref name="Casella">George Casella and Roger L. Berger, <cite>Statistical Inference</cite>, Duxbury Press (2001) ISBN: 978-0534243128</ref> is a principle in [[statistics]] that states that if you take a pair of independent measurements from the same distribution, samples far from the [[mean]] on the first set will tend to be closer to the mean on the second set, and the farther from the [[mean]] on the first measurement, the stronger the effect. Regression to the mean relies on random variance affecting the measurement of any variable; this random variance will cause some samples to be extreme. On the second measurement, these samples will appear to regress because the random variance affecting the samples in the second measurement is independent of the random variance affecting the first. Thus, regression to the mean is a mathematical inevitability: any measurement of any variable that is affected by random variance ''must'' show regression to the mean.

For example, if you give a class of students a test on two successive days, the worst performers on the first day will tend to improve their scores on the second day, and the best performers on the first day will tend to do worse on the second day. The phenomenon occurs because each sample is affected by random variance. Student scores are determined in part by underlying ability and in part by purely stochastic, unpredictable chance. On the first test, some will be lucky and score higher than their ability, and some will be unlucky and score lower than their ability. The lucky ones are more likely to score above the mean than below it, because their good luck improves their score. Some of the lucky students on the first test will be lucky again on the second test, but more of them will have average or below-average luck. Therefore a student who was lucky on the first test is more likely to have a worse score on the second test than a better score. The students who score above the mean on the first test are more likely to be lucky than unlucky, and lucky students are more likely to see their scores decline than rise, so students who score above the mean on the first test will tend to see their scores decline on the second test. By parallel reasoning, students who score below the mean on the first test will tend to see their scores increase on the second test. Students will regress toward the mean.
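
This mechanism can be checked with a short simulation. The sketch below (Python with NumPy; the class size, score scale and luck variance are invented for illustration) models each observed score as a fixed underlying ability plus independent luck on each day:

<source lang="python">
import numpy as np

rng = np.random.default_rng(0)
n = 10_000                               # hypothetical class size

ability = rng.normal(70, 10, n)          # stable underlying skill
day1 = ability + rng.normal(0, 10, n)    # observed score = skill + luck
day2 = ability + rng.normal(0, 10, n)    # fresh, independent luck

best = day1 >= np.percentile(day1, 90)   # best performers on the first day
print(day1[best].mean())                 # around 95, far above the mean of 70
print(day2[best].mean())                 # around 82: closer to the mean
</source>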
   
The magnitude of regression toward the mean depends on the ratio of error variance to the total variance within the [[Sampling (statistics)|sample]]. If a measurement is largely determined by random chance, then regression to the mean will be very large. If a measurement is largely determined by known factors, regression to the mean will be smaller. In one extreme case, where all individuals are identical and all differences are caused by measurement error, there will be 100% regression toward the mean. If we ask 10,000 people to flip a fair coin ten times, the people who flipped ten heads the first time are expected to get five heads on a repeat experiment, the same as the people who flipped zero heads the first time. In the other extreme of perfect measurement, there is no regression toward the mean. We not only expect, we know, that the second measurement will be the same as the first.
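
The all-error extreme is just as easy to check; a minimal sketch under the same assumed Python/NumPy setup (sample size arbitrary):

<source lang="python">
import numpy as np

rng = np.random.default_rng(1)
first = rng.binomial(10, 0.5, 1_000_000)    # heads in ten flips, per person
second = rng.binomial(10, 0.5, 1_000_000)   # the repeat experiment

print(second[first == 10].mean())   # about 5.0: complete regression
print(second[first == 0].mean())    # also about 5.0
</source>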
   
 
==History==
The concept of regression comes from genetics and was popularized by [[Sir Francis Galton]] in the late 19th century with the publication of ''Regression Towards Mediocrity in Hereditary Stature''. Galton observed that extreme characteristics (e.g., height) in parents were not fully passed on to their offspring. Rather, the characteristic in the offspring ''regressed'' towards a ''mediocre'' point (a point which has since been mathematically shown to be the mean). By measuring the heights of hundreds of people, he was able to quantify regression to the mean and estimate the size of the effect. Galton wrote that "the average regression of the offspring is a constant fraction of their respective mid-parental deviations." This means that the difference between a child and her parents on some characteristic was proportional to her parents' deviation from typical people in the population. So if her parents were each two inches taller than the averages for men and women, on average she would be shorter than her parents by some factor (which today we would call one minus the [[Regression analysis|regression coefficient]]) times two inches. For height, Galton estimated this coefficient to be about 2/3: the height of an individual will center on roughly two-thirds of the parents' deviation from the population mean.
   
Although [[Galton]] popularized the concept of regression, he fundamentally misunderstood the phenomenon; thus, his understanding of regression differs from that of modern statisticians. Galton was correct in his observation that the characteristics of an individual are not fully determined by their parents; there must be another source. However, he explained this by arguing that, "A child inherits partly from his parents, partly from his ancestors. Speaking generally, the further his genealogy goes back, the more numerous and varied will his ancestry become, until they cease to differ from any equally numerous sample taken at haphazard from the race at large."<ref name="galton1886">{{cite journal|journal=Journal of the Anthropological Institute|author=Galton, F.|title=Regression Towards Mediocrity in Hereditary Stature|year=1886|volume=15|pages=246&ndash;263}}</ref> In other words, Galton believed that regression to the mean was simply an inheritance of characteristics from ancestors that are not expressed in the parents; he did not understand regression to the mean as a statistical phenomenon. In contrast to this view, it is now known that regression to the mean is a mathematical inevitability: if the correlation between the heights of individuals and their parents is not exactly equal to 1, then predictions must regress toward the mean regardless of the underlying mechanisms of inheritance, race or culture. Thus, Galton was attributing random variance in height to the ancestry of the individual. This fundamental misunderstanding of a purely mathematical phenomenon was a major motivating factor in the development of [[eugenics]].
   
== Why it matters ==
The most important reason to care about regression toward the mean is [[Design of experiments|experimental design]]. Suppose you give physical exams to 1,000 55-year-old males and score them on risk of having a heart attack. You take the 50 who scored at highest risk, put them on a diet and exercise regimen, and give them a drug. Even if the treatments are worthless, you expect the group to show improvement on their next physical exam because of regression toward the mean. The best way to combat this is to randomly divide the group into a treatment group that receives the treatment and a [[Scientific control|control]] group that does not. We expect both groups to improve; the treatment should be judged effective only if the treatment group improves more than the control group.
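
A simulation makes the danger concrete. In this sketch (all numbers invented; the risk score is modeled as a latent risk plus measurement noise), the 50 highest scorers appear to improve at the second exam even though nothing was done to them:

<source lang="python">
import numpy as np

rng = np.random.default_rng(2)
n = 1_000                               # the 55-year-old males in the example

risk = rng.normal(50, 10, n)            # latent heart-attack risk
exam1 = risk + rng.normal(0, 10, n)     # first physical: risk + noise
exam2 = risk + rng.normal(0, 10, n)     # second physical, no treatment given

worst = np.argsort(exam1)[-50:]         # the 50 highest-risk scores
print(exam1[worst].mean())              # roughly 79 here
print(exam2[worst].mean())              # markedly lower: apparent improvement
</source>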
   
On the opposite side, suppose you give a test to a group of disadvantaged ninth-graders to identify the ones with the most college potential. You select the top 1% and supply them with special enrichment courses, tutoring, counseling and computers. Even if the program is effective, you might find that their scores decline on average when the test is repeated a year later. If it is considered unfair to have a [[Scientific control|control]] group, you can make a mathematical adjustment for this effect, although that will not be as reliable as a control group. [[Shrinkage (statistics)|Shrinkage]] is the name of the statistical technique for adjusting for regression toward the mean (see also [[Stein's example]]).
   
The effect can also be exploited for general inference and estimation. The hottest place in the country today is more likely to be cooler tomorrow than hotter. The best-performing mutual fund over the last three years is more likely to see performance decline than improve over the next three years. The Hollywood star of the biggest box-office success of this year is more likely to see a lower gross than a higher gross on her movie next year. The baseball player with the highest batting average by the All-Star break is more likely to have a lower average than a higher average over the second half of the season.
   
== Warnings ==
   
The concept of regression toward the mean can be misused very easily.
   
The [[Law of large numbers]] is an unrelated phenomenon often confused with regression toward the mean. Suppose you flip a coin 100 times and measure the frequency of heads. Then you flip the coin 100 more times. The frequency of heads over the entire 200 flips is likely to be closer to the mean than the frequency over the first 100 flips. This is different from regression toward the mean. In the first place, the frequency of heads over the second 100 flips is equally likely to be closer to or farther from the mean than the frequency of heads over the first 100 flips. It is a fallacy to think the second 100 flips have a tendency to even out the total. If the first 100 flips produce 5 heads more than [[Expected value|expected]], we expect to have 5 heads more than [[Expected value|expected]] at the end of 200 flips as well. The frequency of heads regresses toward the mean, but the excess number of heads does not. In the second place, this regression is toward the true mean, not the mean of the first 100 flips.
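
The difference between the two phenomena is easy to exhibit numerically; a sketch under the same assumptions (fair coin, arbitrary seed):

<source lang="python">
import numpy as np

rng = np.random.default_rng(3)
first = rng.binomial(100, 0.5, 1_000_000)    # heads in the first 100 flips
second = rng.binomial(100, 0.5, 1_000_000)   # heads in the next 100 flips

lucky = first == 55                 # 5 heads more than expected
print(second[lucky].mean())         # ~50: no tendency to "even out"
total = first + second
print(total[lucky].mean())          # ~105: the excess of 5 persists
print(total[lucky].mean() / 200)    # ~0.525: the frequency regresses to 0.5
</source>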
   
In the student test example above, it was implicitly assumed that what was being measured did not change between the two measurements. But suppose it was a pass/fail course and you had to score above 70 on both tests to pass. Then the students who scored under 70 the first time would have no incentive to do well, and might score worse on average the second time. The students just over 70, on the other hand, would have a strong incentive to study overnight and concentrate while taking the test. In that case you might see movement ''away'' from 70, scores below it getting lower and scores above it getting higher. It is possible for changes between the measurement times to augment, offset or reverse the statistical tendency to regress toward the mean. Do not confuse [[causality|causal]] regression toward the mean (or away from it) with the statistical phenomenon.
   
The opposite point is even more important. Do not think of statistical regression toward the mean as a [[causality|causal]] phenomenon. If you are the student with the worst score on the first day's exam, there is no invisible hand to lift up your score on the second day, without effort from you. If you know you scored in line with your ability, you are equally likely to score better or worse on the second test. On average the worst scorers improve, but that's only true because the worst scorers are more likely to have been unlucky than lucky. You know how lucky or unlucky you were, so regression toward the mean is irrelevant from your point of view.
   
Although individual measurements regress toward the mean, the second [[Sampling (statistics)|sample]] of measurements will be no closer to the mean than the first. Consider the students again. Suppose their tendency is to regress 10% of the way toward the [[mean]] of 80, so a student who scored 100 the first day is [[Expected value|expected]] to score 98 the second day, and a student who scored 70 the first day is [[Expected value|expected]] to score 71 the second day. Those expectations are closer to the mean, on average, than the first-day scores. But the second-day scores will vary around their expectations; some will be higher and some will be lower. This variation makes the second set of measurements farther from the mean, on average, than their expectations. The effect is the exact reverse of regression toward the mean, and exactly offsets it. So for every individual, we expect the second score to be closer to the mean than the first score, but for ''all'' individuals, we expect the average distance from the mean to be the same on both sets of measurements.
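
This offsetting effect can also be verified by simulation; a sketch under the same assumed ability-plus-luck model (the numbers here are arbitrary and imply 50% regression rather than 10%):

<source lang="python">
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
ability = rng.normal(80, 8, n)
day1 = ability + rng.normal(0, 8, n)    # half the variance here is "luck"
day2 = ability + rng.normal(0, 8, n)

# Individually, expected day-2 scores are closer to the mean of 80...
high = day1 > 95
print(day1[high].mean(), day2[high].mean())   # e.g. ~100 versus ~90

# ...but the overall spread around the mean is the same on both days.
print(np.abs(day1 - 80).mean(), np.abs(day2 - 80).mean())   # roughly equal
</source>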
   
Related to the point above, regression toward the mean works equally well in both directions. We expect the student with the highest test score on the second day to have done worse on the first day. And if we compare the best student on the first day to the best student on the second day, regardless of whether it is the same individual or not, there is no tendency to regress toward the mean. We [[Expected value|expect]] the best scores on both days to be equally far from the mean.

Also related to the above point, if we pick a point close to the mean on the first set of measurements, we may [[Expected value|expect]] it to be farther from the mean on the second set. The [[expected value]] of the second measurement is closer to the mean than the point, but the measurement error will move it on average farther away. That is, the [[expected value]] of the distance from the mean on the second measurement is greater than the distance from the mean on the first measurement.
== Regression toward everything ==
Notice that in the informal explanation given above for the phenomenon, there was nothing special about the mean. We could pick any point within the [[Range (statistics)|sample range]] and make the same argument: students who scored above this value were more likely to have been lucky than unlucky, students who scored below this value were more likely to have been unlucky than lucky. How can individuals regress toward every point in the [[Range (statistics)|sample range]] at once? The answer is each individual is pulled toward every point in the [[Range (statistics)|sample range]], but to different degrees.

For a physical analogy, every [[mass]] in the [[solar system]] is pulled toward every other [[mass]] by [[gravitation]], but the net effect for [[planets]] is to be pulled toward the [[center of mass]] of the entire [[solar system]]. This illustrates an important point. Individuals on [[Earth]] at noon are pulled toward the [[Earth]], away from the [[Sun]] and the [[center of mass]] of the [[solar system]]. Similarly, an individual in a [[Sampling (statistics)|sample]] might be pulled toward a subgroup [[mean]] more strongly than to the [[Sampling (statistics)|sample]] [[mean]], and even pulled away from the [[Sampling (statistics)|sample]] [[mean]]. Consider, for example, the [[pitcher]] with the highest [[batting average]] in the [[National League]] by the All-Star break, and assume his [[batting average]] is below the average for all [[National League]] players. His [[batting average]] over the second half of the season will regress up toward the [[mean]] of all players, and down toward the [[mean]] of all [[pitcher|pitchers]]. For that matter, if he is left-handed he is pulled toward the [[mean]] of all left-handers, if he is a rookie he is pulled to the [[mean]] of all rookies, and so on. Which of these effects dominates depends on the data under consideration.
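
A sketch with invented numbers (pitchers' true averages assumed to center on .220, other players' on .270, half-season luck of about ±.030) shows the subgroup pull dominating:

<source lang="python">
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical league: 500 pitchers hit worse, on average, than 4,500 others.
true_avg = np.concatenate([rng.normal(0.220, 0.015, 500),
                           rng.normal(0.270, 0.015, 4500)])
is_pitcher = np.arange(5000) < 500

half1 = true_avg + rng.normal(0, 0.030, 5000)   # luck, first half of season
half2 = true_avg + rng.normal(0, 0.030, 5000)   # independent luck, second half

best = np.argmax(np.where(is_pitcher, half1, -np.inf))  # top pitcher, half 1
print(half1[best])   # well above the pitcher mean, often above the league's
print(half2[best])   # typically falls most of the way back toward .220
</source>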
The concept does not apply, however, to supersets. While the [[pitcher]] above may be pulled to the [[mean]] of all humans, or the [[mean]] of all things made of matter, our [[Sampling (statistics)|sample]] does not give us estimates of those [[mean|means]].

In general, you can expect the net effect of regressions toward all points to pull an individual toward the closest [[Mode (statistics)|mode]] of the [[Probability distribution|distribution]]. If you have information about subgroups, and the subgroup [[mean|means]] are far apart relative to the differences between individuals, you can expect individuals to be pulled toward subgroup [[mean|means]], even if those do not show up as [[Mode (statistics)|modes]] of the [[Probability distribution|distribution]]. For [[Bimodal distribution|unimodal]] [[Probability distribution|distributions]], without strong subgroup effects or asymmetries, individuals will likely be pulled toward the [[mean]], [[median]] and [[mode]], which should be close together. For [[Bimodal distribution|bimodal and multimodal]] [[Probability distribution|distributions]], asymmetric [[Probability distribution|distributions]] or data with strong subgroup effects, regression toward the mean should be applied with caution.

== Regression fallacies ==
{{main|regression fallacy}}

Misunderstandings of the principle (known as "'''regression fallacies'''") have repeatedly led to mistaken claims in the scientific literature.

An extreme example is Horace Secrist's 1933 book ''The Triumph of Mediocrity in Business'', in which the statistics professor collected mountains of data to prove that the profit rates of competitive businesses tend toward the average over time. In fact, there is no such effect; the variability of profit rates is almost constant over time. Secrist had only described the common regression toward the mean. One exasperated reviewer, [[Harold Hotelling]], likened the book to "proving the multiplication table by arranging elephants in rows and columns, and then doing the same for numerous other kinds of animals".<ref>Hotelling, H. (1933). Review of The triumph of mediocrity in business by Secrist, H., ''Journal of the American Statistical Association'', 28, 433-435.</ref>
   
 
The calculation and interpretation of "improvement scores" on standardized educational tests in Massachusetts probably provides another example of the regression fallacy. In 1999, schools were given improvement goals. For each school, the Department of Education tabulated the difference in the average score achieved by students in 1999 and in 2000. It was quickly noted that most of the worst-performing schools had met their goals, which the Department of Education took as confirmation of the soundness of their policies. However, it was also noted that many of the supposedly best schools in the Commonwealth, such as Brookline High School (with 18 National Merit Scholarship finalists) were declared to have failed. As in many cases involving statistics and public policy, the issue is debated, but "improvement scores" were not announced in subsequent years and the findings appear to be a case of regression to the mean.
 
   
The psychologist [[Daniel Kahneman]] referred to regression to the mean in his speech when he won the 2002 [[Nobel prize in economics|Bank of Sweden prize]] for economics.
{{cquote|I had the most satisfying Eureka experience of my career while attempting to teach flight instructors that praise is more effective than punishment for promoting skill-learning. When I had finished my enthusiastic speech, one of the most seasoned instructors in the audience raised his hand and made his own short speech, which began by conceding that positive reinforcement might be good for the birds, but went on to deny that it was optimal for flight cadets. He said, "On many occasions I have praised flight cadets for clean execution of some aerobatic maneuver, and in general when they try it again, they do worse. On the other hand, I have often screamed at cadets for bad execution, and in general they do better the next time. So please don't tell us that reinforcement works and punishment does not, because the opposite is the case." This was a joyous moment, in which I understood an important truth about the world: because we tend to reward others when they do well and punish them when they do badly, and because there is regression to the mean, it is part of the human condition that we are statistically punished for rewarding others and rewarded for punishing them. I immediately arranged a demonstration in which each participant tossed two coins at a target behind his back, without any feedback. We measured the distances from the target and could see that those who had done best the first time had mostly deteriorated on their second try, and vice versa. But I knew that this demonstration would not undo the effects of lifelong exposure to a perverse contingency.}}
   
UK law enforcement policies have encouraged the visible siting of static or mobile [[speed camera]]s at [[accident blackspot]]s. This policy was justified by a perception that there is a corresponding reduction in serious [[road traffic accidents]] after a camera is set up. However, statisticians have pointed out that, although there is a net benefit in lives saved, failure to take into account the effects of regression to the mean results in the beneficial effects' being overstated. It is thus claimed that some of the money currently spent on traffic cameras could be more productively directed elsewhere.<ref>[http://www.timesonline.co.uk/tol/news/uk/article766659.ece The Times, [[16 December]] 2005 Speed camera benefits overrated]</ref>
   
Statistical analysts have long recognized the effect of regression to the mean in sports; they even have a special name for it: the "[[Sophomore slump|Sophomore Slump]]." For example, [[Carmelo Anthony]] of the [[National Basketball Association|NBA]]'s [[Denver Nuggets]] had an outstanding rookie season in 2004. It was so outstanding, in fact, that he couldn't possibly be expected to repeat it: in 2005, Anthony's numbers had dropped from his rookie season. The reasons for the "sophomore slump" abound, as sports are all about adjustment and counter-adjustment, but luck-based excellence as a rookie is as good a reason as any.
   
Regression to the mean in sports performance may be the reason for the "[[Sports Illustrated Cover Jinx]]" and the "[[Madden Curse]]." [[John Hollinger]] has an alternate name for the law of regression to the mean: the "fluke rule," while [[Bill James]] calls it the "Plexiglass Principle."
   
Because popular lore has focused on "regression toward the mean" as an account of declining performance of athletes from one season to the next, it has usually overlooked the fact that such regression can also account for improved performance. For example, if one looks at the [[batting average]] of [[Major League Baseball]] players in one season, those whose batting average was above the league mean tend to regress downward toward the mean the following year, while those whose batting average was below the mean tend to progress upward toward the mean the following year.<ref>For an illustration see [[Nate Silver]], "Randomness: Catch the Fever!", [http://www.baseballprospectus.com/article.php?articleid=1897 ''Baseball Prospectus'', May 14, 2003].</ref>
   
== Mathematics ==
Let x<sub>1</sub>, x<sub>2</sub>, …, x<sub>n</sub> be the first set of measurements and y<sub>1</sub>, y<sub>2</sub>, …, y<sub>n</sub> be the second set. Regression toward the mean tells us that, for every i, the expected value of y<sub>i</sub> is closer to <math>\overline{x}</math> (the mean of the x<sub>i</sub>'s) than x<sub>i</sub> is. We can write this as:

: <math> E(|y_i - \overline{x}|) < |x_i - \overline{x}| </math>

where E() denotes the [[Expected value|expectation operator]]. We can also write:

: <math> 0 \leq E\left(\frac{y_i - \overline{x}}{x_i - \overline{x}}\right) < 1 </math>

which is stronger than the first inequality because it requires that the [[expected value]] of y<sub>i</sub> is on the same side of the mean as x<sub>i</sub>. A natural way to test this is to look at the values of:

: <math> \frac{y_i - \overline{x}}{x_i - \overline{x}} </math>

in the [[Sampling (statistics)|sample]]. Taking an [[arithmetic mean]] is not a good idea, because <math>x_i - \overline{x}</math> might be zero. Even if it is only close to zero, those points could dominate the calculation, when we are really concerned about larger movements of points farther from the [[mean]]. Suppose instead we take a [[weighted mean]], weighted by <math>(x_i - \overline{x})^2</math>:

: <math> \frac{\sum_{i=1}^{n}(y_i - \overline{x})(x_i - \overline{x})}{\sum_{i=1}^{n}(x_i - \overline{x})^2} </math>

which can be rewritten:

: <math> \frac{\sum_{i=1}^{n}y_ix_i-\overline{x}\sum_{i=1}^{n}y_i-\overline{x}\sum_{i=1}^{n}x_i+n\overline{x}^2}{\sum_{i=1}^{n}(x_i - \overline{x})^2} </math>

or:

: <math> \frac{\sum_{i=1}^{n}y_ix_i-n\overline{x}\ \overline{y}}{\sum_{i=1}^{n}(x_i - \overline{x})^2} </math>

which is the well-known formula for the [[Regression analysis|regression]] coefficient <math>\beta</math>. Therefore, asserting that there is regression toward the mean can be interpreted as asserting:

: <math> 0 \leq \beta_{x,y} < 1 </math>

This will generally be true of two sets of measurements on the same [[Sampling (statistics)|sample]]. We would expect the standard deviations of the two sets of measurements to be the same, so the [[Regression analysis|regression]] coefficient <math>\beta</math> is equal to the [[correlation]] coefficient <math>\rho</math>. That is enough to tell us that <math>\beta\leq 1</math>, since <math>\rho\leq 1</math>. If the measurements are not perfect, we expect <math>\beta< 1</math>. However, if the measurements have any information content at all, <math>\rho > 0</math>, so <math>\beta > 0</math>. <math>\rho = 1</math> corresponds to the case of perfect measurement, while <math>\rho = 0</math> corresponds to the case of the measurement being all error.
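
The algebra above can be checked numerically. A sketch (invented variances; two noisy measurements of the same underlying quantity) estimates <math>\beta</math> with the weighted-mean formula and compares it to <math>\rho</math>:

<source lang="python">
import numpy as np

rng = np.random.default_rng(5)
n = 50_000
quantity = rng.normal(0, 1, n)         # the underlying quantity
x = quantity + rng.normal(0, 1, n)     # first set of measurements
y = quantity + rng.normal(0, 1, n)     # second set of measurements

xbar = x.mean()
beta = ((y - xbar) * (x - xbar)).sum() / ((x - xbar) ** 2).sum()
print(beta)                      # about 0.5 here: strictly between 0 and 1

# With equal variances on both sets, beta should match rho.
print(np.corrcoef(x, y)[0, 1])   # also about 0.5
</source>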
==See also==
{{Statistics portal}}
*[[Internal validity]]
==Notes==
{{reflist}}
==References==
* {{Cite journal
| author = J.M. Bland and D.G. Altman
| title = Statistic Notes: Regression towards the mean
| journal = [[British Medical Journal]]
| volume = 308
| pages = 1499
| year = 1994
| month = June
| url = http://bmj.bmjjournals.com/cgi/content/full/308/6942/1499
| pmid = 8019287
}} Article, including a diagram of Galton's original data.

* {{Cite journal
| author = [[Francis Galton]]
| title = Regression Towards Mediocrity in Hereditary Stature
| journal = [[Journal of the Anthropological Institute]]
| volume = 15
| pages = 246&ndash;263
| year = 1886
| url = http://galton.org/essays/1880-1889/galton-1886-jaigi-regression-stature.pdf
}}

* {{Cite book
| author = [[Stephen M. Stigler]]
| title = Statistics on the Table
| publisher = [[Harvard University Press]]
| year = 1999
}} See Chapter 9.
==External links==
* [http://davidmlane.com/hyperstat/B153351.html A non-mathematical explanation of regression toward the mean.]
* [http://onlinestatbook.com/stat_sim/reg_to_mean/index.html A simulation of regression toward the mean.]
 
* Amanda Wachsmuth, Leland Wilkinson, Gerard E. Dallal. [http://www.spss.com/research/wilkinson/Publications/galton.pdf Galton's Bend: An Undiscovered Nonlinearity in Galton's Family Stature Regression Data and a Likely Explanation Based on Pearson and Lee's Stature Data] ''(A modern look at Galton's analysis.)''
 
* Massachusetts standardized test scores, interpreted by a statistician as an example of regression: see [http://groups.google.com/groups?q=g:thl3845480903d&dq=&hl=en&lr=&ie=UTF-8&oe=UTF-8&safe=off&selm=93ikdr%24i20%241%40nnrp1.deja.com discussion in sci.stat.edu] and [http://groups.google.com/group/sci.stat.edu/tree/browse_frm/thread/c1086922ef405246/60bb528144835a38?rnum=21&hl=en&_done=%2Fgroup%2Fsci.sta its continuation].
* [http://nobelprize.virtual.museum/nobel_prizes/economics/laureates/2002/kahneman-autobio.html Kahneman's Nobel speech]

{{Statistics}}

[[Category:Statistical terminology]]
[[Category:Regression analysis]]
[[Category:Statistics]]

[[de:Regression zur Mitte]]
[[es:Regresión (estadística)]]
[[ja:平均への回帰]]
 
{{enWP|Regression toward the mean}}
 