# Regression toward the mean

Regression toward the mean[1][2] is a principle in statistics stating that if you measure the same set of individuals twice, and each measurement is affected by independent random variation, individuals far from the mean on the first measurement will tend to be closer to the mean on the second, and the farther from the mean on the first measurement, the stronger the effect. Regression to the mean relies on random variance affecting the measurement of any variable; this random variance will cause some samples to be extreme. On the second measurement, these samples will appear to regress because the random variance affecting the second measurement is independent of the random variance affecting the first. Thus, regression to the mean is a mathematical inevitability: any measurement of any variable that is affected by random variance must show regression to the mean.

For example, if you give a class of students a test on two successive days, the worst performers on the first day will tend to improve their scores on the second day, and the best performers on the first day will tend to do worse on the second day. The phenomenon occurs because each sample is affected by random variance. Student scores are determined in part by underlying ability and in part by purely stochastic, unpredictable chance. On the first test, some will be lucky, and score higher than their ability, and some will be unlucky and score lower than their ability. The lucky ones are more likely to score above the mean than below it, because their good luck improves their score. Some of the lucky students on the first test will be lucky again on the second test, but more of them will have average or below average luck. Therefore a student who was lucky on the first test is more likely to have a worse score on the second test than a better score. The students who score above the mean on the first test are more likely to be lucky than unlucky, and lucky students are more likely to see their score decline than go up, so students who score above the mean on the first test will tend to see their scores decline on the second test. By parallel reasoning, students who score below the mean on the first test will tend to see their scores increase on the second test. Students will regress toward the mean.
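The argument above can be checked with a small simulation. The following is a minimal sketch rather than part of the original example: it assumes each score is the sum of a fixed ability and independent, normally distributed luck, and the function names, the class size, the mean of 75 and the standard deviations are all illustrative choices.

```python
import random
import statistics


def simulate_two_tests(n_students=10_000, ability_sd=10.0, luck_sd=10.0, seed=0):
    """Return (day1, day2) score lists, where score = ability + that day's luck."""
    rng = random.Random(seed)
    day1, day2 = [], []
    for _ in range(n_students):
        ability = rng.gauss(75.0, ability_sd)            # same on both days
        day1.append(ability + rng.gauss(0.0, luck_sd))   # luck on day 1
        day2.append(ability + rng.gauss(0.0, luck_sd))   # independent luck on day 2
    return day1, day2


def group_mean(scores, indices):
    """Mean score of the students at the given indices."""
    return statistics.fmean(scores[i] for i in indices)


day1, day2 = simulate_two_tests()
class_mean = statistics.fmean(day1)

# Rank students by their day 1 score and take the bottom and top 10%.
ranked = sorted(range(len(day1)), key=lambda i: day1[i])
tenth = len(ranked) // 10
worst, best = ranked[:tenth], ranked[-tenth:]

print(f"class mean on day 1: {class_mean:.1f}")
print(f"worst 10%: day 1 {group_mean(day1, worst):.1f}, day 2 {group_mean(day2, worst):.1f}")
print(f"best 10%:  day 1 {group_mean(day1, best):.1f}, day 2 {group_mean(day2, best):.1f}")
```

Under these assumptions the worst day 1 scorers improve on average and the best decline, although the best group still scores above the class mean on day 2, because part of their day 1 advantage was ability rather than luck.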

The magnitude of regression toward the mean depends on the ratio of error variance to total variance within the sample. If a measurement is largely determined by random chance, then regression to the mean will be very large. If a measurement is largely determined by stable underlying factors, regression to the mean will be smaller. In one extreme case, where all individuals are identical and all differences are caused by measurement error, there will be 100% regression toward the mean. If we ask 10,000 people to flip a fair coin ten times, the people who flipped ten heads the first time are expected to get five heads on a repeat experiment, the same as the people who flipped zero heads the first time. In the other extreme of perfect measurement, there is no regression toward the mean. We not only expect but know that the second measurement will be the same as the first.
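The coin-flipping case can be sketched in a few lines. This is an illustrative simulation, not a proof: the helper `flip_heads` and the seed are arbitrary, and the groups are defined as "at least 8 heads" and "at most 2 heads" on the first round (rather than exactly 10 and exactly 0) only so that each group contains enough people for a stable average.

```python
import random
import statistics


def flip_heads(rng, n_flips=10):
    """Count heads in n_flips tosses of a fair coin."""
    return sum(rng.random() < 0.5 for _ in range(n_flips))


rng = random.Random(1)

# Each person flips ten coins twice: (heads in first round, heads in second round).
people = [(flip_heads(rng), flip_heads(rng)) for _ in range(10_000)]

# Second-round results, grouped by how extreme the first round was.
lucky_repeat = [second for first, second in people if first >= 8]
unlucky_repeat = [second for first, second in people if first <= 2]

print(f"first round >= 8 heads: second-round mean {statistics.fmean(lucky_repeat):.2f}")
print(f"first round <= 2 heads: second-round mean {statistics.fmean(unlucky_repeat):.2f}")
```

Both groups should average close to five heads on the repeat, since here all variation is chance and nothing about a person carries over from one round to the next.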

## History

The concept of regression comes from genetics and was popularized by Sir Francis Galton in the late 19th century with the publication of Regression Towards Mediocrity in Hereditary Stature. Galton observed that extreme characteristics (e.g., height) in parents were not fully passed on to their offspring. Rather, the characteristic in the offspring regressed towards a mediocre point (a point which has since been mathematically shown to be the mean). By measuring the heights of hundreds of people, he was able to quantify regression to the mean and estimate the size of the effect. Galton wrote that "the average regression of the offspring is a constant fraction of their respective mid-parental deviations." This means that the difference between a child and her parents on some characteristic is proportional to her parents' deviation from typical people in the population. So if her parents were each two inches taller than the averages for men and women, on average she would be shorter than her parents by some factor (which today we would call one minus the regression coefficient) times two inches. For height, Galton estimated this regression coefficient to be around 2/3: the height of an individual will center around 2/3 of the parents' deviation from the population average.
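As a worked instance of this arithmetic, take the figures given above: a mid-parental deviation of 2 inches and a coefficient of about 2/3. The symbols $d_{\text{parents}}$ and $d_{\text{child}}$ are introduced here only for illustration and are not Galton's notation.

$E(d_{\text{child}}) = \tfrac{2}{3} \times d_{\text{parents}} = \tfrac{2}{3} \times 2 \approx 1.33 \text{ inches}$

$d_{\text{parents}} - E(d_{\text{child}}) = (1 - \tfrac{2}{3}) \times 2 \approx 0.67 \text{ inches}$

So the child is expected to be about 1.33 inches above average, which is about 0.67 inches closer to the average than her parents.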

Although Galton popularized the concept of regression, he fundamentally misunderstood the phenomenon; thus, his understanding of regression differs from that of modern statisticians. Galton was correct in his observation that the characteristics of an individual are not fully determined by their parents; there must be another source. However, he explained this by arguing that "A child inherits partly from his parents, partly from his ancestors. Speaking generally, the further his genealogy goes back, the more numerous and varied will his ancestry become, until they cease to differ from any equally numerous sample taken at haphazard from the race at large."[3] In other words, Galton believed that regression to the mean was simply an inheritance of characteristics from ancestors that are not expressed in the parents; he did not understand regression to the mean as a statistical phenomenon. In contrast to this view, it is now known that regression to the mean is a mathematical inevitability: if there is any random variance between the height of an individual and that of the parents (that is, if the correlation is not exactly equal to 1), then the predictions must regress to the mean regardless of the underlying mechanisms of inheritance, race or culture. Thus, Galton was attributing random variance in height to the ancestry of the individual. This fundamental misunderstanding of a purely mathematical phenomenon was a major motivating factor in the development of eugenics.

## Why it matters

The most important reason to care about regression toward the mean is for experimental design. Suppose you give physical exams to 1,000 55-year-old males and score them on risk of having a heart attack. You take the 50 who scored at highest risk, put them on a diet and exercise regime, and give them a drug. Even if the treatments are worthless, you expect the group to show improvement on their next physical exam due to regression toward the mean. The best way to combat this is to randomly divide the selected group into a treatment group that gets the treatment and a control group that does not. We expect both groups to improve; the treatment should be judged effective only if the treatment group improves more than the control group.
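The heart-attack example can be sketched as a simulation. The code below is a hypothetical illustration: it assumes a measured risk score equals a stable underlying risk plus independent normal noise, treats the treatment as having no effect at all, and uses made-up scales (a mean of 50 and standard deviations of 10) along with illustrative names such as `measure` and `change`.

```python
import random
import statistics


def measure(true_risk, rng, noise_sd=10.0):
    """One noisy risk score for a man with the given underlying risk."""
    return true_risk + rng.gauss(0.0, noise_sd)


rng = random.Random(42)
true_risks = [rng.gauss(50.0, 10.0) for _ in range(1000)]
first_exam = [measure(r, rng) for r in true_risks]

# Select the 50 men with the highest measured risk on the first exam.
selected = sorted(range(1000), key=lambda i: first_exam[i], reverse=True)[:50]

# Randomly split them into treatment and control groups of 25 each.
rng.shuffle(selected)
treatment, control = selected[:25], selected[25:]

# The treatment is assumed worthless: underlying risk is unchanged for everyone,
# so the second exam is just another noisy measurement of the same true risk.
second_exam = {i: measure(true_risks[i], rng) for i in selected}


def change(group):
    """Average change in measured risk from the first to the second exam."""
    return statistics.fmean(second_exam[i] - first_exam[i] for i in group)


print(f"treatment group change: {change(treatment):+.1f}")
print(f"control group change:   {change(control):+.1f}")
```

Both groups show a drop in measured risk even though nothing was done to either, which is why only the difference between the two groups says anything about the treatment.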

On the opposite side, suppose you give a test to a group of disadvantaged ninth-graders to identify the ones with the most college potential. You select the top 1% and supply them with special enrichment courses, tutoring, counseling and computers. Even if the program is effective, you might find that their scores decline on average when the test is repeated a year later. If it is considered unfair to have a control group, you can make a mathematical calculation to adjust for this effect, although that will not be as reliable as the control. Shrinkage is the name of the statistical technique used to adjust for regression toward the mean (see also Stein's example).

The effect can also be exploited for general inference and estimation. The hottest place in the country today is more likely to be cooler tomorrow than hotter. The best performing mutual fund over the last three years is more likely to see performance decline than improve over the next three years. The Hollywood star of the biggest box office success of this year is more likely to see a lower gross than a higher gross on her movie next year. The baseball player with the highest batting average by the All-Star break is more likely to have a lower average than a higher average over the second half of the season.

## Warnings

The concept of regression toward the mean can be misused very easily.

The law of large numbers is an unrelated phenomenon often confused with regression toward the mean. Suppose you flip a coin 100 times and measure the frequency of heads. Then you flip the coin 100 more times. The frequency of heads over the entire 200 flips is likely to be closer to the mean than the frequency over the first 100 flips. This is different from regression toward the mean. In the first place, the frequency of heads over the second 100 flips is equally likely to be closer to or farther from the mean than the frequency of heads over the first 100 flips. It is a fallacy to think the second 100 flips have a tendency to even out the total. If the first 100 flips produce 5 heads more than expected, we expect to have 5 heads more than expected at the end of 200 flips as well. The average number of heads regresses toward the mean, but the number of heads does not. In the second place, this regression is toward the true mean, not the mean of the first 100 flips.
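The distinction between the count and the frequency can be made concrete with a short calculation. This is a minimal sketch; the function name `expected_after_more_flips` is illustrative, and the figure of 55 heads in the first 100 flips is simply the "5 heads more than expected" case from the paragraph above.

```python
def expected_after_more_flips(heads_so_far, flips_so_far, extra_flips, p_heads=0.5):
    """Expected totals after extra flips, given the flips already observed.

    Returns (expected total heads,
             expected excess of heads over p_heads * total flips,
             expected overall frequency of heads).
    """
    total_flips = flips_so_far + extra_flips
    # Future flips are independent of the past, so they add p_heads each on average.
    expected_heads = heads_so_far + p_heads * extra_flips
    expected_excess = expected_heads - p_heads * total_flips
    return expected_heads, expected_excess, expected_heads / total_flips


heads, excess, frequency = expected_after_more_flips(55, 100, 100)
print(heads, excess, frequency)  # 105.0 5.0 0.525
```

The expected excess stays at 5 heads, while the expected frequency moves from 0.55 to 0.525: the surplus is diluted by the additional flips, not cancelled by them.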

In the student test example above, it was implicitly assumed that what was being measured did not change between the two measurements. But suppose it was a pass/fail course and you had to score above 70 on both tests to pass. Then the students who scored under 70 the first time would have no incentive to do well, and might score worse on average the second time. The students just over 70, on the other hand, would have a strong incentive to study overnight and concentrate while taking the test. In that case you might see movement away from 70, scores below it getting lower and scores above it getting higher. It is possible for changes between the measurement times to augment, offset or reverse the statistical tendency to regress toward the mean. Do not confuse causal regression toward the mean (or away from it) with the statistical phenomenon.

The opposite point is even more important. Do not think of statistical regression toward the mean as a causal phenomenon. If you are the student with the worst score on the first day's exam, there is no invisible hand to lift up your score on the second day, without effort from you. If you know you scored in line with your ability, you are equally likely to score better or worse on the second test. On average the worst scorers improve, but that's only true because the worst scorers are more likely to have been unlucky than lucky. You know how lucky or unlucky you were, so regression toward the mean is irrelevant from your point of view.

Although individual measurements regress toward the mean, the second sample of measurements will be no closer to the mean than the first. Consider the students again. Suppose their tendency is to regress 10% of the way toward the mean of 80, so a student who scored 100 the first day is expected to score 98 the second day, and a student who scored 70 the first day is expected to score 71 the second day. Those expectations are closer to the mean, on average, than the first day scores. But the second day scores will vary around their expectations; some will be higher and some will be lower. This will make the second set of measurements farther from the mean, on average, than their expectations. The effect is the exact reverse of regression toward the mean, and exactly offsets it. So for every individual, we expect the second score to be closer to the mean than the first score, but for all individuals, we expect the average distance from the mean to be the same on both sets of measurements.
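Both halves of this point can be sketched in code: the expected second score for an individual, and the unchanged spread of the class as a whole. The simulation is an assumption-laden illustration. It models a score as ability plus independent luck, with standard deviations of 9 and 3 chosen so that luck accounts for 10% of the total variance and scores therefore regress 10% of the way toward 80; the names and the seed are arbitrary.

```python
import random
import statistics

MEAN = 80.0
REGRESSION = 0.10  # fraction of the distance to the mean that is expected to be lost


def expected_second_score(first_score, mean=MEAN, regression=REGRESSION):
    """Expected day 2 score for a student with the given day 1 score."""
    return first_score - regression * (first_score - mean)


print(expected_second_score(100))  # 98.0
print(expected_second_score(70))   # 71.0

# Simulated class: variance of ability is 81, variance of luck is 9,
# so luck is 9 / 90 = 10% of the total variance.
rng = random.Random(7)
abilities = [rng.gauss(MEAN, 9.0) for _ in range(20_000)]
day1 = [a + rng.gauss(0.0, 3.0) for a in abilities]
day2 = [a + rng.gauss(0.0, 3.0) for a in abilities]

# Students who were far above the mean on day 1 score lower, on average, on day 2 ...
high = [i for i, score in enumerate(day1) if score >= 95]
high_day1 = statistics.fmean(day1[i] for i in high)
high_day2 = statistics.fmean(day2[i] for i in high)
print(f"day 1 scorers >= 95: day 1 mean {high_day1:.1f}, day 2 mean {high_day2:.1f}")

# ... yet the class as a whole is spread just as widely on day 2 as on day 1.
spread_day1 = statistics.pstdev(day1)
spread_day2 = statistics.pstdev(day2)
print(f"standard deviation: day 1 {spread_day1:.2f}, day 2 {spread_day2:.2f}")
```

The high scorers drift toward 80 while the two standard deviations stay essentially equal, because fresh luck on day 2 pushes other students out to the extremes.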

Related to the point above, regression toward the mean works equally well in both directions. We expect the student with the highest test score on the second day to have done worse on the first day. And if we compare the best student on the first day to the best student on the second day, regardless of whether it is the same individual or not, there is no tendency to regress toward the mean. We expect the best scores on both days to be equally far from the mean.

Also related to the above point, if we pick a point close to the mean on the first set of measurements, we may expect it to be farther from the mean on the second set. The expected value of the second measurement is closer to the mean than the point, but the measurement error will move it on average farther away. That is, the expected value of the distance from the mean on the second measurement is greater than the distance from the mean on the first measurement.

## Regression toward everything

Notice that in the informal explanation given above for the phenomenon, there was nothing special about the mean. We could pick any point within the sample range and make the same argument: students who scored above this value were more likely to have been lucky than unlucky, students who scored below this value were more likely to have been unlucky than lucky. How can individuals regress toward every point in the sample range at once? The answer is each individual is pulled toward every point in the sample range, but to different degrees.

For a physical analogy, every mass in the solar system is pulled toward every other mass by gravitation, but the net effect for planets is to be pulled toward the center of mass of the entire solar system. This illustrates an important point. Individuals on Earth at noon are pulled toward the Earth, away from the Sun and the center of mass of the solar system. Similarly, an individual in a sample might be pulled toward a subgroup mean more strongly than to the sample mean, and even pulled away from the sample mean. Consider, for example, the pitcher with the highest batting average in the National League by the All-Star break, and assume his batting average is below the average for all National League players. His batting average over the second half of the season will regress up toward the mean of all players, and down toward the mean of all pitchers. For that matter, if he is left-handed he is pulled toward the mean of all left-handers, if he is a rookie he is pulled to the mean of all rookies, and so on. Which of these effects dominates depends on the data under consideration.

The concept does not apply, however, to supersets. While the pitcher above may be pulled to the mean of all humans, or the mean of all things made of matter, our sample does not give us estimates of those means.

In general, you can expect the net effect of regressions toward all points to pull an individual toward the closest mode of the distribution. If you have information about subgroups, and the subgroup means are far apart relative to the differences between individuals, you can expect individuals to be pulled toward subgroup means, even if those do not show up as modes of the distribution. For unimodal distributions, without strong subgroup effects or asymmetries, individuals will likely be pulled toward the mean, median and mode which should be close together. For bimodal and multimodal distributions, asymmetric distributions or data with strong subgroup effects, regression toward the mean should be applied with caution.

## Regression fallacies

Misunderstandings of the principle (known as "regression fallacies") have repeatedly led to mistaken claims in the scientific literature.

An extreme example is Horace Secrist's 1933 book The Triumph of Mediocrity in Business, in which the statistics professor collected mountains of data to prove that the profit rates of competitive businesses tend toward the average over time. In fact, there is no such effect; the variability of profit rates is almost constant over time. Secrist had only described the common regression toward the mean. One exasperated reviewer, Harold Hotelling, likened the book to "proving the multiplication table by arranging elephants in rows and columns, and then doing the same for numerous other kinds of animals".[4]

The calculation and interpretation of "improvement scores" on standardized educational tests in Massachusetts probably provides another example of the regression fallacy. In 1999, schools were given improvement goals. For each school, the Department of Education tabulated the difference in the average score achieved by students in 1999 and in 2000. It was quickly noted that most of the worst-performing schools had met their goals, which the Department of Education took as confirmation of the soundness of their policies. However, it was also noted that many of the supposedly best schools in the Commonwealth, such as Brookline High School (with 18 National Merit Scholarship finalists) were declared to have failed. As in many cases involving statistics and public policy, the issue is debated, but "improvement scores" were not announced in subsequent years and the findings appear to be a case of regression to the mean.

The psychologist Daniel Kahneman referred to regression to the mean in his speech when he won the 2002 Bank of Sweden prize for economics.

“I had the most satisfying Eureka experience of my career while attempting to teach flight instructors that praise is more effective than punishment for promoting skill-learning. When I had finished my enthusiastic speech, one of the most seasoned instructors in the audience raised his hand and made his own short speech, which began by conceding that positive reinforcement might be good for the birds, but went on to deny that it was optimal for flight cadets. He said, "On many occasions I have praised flight cadets for clean execution of some aerobatic maneuver, and in general when they try it again, they do worse. On the other hand, I have often screamed at cadets for bad execution, and in general they do better the next time. So please don't tell us that reinforcement works and punishment does not, because the opposite is the case." This was a joyous moment, in which I understood an important truth about the world: because we tend to reward others when they do well and punish them when they do badly, and because there is regression to the mean, it is part of the human condition that we are statistically punished for rewarding others and rewarded for punishing them. I immediately arranged a demonstration in which each participant tossed two coins at a target behind his back, without any feedback. We measured the distances from the target and could see that those who had done best the first time had mostly deteriorated on their second try, and vice versa. But I knew that this demonstration would not undo the effects of lifelong exposure to a perverse contingency.”

UK law enforcement policies have encouraged the visible siting of static or mobile speed cameras at accident blackspots. This policy was justified by a perception that there is a corresponding reduction in serious road traffic accidents after a camera is set up. However, statisticians have pointed out that, although there is a net benefit in lives saved, failure to take into account the effects of regression to the mean results in the beneficial effects' being overstated. It is thus claimed that some of the money currently spent on traffic cameras could be more productively directed elsewhere.[5]

Statistical analysts have long recognized the effect of regression to the mean in sports; they even have a special name for it: the "Sophomore Slump." For example, Carmelo Anthony of the NBA's Denver Nuggets had an outstanding rookie season in 2004. It was so outstanding, in fact, that he couldn't possibly be expected to repeat it: in 2005, Anthony's numbers had dropped from his rookie season. The reasons for the "sophomore slump" abound, as sports are all about adjustment and counter-adjustment, but luck-based excellence as a rookie is as good a reason as any.

Regression to the mean in sports performance may be the reason for the "Sports Illustrated Cover Jinx" and the "Madden Curse." John Hollinger has an alternate name for the law of regression to the mean: the "fluke rule," while Bill James calls it the "Plexiglass Principle."

Because popular lore has focused on "regression toward the mean" as an account of declining performance of athletes from one season to the next, it has usually overlooked the fact that such regression can also account for improved performance. For example, if one looks at the batting average of Major League Baseball players in one season, those whose batting average was above the league mean tend to regress downward toward the mean the following year, while those whose batting average was below the mean tend to progress upward toward the mean the following year.[6]

## Mathematics

Let $x_1, x_2, \ldots, x_n$ be the first set of measurements and $y_1, y_2, \ldots, y_n$ be the second set. Regression toward the mean tells us that, for all $i$, the expected value of $y_i$ is closer to $\overline{x}$ (the mean of the $x_i$) than $x_i$ is. We can write this as:

$|E(y_i) - \overline{x}| < |x_i - \overline{x}|$

Where E() denotes the expectation operator. We can also write:

$0 \leq E(\frac{y_i - \overline{x}}{x_i - \overline{x}}) < 1$

which is stronger than the first inequality because it requires that the expected value of $y_i$ be on the same side of the mean as $x_i$. A natural way to test this is to look at the values of:

$\frac{y_i - \overline{x}}{x_i - \overline{x}}$

in the sample. Taking an arithmetic mean is not a good idea, because $x_i - \overline{x}$ might be zero. Even if it's only close to zero, those points could dominate the calculation, when we're really concerned about larger movements of points farther from the mean. Suppose instead we take a weighted mean, weighted by $(x_i - \overline{x})^2$:

$\frac{\sum_{i=1}^{n}(y_i - \overline{x})(x_i - \overline{x})}{\sum_{i=1}^{n}(x_i - \overline{x})^2}$

which can be rewritten:

$\frac{\sum_{i=1}^{n}y_ix_i-\overline{x}\sum_{i=1}^{n}y_i-\overline{x}\sum_{i=1}^{n}x_i+n\overline{x}^2}{\sum_{i=1}^{n}(x_i - \overline{x})^2}$

or:

$\frac{\sum_{i=1}^{n}y_ix_i-n\overline{x}\ \overline{y}}{\sum_{i=1}^{n}(x_i - \overline{x})^2}$

which is the well-known formula for the regression coefficient $\beta$. Therefore, asserting that there is regression toward the mean can be interpreted as asserting:

$0 \leq \beta_{x,y} < 1$

This will generally be true of two sets of measurements on the same sample. We would expect the standard deviation of the two sets of measurements to be the same, so the regression coefficient $\beta$ is equal to the correlation coefficient $\rho$. That is enough to tell us $\beta\leq 1$ since $\rho\leq 1$. If the measurements are not perfect, we expect $\beta< 1$. However, if the measurements have any information content at all, $\rho > 0$, so $\beta > 0$. $\rho = 1$ corresponds to the case of perfect measurement while $\rho = 0$ corresponds to the case of the measurement being all error.
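The derivation above can be checked numerically. The following sketch computes the weighted mean of the ratios directly and compares it with the final expression for $\beta$; the function names and the five pairs of scores are made up purely for illustration.

```python
import statistics


def weighted_mean_ratio(x, y):
    """Mean of (y_i - x_bar) / (x_i - x_bar), weighted by (x_i - x_bar)**2.

    Each weight cancels the denominator of its ratio, so the numerator reduces
    to the sum of (y_i - x_bar) * (x_i - x_bar).
    """
    x_bar = statistics.fmean(x)
    numerator = sum((yi - x_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    denominator = sum((xi - x_bar) ** 2 for xi in x)
    return numerator / denominator


def regression_coefficient(x, y):
    """beta from the final form: (sum of x_i * y_i - n * x_bar * y_bar) / sum of squares."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    numerator = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    denominator = sum((xi - x_bar) ** 2 for xi in x)
    return numerator / denominator


# Hypothetical first and second measurements for five individuals.
x = [60, 70, 80, 90, 100]
y = [66, 72, 79, 87, 96]

beta = regression_coefficient(x, y)
print(weighted_mean_ratio(x, y), beta)  # both are 0.75 for this data
```

For this data both expressions give 0.75, which lies between 0 and 1: on average, the second measurements sit three quarters as far from the mean as the first.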

## Notes

1. Howard Raiffa and Robert Schlaifer, Applied Statistical Decision Theory, Wiley-Interscience (2000) ISBN: 978-0471383499
2. George Casella and Roger L. Berger, Statistical Inference, Duxbury Press (2001) ISBN: 978-0534243128
3. Galton, F. (1886). Regression Toward Mediocrity in Hereditary Stature. Nature.
4. Hotelling, H. (1933). Review of The triumph of mediocrity in business by Secrist, H., Journal of the American Statistical Association, 28, 433-435.
5. For an illustration see Nate Silver, "Randomness: Catch the Fever!", Baseball Prospectus, May 14, 2003.