Variance
From Psychology Wiki
In probability theory and statistics, the variance of a random variable is a measure of its statistical dispersion, indicating how far from the expected value its values typically are. The variance of a real-valued random variable is its second central moment, and it also happens to be its second cumulant. The variance of a random variable is the square of its standard deviation.
Definition
If μ = E(X) is the expected value (mean) of the random variable X, then the variance is
$$\operatorname{Var}(X) = \operatorname{E}\left[(X - \mu)^2\right].$$
That is, it is the expected value of the square of the deviation of X from its own mean. In plain language, it is the average of the squared distance of each data point from the mean; it is thus the mean squared deviation. The variance of a random variable X is typically denoted $\operatorname{Var}(X)$, $\sigma_X^2$, or simply $\sigma^2$.
Note that the above definition can be used for both discrete and continuous random variables.
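As an illustration of the definition for a discrete random variable, the following minimal Python sketch computes E[(X − μ)²] directly. The helper names `expectation` and `variance` and the fair-die example are assumptions made for this illustration, not part of the original article.

```python
from fractions import Fraction

def expectation(dist, g=lambda x: x):
    """E[g(X)] for a discrete distribution given as {value: probability}."""
    return sum(p * g(x) for x, p in dist.items())

def variance(dist):
    """Var(X) = E[(X - mu)^2], computed directly from the definition."""
    mu = expectation(dist)
    return expectation(dist, lambda x: (x - mu) ** 2)

# Hypothetical example: a fair six-sided die, each face with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
die_mean = expectation(die)
die_variance = variance(die)
print(die_mean, die_variance)  # prints: 7/2 35/12
```

Exact fractions are used so that the result is not blurred by floating-point rounding.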
Many distributions, such as the Cauchy distribution, do not have a variance because the relevant integral diverges. In particular, if a distribution does not have an expected value, it does not have a variance either. The converse is not true: there are distributions for which the expected value exists but the variance does not.
Properties
If the variance is defined, we can conclude that it is never negative because the squares are positive or zero. The unit of variance is the square of the unit of observation. For example, the variance of a set of heights measured in centimeters will be given in square centimeters. This fact is inconvenient and has motivated many statisticians to instead use the square root of the variance, known as the standard deviation, as a summary of dispersion.
It can be proven easily from the definition that the variance does not depend on the mean value $\mu$. That is, if the variable is "displaced" by an amount b by taking X + b, the variance of the resulting random variable is left untouched. By contrast, if the variable is multiplied by a scaling factor a, the variance is multiplied by $a^2$. More formally, if a and b are real constants and X is a random variable whose variance is defined,
$$\operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X).$$
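The shift and scaling behaviour, Var(aX + b) = a² Var(X), can be checked numerically. The sketch below assumes a small made-up data set treated as equally likely values; the helper `pvar` and the constants are illustrative choices only.

```python
from fractions import Fraction

def pvar(values):
    """Variance of equally likely values: mean squared deviation from the mean."""
    n = len(values)
    mu = Fraction(sum(values), n)
    return sum((v - mu) ** 2 for v in values) / n

xs = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative data; mean 5, variance 4
a, b = 3, 10                     # arbitrary constants chosen for the example

shifted = [x + b for x in xs]        # X + b
scaled = [a * x + b for x in xs]     # aX + b

var_x = pvar(xs)
var_shifted = pvar(shifted)   # unchanged by the shift
var_scaled = pvar(scaled)     # multiplied by a**2
```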
Another formula for the variance, which follows in a straightforward manner from the linearity of expected values and the above definition, is
$$\operatorname{Var}(X) = \operatorname{E}(X^2) - (\operatorname{E}(X))^2.$$
This is often used to calculate the variance in practice.
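For completeness, the step from the definition to the formula Var(X) = E(X²) − (E(X))² can be written out as follows, with μ = E(X) and using only the linearity of expectation:

$$
\begin{aligned}
\operatorname{Var}(X) &= \operatorname{E}\left[(X - \mu)^2\right] \\
&= \operatorname{E}\left[X^2 - 2\mu X + \mu^2\right] \\
&= \operatorname{E}(X^2) - 2\mu \operatorname{E}(X) + \mu^2 \\
&= \operatorname{E}(X^2) - 2\mu^2 + \mu^2 \\
&= \operatorname{E}(X^2) - (\operatorname{E}(X))^2 .
\end{aligned}
$$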
One reason for the use of the variance in preference to other measures of dispersion is that the variance of the sum (or the difference) of independent random variables is the sum of their variances. A weaker condition than independence, called uncorrelatedness, also suffices. In general, for real constants a and b,
$$\operatorname{Var}(aX + bY) = a^2 \operatorname{Var}(X) + b^2 \operatorname{Var}(Y) + 2ab \operatorname{Cov}(X, Y).$$
Here $\operatorname{Cov}(X, Y)$ is the covariance, which (if it exists) is zero for independent random variables.
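The general identity Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y) can be verified on paired data. The following sketch uses made-up observations and hypothetical helpers `mean`, `cov` and `var`, with exact fractions so that both sides can be compared for equality.

```python
from fractions import Fraction

def mean(values):
    return Fraction(sum(values), len(values))

def cov(xs, ys):
    """Covariance of paired, equally likely observations (denominator n)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def var(xs):
    """Variance is the covariance of a variable with itself."""
    return cov(xs, xs)

xs = [1, 2, 3, 4, 5]   # illustrative paired data
ys = [2, 1, 4, 3, 6]
a, b = 2, -3           # arbitrary constants

combo = [a * x + b * y for x, y in zip(xs, ys)]
lhs = var(combo)
rhs = a**2 * var(xs) + b**2 * var(ys) + 2 * a * b * cov(xs, ys)

# Special case a = 1, b = -1: the variance of a difference.
diff = [x - y for x, y in zip(xs, ys)]
var_diff = var(diff)
```

Because the covariance of these data is not zero, the variance of the difference is not simply the sum of the two variances; the covariance term accounts for the gap.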
Approximating the variance of a function
The delta method uses second-order Taylor expansions to approximate the variance of a function of one or more random variables. For example, the approximate variance of a function of one variable is given by
$$\operatorname{Var}\left[f(X)\right] \approx \left(f'(\operatorname{E}[X])\right)^2 \operatorname{Var}[X],$$
provided that $f$ is twice differentiable and that the mean and variance of $X$ are finite.
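To get a feel for the quality of the approximation Var[f(X)] ≈ (f′(E[X]))² Var[X], the sketch below compares it with the exact variance for f(x) = log x. The distribution (three equally likely values close to 10) and the helper name `delta_variance` are assumptions chosen purely for illustration.

```python
import math

def delta_variance(f_prime, mean, variance):
    """Approximation Var[f(X)] ~ f'(E[X])**2 * Var[X]."""
    return f_prime(mean) ** 2 * variance

# Hypothetical X: three equally likely values tightly clustered around 10.
values = [9.9, 10.0, 10.1]
mu = sum(values) / len(values)
var_x = sum((v - mu) ** 2 for v in values) / len(values)

# Exact variance of f(X) = log(X), by direct enumeration of the three outcomes.
logs = [math.log(v) for v in values]
mu_log = sum(logs) / len(logs)
exact = sum((y - mu_log) ** 2 for y in logs) / len(logs)

# Delta-method approximation, using f'(x) = 1/x.
approx = delta_variance(lambda x: 1.0 / x, mu, var_x)
relative_error = abs(approx - exact) / exact
```

The approximation works well here because the values are concentrated near the mean, where f is nearly linear; for widely spread values the error would be larger.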
Population variance and sample variance
In general, the population variance of a finite population $x_1, \dots, x_N$ of size $N$ is given by
$$\sigma^2 = \frac{1}{N} \sum_{i=1}^N (x_i - \mu)^2,$$
where $\mu$ is the population mean. This is merely a special case of the general definition of variance introduced above, but restricted to finite populations.
In many practical situations, the true variance of a population is not known a priori and must be estimated. When dealing with large finite populations, it is almost never possible to find the exact value of the population variance, due to time, cost, and other resource constraints. When dealing with infinite populations, this is generally impossible.
A common method of estimating the variance of large (finite or infinite) populations is sampling. We start with a finite sample of values taken from the overall population. Suppose that our sample is the sequence $(x_1, \dots, x_n)$. There are two distinct things we can do with this sample: first, we can treat it as a finite population and describe its variance; second, we can estimate the underlying population variance from this sample.
The variance of the sample $(x_1, \dots, x_n)$, viewed as a finite population, is
$$s_n^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \overline{x})^2,$$
where $\overline{x}$ is the sample mean. This is sometimes known as the sample variance; however, that term is ambiguous. Some electronic calculators can calculate $s_n^2$ at the press of a button.
When using the sample $(x_1, \dots, x_n)$ to estimate the variance of the underlying larger population the sample was drawn from, it may be tempting to equate the population variance with $s_n^2$, the variance of the sample viewed as a finite population. However, $s_n^2$ is a biased estimator of the population variance. The following is an unbiased estimator:
$$s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \overline{x})^2,$$
where $\overline{x}$ is the sample mean. Note that the term $n - 1$ in the denominator above contrasts with the equation for $s_n^2$, which has $n$ in the denominator. Note that $s^2$ is generally not identical to the true population variance; it is merely an estimate, though perhaps a very good one if $n$ is large. Because $s^2$ is a variance estimate and is based on a finite sample, it too is sometimes referred to as the sample variance.
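Python's standard `statistics` module exposes both quantities: `statistics.pvariance` divides by n (treating the data as a whole population) and `statistics.variance` divides by n − 1. The short sketch below, on made-up data, shows the two side by side.

```python
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative data, mean 5
n = len(sample)

# Variance of the sample viewed as a finite population (denominator n).
biased = statistics.pvariance(sample)

# Unbiased estimate of the underlying population variance (denominator n - 1).
unbiased = statistics.variance(sample)

# The two differ only by the factor n / (n - 1).
ratio = unbiased / biased
```

For these eight values the sum of squared deviations is 32, so the first quantity is 32/8 = 4 and the second is 32/7.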
One common source of confusion is that the term sample variance may refer to either the unbiased estimator $s^2$ of the population variance (denominator $n - 1$), or to the variance $s_n^2$ of the sample viewed as a finite population (denominator $n$). Both can be used to estimate the true population variance, but $s^2$ is unbiased. Intuitively, computing the variance by dividing by $n$ instead of $n - 1$ underestimates the population variance. This is because we are using the sample mean $\overline{x}$ as an estimate of the unknown population mean $\mu$, and the raw counts of repeated elements in the sample instead of the unknown true probabilities.
In practice, for large $n$, the distinction is often a minor one. In the course of statistical measurements, sample sizes so small as to warrant the use of the unbiased variance estimate $s^2$ virtually never occur. In this context Press et al.[1] commented that if the difference between n and n−1 ever matters to you, then you are probably up to no good anyway, e.g., trying to substantiate a questionable hypothesis with marginal data.
An unbiased estimator
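We will demonstrate why $s^2$, the estimator with $n - 1$ in the denominator, is an unbiased estimator of the population variance. An estimator $\hat{\theta}$ for a parameter $\theta$ is unbiased if $\operatorname{E}\{\hat{\theta}\} = \theta$. Therefore, to prove that $s^2$ is unbiased, we will show that $\operatorname{E}\{s^2\} = \sigma^2$. As an assumption, the population which the $x_i$ are drawn from has mean $\mu$ and variance $\sigma^2$.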
Assume in addition that the $x_i$ are independent, so that $\operatorname{E}\{(x_i - \mu)(x_j - \mu)\}$ equals $\sigma^2$ when $i = j$ and zero otherwise. Writing $s^2$ for the estimator with $n - 1$ in the denominator and $\overline{x}$ for the sample mean, we have

$$
\begin{aligned}
\operatorname{E}\{s^2\}
&= \operatorname{E}\left\{ \frac{1}{n-1} \sum_{i=1}^n \left( x_i - \overline{x} \right)^2 \right\} \\
&= \frac{1}{n-1} \sum_{i=1}^n \operatorname{E}\left\{ \left( x_i - \overline{x} \right)^2 \right\} \\
&= \frac{1}{n-1} \sum_{i=1}^n \operatorname{E}\left\{ \left( (x_i - \mu) - (\overline{x} - \mu) \right)^2 \right\} \\
&= \frac{1}{n-1} \sum_{i=1}^n \left[ \operatorname{E}\left\{ (x_i - \mu)^2 \right\}
   - 2 \operatorname{E}\left\{ (x_i - \mu)(\overline{x} - \mu) \right\}
   + \operatorname{E}\left\{ (\overline{x} - \mu)^2 \right\} \right] \\
&= \frac{1}{n-1} \sum_{i=1}^n \left[ \sigma^2
   - 2 \left( \frac{1}{n} \sum_{j=1}^n \operatorname{E}\left\{ (x_i - \mu)(x_j - \mu) \right\} \right)
   + \frac{1}{n^2} \sum_{j=1}^n \sum_{k=1}^n \operatorname{E}\left\{ (x_j - \mu)(x_k - \mu) \right\} \right] \\
&= \frac{1}{n-1} \sum_{i=1}^n \left[ \sigma^2 - \frac{2\sigma^2}{n} + \frac{\sigma^2}{n} \right] \\
&= \frac{(n-1)\sigma^2}{n-1} = \sigma^2 .
\end{aligned}
$$

See also algorithms for calculating variance.
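The unbiasedness of the estimator with denominator n − 1 can also be checked by brute force on a small example. The sketch below assumes a hypothetical population (the faces of a fair die) sampled independently with replacement, enumerates every possible sample of size 3, and averages both estimators exactly. The helper names are illustrative only.

```python
from fractions import Fraction
from itertools import product

def mean(values):
    return Fraction(sum(values), len(values))

def sum_sq_dev(values):
    """Sum of squared deviations from the mean of `values`."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

population = [1, 2, 3, 4, 5, 6]   # hypothetical population: faces of a fair die
sigma2 = sum_sq_dev(population) / len(population)   # population variance

n = 3
# All equally likely samples of size n drawn independently with replacement.
samples = list(product(population, repeat=n))

# Expected value of each estimator, averaged exactly over every sample.
avg_unbiased = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
avg_biased = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
```

In this enumeration the average of the n − 1 estimator equals the population variance exactly, while the average of the estimator with denominator n falls short by the factor (n − 1)/n.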
Alternate proof
Generalizations
If X is a vector-valued random variable, with values in ℝⁿ, and thought of as a column vector, then the natural generalization of variance is E[(X − μ)(X − μ)ᵀ], where μ = E(X) and Xᵀ is the transpose of X, and so is a row vector. This variance is a nonnegative-definite square matrix, commonly referred to as the covariance matrix.
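As a small illustration of the vector-valued case, the sketch below computes E[(X − μ)(X − μ)ᵀ] for a handful of equally likely observation vectors. The function name `covariance_matrix` and the data are assumptions made for the example; plain Python lists stand in for matrices.

```python
from fractions import Fraction

def covariance_matrix(rows):
    """E[(X - mu)(X - mu)^T] for equally likely observation vectors."""
    n, d = len(rows), len(rows[0])
    mu = [Fraction(sum(r[k] for r in rows), n) for k in range(d)]
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / n
             for j in range(d)]
            for i in range(d)]

# Illustrative two-dimensional observations.
observations = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
sigma = covariance_matrix(observations)

# The diagonal holds the variances of the components; the off-diagonal
# entries hold their covariance, so the matrix is symmetric.
determinant = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
```

For a 2 × 2 symmetric matrix, nonnegative diagonal entries together with a nonnegative determinant are what nonnegative-definiteness amounts to, which is why the determinant is computed here.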
If X is a complex-valued random variable, then its variance is E[(X − μ)(X − μ)*], where X* is the complex conjugate of X. This variance is a nonnegative real number.
History
The term variance was first introduced by Ronald Fisher in his 1918 paper The Correlation Between Relatives on the Supposition of Mendelian Inheritance.
Moment of inertia
The variance of a probability distribution is analogous to the moment of inertia in classical mechanics of a corresponding linear mass distribution, with respect to rotation about its center of mass. It is because of this analogy that such things as the variance are called moments of probability distributions.
See also
- an inequality on location and scale parameters
- expected value
- kurtosis
- law of total variance
- skewness
- semivariance
- standard deviation
- statistical dispersion
References
- [1] Press, W. H., Teukolsky, S. A., Vetterling, W. T. & Flannery, B. P. (1986). Numerical Recipes: The Art of Scientific Computing. Cambridge: Cambridge University Press.
This page uses content from the English-language version of Wikipedia. The original article was at Variance. The list of authors can be seen in the page history. As with Psychology Wiki, the text of Wikipedia is available under the GNU Free Documentation License.















