Normal distribution
From Psychology Wiki
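Figure: Probability density function. The green line is the standard normal distribution.
Figure: Cumulative distribution function. Colors match the pdf above.
| Property | Value |
|---|---|
| Parameters | $\mu$ location (real); $\sigma^2 > 0$ squared scale (real) |
| Support | $x \in (-\infty, +\infty)$ |
| pdf | $\frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ |
| cdf | $\frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$ |
| Mean | $\mu$ |
| Median | $\mu$ |
| Mode | $\mu$ |
| Variance | $\sigma^2$ |
| Skewness | 0 |
| Kurtosis | 0 (excess kurtosis) |
| Entropy | $\ln\left(\sigma\sqrt{2\pi e}\right)$ |
| mgf | $\exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right)$ |
| Char. func. | $\exp\left(i\mu t - \frac{\sigma^2 t^2}{2}\right)$ |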
The normal distribution, also called the Gaussian distribution, is an extremely important probability distribution in many fields. It is a family of distributions of the same general form, differing in their location and scale parameters: the mean ("average") and standard deviation ("variability"), respectively. The standard normal distribution is the normal distribution with a mean of zero and a standard deviation of one (the green curves in the plots above). It is often called the bell curve because the graph of its probability density resembles a bell.
Overview
The normal distribution is a convenient model of quantitative phenomena in the natural and behavioral sciences. A variety of psychological test scores have been found to approximately follow a normal distribution. While the underlying causes of these phenomena are often unknown, the use of the normal distribution can be theoretically justified in situations where many small, independent effects are added together into a score or variable that can be observed. The normal distribution also arises in many areas of statistics: for example, by the central limit theorem, the sampling distribution of the mean is approximately normal for large samples, even if the population the sample is taken from is not normal, provided the population variance is finite. In addition, the normal distribution maximizes information entropy among all distributions with a given mean and variance, which makes it the natural choice of underlying distribution for data summarized in terms of sample mean and variance. The normal distribution is the most widely used family of distributions in statistics, and many statistical tests are based on the assumption of normality. In probability theory, normal distributions arise as the limiting distributions of several continuous and discrete families of distributions.
History
The normal distribution was first introduced by Abraham de Moivre in an article in 1733 (reprinted in the second edition of his The Doctrine of Chances, 1738) in the context of approximating certain binomial distributions for large n. His result was extended by Pierre Simon de Laplace in his book Analytical Theory of Probabilities (1812), and is now called the theorem of de Moivre-Laplace.
Laplace used the normal distribution in the analysis of errors of experiments. The important method of least squares was introduced by Adrien Marie Legendre in 1805. Carl Friedrich Gauss, who claimed to have used the method since 1794, justified it rigorously in 1809 by assuming a normal distribution of the errors.
The name "bell curve" goes back to Jouffret, who first used the term "bell surface" in 1872 for a bivariate normal with independent components. The name "normal distribution" was coined independently by Charles S. Peirce, Francis Galton and Wilhelm Lexis around 1875. This terminology is unfortunate, since it reflects and encourages the fallacy that many or all probability distributions are "normal". (See the discussion of "occurrence" below.)
Specification of the normal distribution
There are various ways to specify a random variable. The most visual is the probability density function (plot at the top), which shows the relative likelihood of the random variable taking values near each point. The cumulative distribution function is a conceptually cleaner way to specify the same information, but to the untrained eye its plot is much less informative (see below). Equivalent ways to specify the normal distribution are: the moments, the cumulants, the characteristic function, the moment-generating function, and the cumulant-generating function. Some of these are very useful for theoretical work, but not intuitive. See probability distribution for a discussion.
All of the cumulants of the normal distribution are zero, except the first two.
Probability density function
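The probability density function of the normal distribution with mean $\mu$ and variance $\sigma^2$ (equivalently, standard deviation $\sigma$) is an example of a Gaussian function,
$$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$$
(See also exponential function and pi.)
If a random variable $X$ has this distribution, we write $X \sim N(\mu, \sigma^2)$.
If $\mu = 0$ and $\sigma = 1$, the distribution is called the standard normal distribution and the probability density function reduces to
$$f(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right).$$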
The image to the right gives the graph of the probability density function of the normal distribution for various parameter values.
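As an illustration, the density can be evaluated directly from its definition. The following is a minimal Python sketch; the function name `normal_pdf` and its default arguments are choices made for this example, not part of any standard notation.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x; sigma is the standard deviation."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # distance from the mean in standard deviations
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Peak of the standard normal density, 1/sqrt(2*pi), roughly 0.3989.
print(normal_pdf(0.0))
# Symmetry about the mean: points equally far from mu have equal density.
print(normal_pdf(1.5, mu=1.0, sigma=2.0), normal_pdf(0.5, mu=1.0, sigma=2.0))
```

Working with the standardized value `z` keeps the expression close to the standard normal form of the density.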
Some notable qualities of the normal distribution:
- The density function is symmetric about its mean value.
- The mean is also its mode and median.
- 68.268949% of the area under the curve is within one standard deviation of the mean.
- 95.449974% of the area is within two standard deviations.
- 99.730020% of the area is within three standard deviations.
- 99.993666% of the area is within four standard deviations.
- The inflection points of the curve occur at one standard deviation away from the mean.
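The percentages in the list above can be checked numerically. The sketch below assumes the standard identity that the probability of falling within k standard deviations of the mean equals erf(k/√2); the helper name `prob_within` is hypothetical.

```python
import math

def prob_within(k):
    """P(|X - mu| <= k * sigma) for any normal X, via the error function."""
    return math.erf(k / math.sqrt(2.0))

# Print the area within 1, 2, 3 and 4 standard deviations as percentages.
for k in (1, 2, 3, 4):
    print(f"within {k} standard deviation(s): {100 * prob_within(k):.6f}%")
```

The result does not depend on the mean or the standard deviation, because the interval is expressed in units of the standard deviation.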
Cumulative distribution function
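The cumulative distribution function (cdf) is defined as the probability that a variable $X$ has a value less than or equal to $x$, and it is expressed in terms of the density function as
$$F(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left(-\frac{(u-\mu)^2}{2\sigma^2}\right) du.$$
The standard normal cdf, conventionally denoted $\Phi$, is just the general cdf evaluated with $\mu = 0$ and $\sigma = 1$,
$$\Phi(x) = F(x; 0, 1) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left(-\frac{u^2}{2}\right) du.$$
The standard normal cdf can be expressed in terms of a special function called the error function, as
$$\Phi(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)\right].$$
The inverse cumulative distribution function, or quantile function, can be expressed in terms of the inverse error function:
$$\Phi^{-1}(p) = \sqrt{2}\,\operatorname{erf}^{-1}(2p - 1), \quad p \in (0, 1).$$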
This quantile function is sometimes called the probit function. There is no elementary primitive for the probit function. This is not to say merely that none is known, but rather that the non-existence of such a function has been proved.
Values of Φ(x) may be approximated very accurately by a variety of methods, such as numerical integration, Taylor series, or asymptotic series.
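In practice these functions are usually taken from a library rather than computed by hand. The sketch below uses Python's `math.erf` for Φ and, because the standard library has no inverse error function, `statistics.NormalDist.inv_cdf` (Python 3.8 or later) for the quantile function. The wrapper names `std_normal_cdf` and `probit` are chosen for this example.

```python
import math
from statistics import NormalDist

def std_normal_cdf(x):
    """Phi(x), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def probit(p):
    """Quantile function of the standard normal, for 0 < p < 1."""
    return NormalDist().inv_cdf(p)

print(std_normal_cdf(0.0))          # 0.5, by symmetry
print(probit(0.975))                # close to 1.96
print(std_normal_cdf(probit(0.9)))  # round trip, close to 0.9
```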
Generating functions
Moment generating function
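The moment generating function is defined as the expected value of $e^{tX}$.
For a normal distribution, it can be shown that the moment generating function is
$$M_X(t) = \operatorname{E}\left[e^{tX}\right] = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right),$$
as can be seen by completing the square in the exponent.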
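The completing-the-square step can be written out as follows. This is a sketch of the standard argument, assuming the density with mean $\mu$ and variance $\sigma^2$ defined earlier; the shifted mean $\mu + \sigma^2 t$ is introduced here only for the derivation.

```latex
\begin{aligned}
M_X(t) &= \int_{-\infty}^{\infty} e^{tx}\,\frac{1}{\sigma\sqrt{2\pi}}
          \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) dx, \\
% complete the square in x inside the combined exponent
tx - \frac{(x-\mu)^2}{2\sigma^2}
       &= -\frac{\bigl(x-(\mu+\sigma^2 t)\bigr)^2}{2\sigma^2}
          + \mu t + \frac{\sigma^2 t^2}{2}, \\
% the remaining integrand is the N(mu + sigma^2 t, sigma^2) density, which integrates to 1
M_X(t) &= \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right)
          \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}
          \exp\left(-\frac{\bigl(x-(\mu+\sigma^2 t)\bigr)^2}{2\sigma^2}\right) dx
        = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right).
\end{aligned}
```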
Characteristic function
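The characteristic function is defined as the expected value of $e^{itX}$, where $i$ is the imaginary unit.
For a normal distribution, the characteristic function is
$$\varphi_X(t) = \operatorname{E}\left[e^{itX}\right] = \exp\left(i\mu t - \frac{\sigma^2 t^2}{2}\right).$$
The characteristic function is obtained by replacing $t$ with $it$ in the moment-generating function.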
Properties
Some of the properties of the normal distribution:
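- If $X \sim N(\mu, \sigma^2)$ and $a$ and $b$ are real numbers, then $aX + b \sim N(a\mu + b, (a\sigma)^2)$ (see expected value and variance).
- If $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$ are independent normal random variables, then:
  - Their sum is normally distributed with $U = X + Y \sim N(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$ (proof).
  - Their difference is normally distributed with $V = X - Y \sim N(\mu_X - \mu_Y, \sigma_X^2 + \sigma_Y^2)$.
  - $U$ and $V$ are independent of each other if and only if $\sigma_X^2 = \sigma_Y^2$, since their covariance is $\sigma_X^2 - \sigma_Y^2$.
- If $X \sim N(0, \sigma_X^2)$ and $Y \sim N(0, \sigma_Y^2)$ are independent normal random variables, then:
  - Their product $XY$ follows a distribution with density $p$ given by $p(z) = \frac{1}{\pi \sigma_X \sigma_Y} K_0\left(\frac{\lvert z \rvert}{\sigma_X \sigma_Y}\right)$, where $K_0$ is a modified Bessel function of the second kind.
  - Their ratio follows a Cauchy distribution with $X/Y \sim \mathrm{Cauchy}(0, \sigma_X/\sigma_Y)$.
- If $X_1, \ldots, X_n$ are independent standard normal variables, then $X_1^2 + \cdots + X_n^2$ has a chi-square distribution with $n$ degrees of freedom.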
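The linear-transformation, sum and difference properties can be tried out with Python's `statistics.NormalDist` (Python 3.8 or later), whose arithmetic operators treat the operands as independent normal variables. This is a small sketch; the parameter values are arbitrary and chosen so that the results are easy to verify by hand.

```python
from statistics import NormalDist

X = NormalDist(mu=1.0, sigma=3.0)
Y = NormalDist(mu=2.0, sigma=4.0)

W = 2 * X + 1  # aX + b: mean 2*1 + 1 = 3, standard deviation 2*3 = 6
U = X + Y      # sum: mean 1 + 2 = 3, variance 9 + 16 = 25
V = X - Y      # difference: mean 1 - 2 = -1, variance 9 + 16 = 25

for name, dist in (("2X + 1", W), ("X + Y", U), ("X - Y", V)):
    print(name, dist.mean, dist.stdev)
```

Note that the variances add for both the sum and the difference, so the two results share the same standard deviation.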
Standardizing normal random variables
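As a consequence of Property 1, it is possible to relate all normal random variables to the standard normal.
If $X \sim N(\mu, \sigma^2)$, then
$$Z = \frac{X - \mu}{\sigma}$$
is a standard normal random variable: $Z \sim N(0, 1)$.
An important consequence is that the cdf of a general normal distribution is therefore
$$\Pr(X \le x) = \Phi\left(\frac{x - \mu}{\sigma}\right) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right].$$
Conversely, if $Z \sim N(0, 1)$, then
$$X = \sigma Z + \mu$$
is a normal random variable with mean $\mu$ and variance $\sigma^2$.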
The standard normal distribution has been tabulated, and the other normal distributions are simple transformations of the standard one. Therefore, one can use tabulated values of the cdf of the standard normal distribution to find values of the cdf of a general normal distribution.
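The same transformation replaces a table lookup in code. The sketch below standardizes a value and then evaluates the standard normal cdf through the error function; the scale with mean 100 and standard deviation 15 is a hypothetical example, and `normal_cdf` is a name chosen here.

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), computed by standardizing first."""
    z = (x - mu) / sigma  # z-score: standard normal if X is normal
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical test scale with mean 100 and standard deviation 15.
# A score of 130 lies two standard deviations above the mean.
p = normal_cdf(130.0, mu=100.0, sigma=15.0)
print(p)  # roughly 0.977
```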
Moments
The first few moments of the normal distribution are:
| Number | Raw moment | Central moment | Cumulant |
|---|---|---|---|
| 0 | 1 | 1 | |
| 1 | $\mu$ | 0 | $\mu$ |
| 2 | $\mu^2 + \sigma^2$ | $\sigma^2$ | $\sigma^2$ |
| 3 | $\mu^3 + 3\mu\sigma^2$ | 0 | 0 |
| 4 | $\mu^4 + 6\mu^2\sigma^2 + 3\sigma^4$ | $3\sigma^4$ | 0 |
All cumulants of the normal distribution beyond the second are zero.
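The raw moments in the table can be reproduced numerically. The sketch below relies on the recurrence E[X^n] = μ E[X^(n-1)] + (n-1) σ² E[X^(n-2)], a standard identity for normal variables that is not stated in this article; the function name `raw_moments` is chosen for this example.

```python
def raw_moments(mu, sigma, n_max):
    """Return [E[X^0], ..., E[X^n_max]] for X ~ N(mu, sigma^2)."""
    moments = [1.0]
    if n_max >= 1:
        moments.append(mu)
    for n in range(2, n_max + 1):
        # Recurrence (assumed identity): E[X^n] = mu*E[X^(n-1)] + (n-1)*sigma^2*E[X^(n-2)]
        moments.append(mu * moments[n - 1] + (n - 1) * sigma**2 * moments[n - 2])
    return moments

mu, sigma = 2.0, 3.0
m = raw_moments(mu, sigma, 4)
# Each pair compares the recurrence with the closed form from the table.
print(m[2], mu**2 + sigma**2)
print(m[3], mu**3 + 3 * mu * sigma**2)
print(m[4], mu**4 + 6 * mu**2 * sigma**2 + 3 * sigma**4)
```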
Generating normal random variables
For computer simulations, it is often useful to generate values that have a normal distribution. There are several methods; the most basic is to invert the standard normal cdf, that is, to apply the quantile function to a uniformly distributed value. More efficient methods are also known, one such method being the Box-Muller transform.
The Box-Muller transform takes two uniformly distributed values as input and maps them to two normally distributed values. This requires generating values from a uniform distribution, for which many methods are known. See also random number generators.
The Box-Muller transform is a consequence of the fact that the chi-square distribution with two degrees of freedom (see property 4 above) is an easily generated exponential random variable.
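A minimal Python sketch of the basic (trigonometric) form of the Box-Muller transform follows. The function names, the use of `random.Random` as the uniform source, and the `seed` parameter are assumptions made for this illustration; production code would normally call a library routine such as `random.gauss` instead.

```python
import math
import random

def box_muller(u1, u2):
    """Map uniforms u1 in (0, 1] and u2 in [0, 1) to two independent N(0, 1) values."""
    # r^2 = -2 ln(u1) is exponential with mean 2, i.e. chi-square with 2 degrees of freedom.
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2  # uniformly distributed angle
    return r * math.cos(theta), r * math.sin(theta)

def normal_samples(n, mu=0.0, sigma=1.0, seed=None):
    """Draw n values from N(mu, sigma^2) using the Box-Muller transform."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u1 = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so that log(u1) is defined
        u2 = rng.random()
        z0, z1 = box_muller(u1, u2)
        # Rescale the standard normal values: X = sigma * Z + mu.
        out.extend((mu + sigma * z0, mu + sigma * z1))
    return out[:n]

samples = normal_samples(10000, mu=100.0, sigma=15.0, seed=42)
print(sum(samples) / len(samples))  # sample mean, near 100
```

Each call to `box_muller` yields two values, so the generator consumes uniform numbers in pairs and truncates the list when `n` is odd.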



























