# Kurtosis


In probability theory and statistics, kurtosis is a measure of the "peakedness" of the probability distribution of a real-valued random variable. Higher kurtosis means that more of the variance is due to infrequent extreme deviations, as opposed to frequent, modestly sized deviations.

## Definition of kurtosis

The fourth standardized moment is defined as $\mu_4 / \sigma^4$, where $\mu_4$ is the fourth moment about the mean and $\sigma$ is the standard deviation. This is sometimes used as the definition of kurtosis in older works, but is not the definition used here.

Kurtosis is more commonly defined as the ratio of the fourth cumulant and the square of the second cumulant of the probability distribution,

$\gamma_2 = \frac{\kappa_4}{\kappa_2^2} = \frac{\mu_4}{\sigma^4} - 3, \!$

which is also known as excess kurtosis. The "minus 3" at the end of this formula is often explained as a correction to make the kurtosis of the normal distribution equal to zero. Another reason can be seen by looking at the formula for the kurtosis of a sum of random variables. If $Y$ is the sum of $n$ independent random variables, all with the same distribution as $X$, then $\operatorname{Kurt}[Y] = \operatorname{Kurt}[X] / n$, while the formula would be more complicated if kurtosis were defined as $\mu_4 / \sigma^4$.
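The scaling rule for sums follows from the additivity of cumulants over independent random variables. A short derivation, writing $\kappa_k(\cdot)$ for the $k$th cumulant of a random variable (a notation introduced here only for illustration):

$\gamma_2(Y) = \frac{\kappa_4(Y)}{\kappa_2(Y)^2} = \frac{n\,\kappa_4(X)}{\left(n\,\kappa_2(X)\right)^2} = \frac{1}{n}\,\frac{\kappa_4(X)}{\kappa_2(X)^2} = \frac{\gamma_2(X)}{n}$

By contrast, the fourth standardized moment of $Y$ works out to $3 + \left(\mu_4 / \sigma^4 - 3\right)/n$, with $\mu_4$ and $\sigma$ taken from $X$, which is the more complicated form referred to above.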

## Terminology and examples

A high kurtosis distribution has a sharper "peak" and fatter "tails", while a low kurtosis distribution has a more rounded peak with wider "shoulders".

Distributions with zero kurtosis are called mesokurtic. The most prominent example of a mesokurtic distribution is the normal distribution family, regardless of the values of its parameters. A few other well-known distributions can be mesokurtic, depending on parameter values: for example the binomial distribution is mesokurtic for $p = 1/2 \pm \sqrt{1/12}$.
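The binomial case can be checked numerically. The following is a minimal sketch, assuming the excess kurtosis is computed directly from the probability mass function; the function name `binomial_excess_kurtosis` is hypothetical and chosen only for this illustration.

```python
import math


def binomial_excess_kurtosis(n, p):
    """Excess kurtosis of Binomial(n, p), computed from its exact pmf.

    Hypothetical helper for illustration; it evaluates mu_4 / sigma^4 - 3
    by summing over all n + 1 outcomes.
    """
    pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))
    mu2 = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))  # variance
    mu4 = sum((k - mean) ** 4 * w for k, w in enumerate(pmf))  # 4th central moment
    return mu4 / mu2**2 - 3


# The two parameter values at which the binomial distribution is mesokurtic.
p_low = 0.5 - math.sqrt(1 / 12)
p_high = 0.5 + math.sqrt(1 / 12)

for p in (p_low, 0.5, p_high):
    print(f"p = {p:.6f}: excess kurtosis = {binomial_excess_kurtosis(10, p):+.6f}")
```

At the two special values of $p$ the result is zero up to floating-point rounding, whereas at $p = 1/2$ the distribution is platykurtic.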

A distribution with positive kurtosis is called leptokurtic. In terms of shape, a leptokurtic distribution has a more acute "peak" around the mean (that is, a higher probability than a normally distributed variable of values near the mean) and "fat tails" (that is, a higher probability than a normally distributed variable of extreme values). Examples of leptokurtic distributions include the Laplace distribution and the logistic distribution.

A distribution with negative kurtosis is called platykurtic. In terms of shape, a platykurtic distribution has a smaller "peak" around the mean (that is, a lower probability than a normally distributed variable of values near the mean) and "thin tails" (that is, a lower probability than a normally distributed variable of extreme values). Examples of platykurtic distributions include the continuous uniform distribution, whose excess kurtosis is $-6/5$, and the raised cosine distribution.

## Sample kurtosis

For a sample of n values the sample kurtosis is

$g_2 = \frac{m_4}{m_{2}^2} -3 = \frac{n\,\sum_{i=1}^n (x_i - \overline{x})^4}{\left(\sum_{i=1}^n (x_i - \overline{x})^2\right)^2} - 3$

where $m_4$ is the fourth sample moment about the mean, $m_2$ is the second sample moment about the mean (that is, the sample variance computed with divisor $n$), $x_i$ is the $i$th value, and $\overline{x}$ is the sample mean.
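The formula for $g_2$ translates directly into code. The following is a minimal sketch in plain Python; the function name `sample_excess_kurtosis` is an assumption made for this illustration, not an established API.

```python
def sample_excess_kurtosis(values):
    """Sample excess kurtosis g_2 = m_4 / m_2^2 - 3.

    m_2 and m_4 are the second and fourth sample moments about the mean,
    both computed with divisor n.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2**2 - 3


# Worked example: deviations from the mean are -2, -1, 0, 1, 2, so the
# sum of squares is 10 and the sum of fourth powers is 34, giving
# g_2 = 5 * 34 / 10**2 - 3 = -1.3.
print(sample_excess_kurtosis([1, 2, 3, 4, 5]))
```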

## Estimators of population kurtosis

Given a sample drawn from a population, the sample kurtosis above is a biased estimator of the population kurtosis. The usual estimator of the population kurtosis (used in SAS, SPSS, and Excel but not by MINITAB or BMDP) is $G_2$, defined as follows:

$\begin{aligned}
G_2 &= \frac{k_4}{k_2^2} \\
&= \frac{n^2\,\left((n+1)\,m_4 - 3\,(n-1)\,m_2^2\right)}{(n-1)\,(n-2)\,(n-3)} \; \frac{(n-1)^2}{n^2\,m_2^2} \\
&= \frac{n-1}{(n-2)\,(n-3)} \left( (n+1)\,\frac{m_4}{m_2^2} - 3\,(n-1) \right) \\
&= \frac{n-1}{(n-2)\,(n-3)} \left( (n+1)\,g_2 + 6 \right) \\
&= \frac{(n+1)\,n\,(n-1)}{(n-2)\,(n-3)} \; \frac{\sum_{i=1}^n (x_i - \bar{x})^4}{\left(\sum_{i=1}^n (x_i - \bar{x})^2\right)^2} - 3\,\frac{(n-1)^2}{(n-2)\,(n-3)} \\
&= \frac{(n+1)\,n}{(n-1)\,(n-2)\,(n-3)} \; \frac{\sum_{i=1}^n (x_i - \bar{x})^4}{k_2^2} - 3\,\frac{(n-1)^2}{(n-2)\,(n-3)}
\end{aligned}$

where $k_4$ is the unique symmetric unbiased estimator of the fourth cumulant, $k_2$ is the unbiased estimator of the population variance, $m_4$ is the fourth sample moment about the mean, $m_2$ is the sample variance computed with divisor $n$, $x_i$ is the $i$th value, and $\bar{x}$ is the sample mean. Unfortunately, $G_2$ is itself generally biased. For the normal distribution it is unbiased: its expected value is then zero, which equals the excess kurtosis of the normal distribution.
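The estimator $G_2$ can be computed from $g_2$ using the relation $G_2 = \frac{n-1}{(n-2)(n-3)}\left((n+1)\,g_2 + 6\right)$ given above. The following is a minimal sketch in plain Python; the function name `population_kurtosis_estimate` is hypothetical, and the requirement of at least four values is imposed here because the denominator vanishes for $n = 2$ and $n = 3$.

```python
def population_kurtosis_estimate(values):
    """Estimator G_2 of the population excess kurtosis.

    Computed as (n - 1) / ((n - 2)(n - 3)) * ((n + 1) * g_2 + 6),
    where g_2 is the sample excess kurtosis m_4 / m_2^2 - 3.
    """
    n = len(values)
    if n < 4:
        raise ValueError("G_2 needs at least four values")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # divisor n
    m4 = sum((x - mean) ** 4 for x in values) / n  # divisor n
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    g2 = m4 / m2**2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


# Worked example: for 1..5, g_2 = -1.3, so
# G_2 = 4 / (3 * 2) * (6 * (-1.3) + 6) = -1.2 (up to rounding).
print(population_kurtosis_estimate([1, 2, 3, 4, 5]))
```

For small samples the correction is substantial: in this example $g_2 = -1.3$ while $G_2 = -1.2$. The two values converge as $n$ grows.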