# Covariance matrix

In statistics and probability theory, the covariance matrix is the matrix of covariances between the elements of a random vector. It is the natural generalization to higher dimensions of the concept of the variance of a scalar-valued random variable.

## Definition

If entries in the column vector

$X = \begin{bmatrix}X_1 \\ \vdots \\ X_n \end{bmatrix}$

are random variables, each with finite variance, then the covariance matrix $\Sigma$ is the matrix whose $(i, j)$ entry is the covariance

$\Sigma_{ij} =\mathrm{E}\begin{bmatrix} (X_i - \mu_i)(X_j - \mu_j) \end{bmatrix}$

where

$\mu_i = \mathrm{E}(X_i)\,$

is the expected value of the $i$th entry in the vector $X$. In other words, we have

$\Sigma = \begin{bmatrix} \mathrm{E}[(X_1 - \mu_1)(X_1 - \mu_1)] & \mathrm{E}[(X_1 - \mu_1)(X_2 - \mu_2)] & \cdots & \mathrm{E}[(X_1 - \mu_1)(X_n - \mu_n)] \\ \\ \mathrm{E}[(X_2 - \mu_2)(X_1 - \mu_1)] & \mathrm{E}[(X_2 - \mu_2)(X_2 - \mu_2)] & \cdots & \mathrm{E}[(X_2 - \mu_2)(X_n - \mu_n)] \\ \\ \vdots & \vdots & \ddots & \vdots \\ \\ \mathrm{E}[(X_n - \mu_n)(X_1 - \mu_1)] & \mathrm{E}[(X_n - \mu_n)(X_2 - \mu_2)] & \cdots & \mathrm{E}[(X_n - \mu_n)(X_n - \mu_n)] \end{bmatrix}.$

### As a generalization of the variance

The definition above is equivalent to the matrix equality

$\Sigma=\mathrm{E} \left[ \left( \textbf{X} - \mathrm{E}[\textbf{X}] \right) \left( \textbf{X} - \mathrm{E}[\textbf{X}] \right)^\top \right]$
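As a minimal sketch of why the two forms agree, the NumPy snippet below treats a small, made-up data set as a discrete distribution in which each row is equally likely, so that expectations become plain averages. The array `samples` and all variable names are illustrative assumptions, not part of the definition.

```python
import numpy as np

# Hypothetical data: 5 equally likely observations of a 3-dimensional vector X.
samples = np.array([
    [1.0, 2.0, 0.5],
    [2.0, 1.0, 1.5],
    [3.0, 4.0, 2.5],
    [4.0, 3.0, 3.5],
    [5.0, 5.0, 2.0],
])

mu = samples.mean(axis=0)   # E[X] under the empirical distribution
centered = samples - mu     # each row is one realization of X - E[X]

# Entry-wise definition: Sigma_ij = E[(X_i - mu_i)(X_j - mu_j)]
n = samples.shape[1]
sigma_entrywise = np.empty((n, n))
for i in range(n):
    for j in range(n):
        sigma_entrywise[i, j] = np.mean(centered[:, i] * centered[:, j])

# Matrix form: Sigma = E[(X - E[X])(X - E[X])^T], the average of outer products
sigma_matrix = centered.T @ centered / samples.shape[0]

print(np.allclose(sigma_entrywise, sigma_matrix))
```

The diagonal of either result holds the variances of the individual components, which is the sense in which the matrix generalizes the scalar variance.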

This form can be seen as a generalization of the scalar-valued variance to higher dimensions. Recall that for a scalar-valued random variable X

$\sigma^2 = \mathrm{var}(X) = \mathrm{E}[(X-\mu)^2], \,$

where

$\mu = \mathrm{E}(X).\,$

The matrix $\Sigma$ is also often called the variance-covariance matrix since the diagonal terms are in fact variances.

## Conflicting nomenclatures and notations

Nomenclatures differ. Some statisticians, following the probabilist William Feller, call this matrix the variance of the random vector $X$, because it is the natural generalization to higher dimensions of the 1-dimensional variance. Others call it the covariance matrix, because it is the matrix of covariances between the scalar components of the vector $X$. Thus

$\operatorname{var}(\textbf{X}) = \operatorname{cov}(\textbf{X}) = \mathrm{E} \left[ (\textbf{X} - \mathrm{E} [\textbf{X}]) (\textbf{X} - \mathrm{E} [\textbf{X}])^\top \right]$

However, the notation for the "cross-covariance" between two vectors is standard:

$\operatorname{cov}(\textbf{X},\textbf{Y}) = \mathrm{E} \left[ (\textbf{X} - \mathrm{E}[\textbf{X}]) (\textbf{Y} - \mathrm{E}[\textbf{Y}])^\top \right]$

The $\operatorname{var}$ notation is found in William Feller's two-volume book An Introduction to Probability Theory and Its Applications, but both forms are quite standard and there is no ambiguity between them.

## Properties

For $\Sigma=\mathrm{E} \left[ \left( \textbf{X} - \mathrm{E}[\textbf{X}] \right) \left( \textbf{X} - \mathrm{E}[\textbf{X}] \right)^\top \right]$ and $\mu = \mathrm{E}(\textbf{X})$ the following basic properties apply:

1. $\Sigma = \mathrm{E}(\mathbf{X X^\top}) - \mathbf{\mu}\mathbf{\mu^\top}$
2. $\mathbf{\Sigma}$ is positive semi-definite
3. $\operatorname{var}(\mathbf{A X} + \mathbf{a}) = \mathbf{A}\, \operatorname{var}(\mathbf{X})\, \mathbf{A^\top}$
4. $\operatorname{cov}(\mathbf{X},\mathbf{Y}) = \operatorname{cov}(\mathbf{Y},\mathbf{X})^\top$
5. $\operatorname{cov}(\mathbf{X_1} + \mathbf{X_2},\mathbf{Y}) = \operatorname{cov}(\mathbf{X_1},\mathbf{Y}) + \operatorname{cov}(\mathbf{X_2}, \mathbf{Y})$
6. If $p = q$, then $\operatorname{var}(\mathbf{X} + \mathbf{Y}) = \operatorname{var}(\mathbf{X}) + \operatorname{cov}(\mathbf{X},\mathbf{Y}) + \operatorname{cov}(\mathbf{Y}, \mathbf{X}) + \operatorname{var}(\mathbf{Y})$
7. $\operatorname{cov}(\mathbf{AX}, \mathbf{BY}) = \mathbf{A}\, \operatorname{cov}(\mathbf{X}, \mathbf{Y}) \,\mathbf{B}^\top$
8. If $\mathbf{X}$ and $\mathbf{Y}$ are independent, then $\operatorname{cov}(\mathbf{X}, \mathbf{Y}) = 0$

where $\mathbf{X}, \mathbf{X_1}$ and $\mathbf{X_2}$ are random $(p \times 1)$ vectors, $\mathbf{Y}$ is a random $(q \times 1)$ vector, $\mathbf{A}$ is an $(r \times p)$ matrix, $\mathbf{a}$ is an $(r \times 1)$ vector, and $\mathbf{B}$ is an $(s \times q)$ matrix, so that the products $\mathbf{AX}$ and $\mathbf{BY}$ and the sum $\mathbf{AX} + \mathbf{a}$ are defined.
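Several of these identities can be checked numerically. The sketch below is only an illustration under stated assumptions: expectations are taken over an empirical distribution of randomly generated rows, the helper functions `cov` and `var` are hypothetical names defined here, and the matrices `A` and `B` are given $p$ and $q$ columns respectively so that $\mathbf{AX}$ and $\mathbf{BY}$ are defined.

```python
import numpy as np

rng = np.random.default_rng(0)

def cov(x, y):
    """Empirical cross-covariance; rows of x (N x p) and y (N x q) are samples."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    return xc.T @ yc / x.shape[0]

def var(x):
    """Empirical covariance matrix of the rows of x."""
    return cov(x, x)

N, p, q = 200, 3, 2
X = rng.normal(size=(N, p))
Y = rng.normal(size=(N, q)) + X[:, :q]   # make Y correlated with X
A = rng.normal(size=(4, p))              # A has p columns
B = rng.normal(size=(5, q))              # B has q columns
a = rng.normal(size=4)                   # a has as many entries as A has rows

mu = X.mean(axis=0)

# Property 1: Sigma = E[X X^T] - mu mu^T
prop1 = np.allclose(var(X), X.T @ X / N - np.outer(mu, mu))
# Property 2: positive semi-definite (eigenvalues nonnegative up to round-off)
prop2 = bool(np.all(np.linalg.eigvalsh(var(X)) >= -1e-12))
# Property 3: var(AX + a) = A var(X) A^T (rows of X @ A.T are (A x)^T)
prop3 = np.allclose(var(X @ A.T + a), A @ var(X) @ A.T)
# Property 4: cov(X, Y) = cov(Y, X)^T
prop4 = np.allclose(cov(X, Y), cov(Y, X).T)
# Property 7: cov(AX, BY) = A cov(X, Y) B^T
prop7 = np.allclose(cov(X @ A.T, Y @ B.T), A @ cov(X, Y) @ B.T)

print(prop1, prop2, prop3, prop4, prop7)
```

Because the identities are algebraic, they hold for the empirical distribution exactly (up to floating-point error), not just approximately for large samples.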

Although simple, the covariance matrix is a useful tool in many different areas. From it one can derive a transformation matrix that completely decorrelates the data or, from a different point of view, gives an optimal basis for representing the data compactly (see Rayleigh quotient for a formal proof and additional properties of covariance matrices). This is called principal components analysis (PCA) in statistics and the Karhunen-Loève transform (KL-transform) in image processing.
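One common way to obtain such a decorrelating transformation is the eigendecomposition of the covariance matrix. The sketch below assumes that approach and uses made-up two-dimensional data; the names `mixing`, `data` and `scores` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical correlated 2-D data: independent noise passed through a mixing matrix.
mixing = np.array([[2.0, 0.0],
                   [1.5, 0.5]])
data = rng.normal(size=(500, 2)) @ mixing.T

centered = data - data.mean(axis=0)
sigma = centered.T @ centered / data.shape[0]   # empirical covariance matrix

# Sigma is symmetric, so eigh gives real eigenvalues and orthonormal
# eigenvectors (as columns): Sigma = V diag(w) V^T.
eigvals, eigvecs = np.linalg.eigh(sigma)

# Express each centered sample in the eigenvector basis.
scores = centered @ eigvecs
sigma_scores = scores.T @ scores / data.shape[0]

# The covariance of the transformed data is diagonal: the components
# are uncorrelated and their variances are the eigenvalues.
print(np.round(sigma_scores, 6))
```

Keeping only the directions with the largest eigenvalues gives the compact representation mentioned above.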

## Which matrices are covariance matrices

From the identity

$\operatorname{var}(\mathbf{a^\top}\mathbf{X}) = \mathbf{a^\top} \operatorname{var}(\mathbf{X}) \mathbf{a}\,$

and the fact that the variance of any real-valued random variable is nonnegative, it follows immediately that only a nonnegative-definite matrix can be a covariance matrix. The converse question is whether every nonnegative-definite symmetric matrix is a covariance matrix. The answer is "yes". To see this, suppose $M$ is a $p \times p$ nonnegative-definite symmetric matrix. From the finite-dimensional case of the spectral theorem, it follows that $M$ has a nonnegative-definite symmetric square root, which we denote by $M^{1/2}$. Let $\mathbf{X}$ be any $p \times 1$ column vector-valued random variable whose covariance matrix is the $p \times p$ identity matrix. Then

$\operatorname{var}(M^{1/2}\mathbf{X}) = M^{1/2} (\operatorname{var}(\mathbf{X})) M^{1/2} = M.\,$
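The construction can be carried out numerically. In the sketch below, the matrix `M` is a hypothetical example, the square root is obtained from the eigendecomposition (one way of applying the spectral theorem), and $\mathbf{X}$ is assumed to be uniform on the $2p$ points $\pm\sqrt{p}\,e_i$, a choice made here only because its covariance is exactly the identity.

```python
import numpy as np

# Hypothetical symmetric nonnegative-definite matrix.
M = np.array([[4.0, 2.0, 0.0],
              [2.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# Symmetric square root via the spectral decomposition M = V diag(w) V^T.
w, V = np.linalg.eigh(M)
M_half = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

# X uniform over the 2p points +/- sqrt(p) e_i: mean zero, covariance I.
p = M.shape[0]
points = np.sqrt(p) * np.vstack([np.eye(p), -np.eye(p)])
Xc = points - points.mean(axis=0)
cov_X = Xc.T @ Xc / points.shape[0]

# Y = M^{1/2} X, evaluated at every point of the distribution.
Y = points @ M_half.T
Yc = Y - Y.mean(axis=0)
cov_Y = Yc.T @ Yc / Y.shape[0]

print(np.allclose(cov_X, np.eye(p)), np.allclose(cov_Y, M))
```

Any other distribution with identity covariance, such as a standard multivariate normal, would serve equally well in the argument.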

## Complex random vectors

The variance of a complex scalar-valued random variable with expected value μ is conventionally defined using complex conjugation:

$\operatorname{var}(z) = \operatorname{E} \left[ (z-\mu)(z-\mu)^{*} \right]$

where the complex conjugate of a complex number $z$ is denoted $z^{*}$.

If $Z$ is a column vector of complex-valued random variables, then the transpose is replaced by the conjugate transpose, obtained by both transposing and conjugating, which gives a square matrix:

$\operatorname{E} \left[ (Z-\mu)(Z-\mu)^{*} \right]$

where $Z^{*}$ denotes the conjugate transpose, which is applicable to the scalar case since the transpose of a scalar is still a scalar.
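A short numerical sketch of the complex case follows. It assumes randomly generated complex samples treated as an empirical distribution; the names `Z`, `Zc` and `sigma` are illustrative. With conjugation in place, the resulting matrix is Hermitian with real, nonnegative diagonal entries.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical complex samples: each row is one realization of Z.
N, n = 300, 3
Z = rng.normal(size=(N, n)) + 1j * rng.normal(size=(N, n))
Z[:, 1] += (0.5 - 0.25j) * Z[:, 0]   # introduce correlation between components

mu = Z.mean(axis=0)
Zc = Z - mu

# E[(Z - mu)(Z - mu)^*]: entry (i, j) averages zc_i * conj(zc_j).
sigma = Zc.T @ Zc.conj() / N

print(np.allclose(sigma, sigma.conj().T))    # Hermitian
print(np.allclose(sigma.diagonal().imag, 0)) # real diagonal (variances)
```

Without the conjugation, the diagonal entries would in general be complex and could not be read as variances.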

LaTeX provides useful features for dealing with covariance matrices. These are available through the extendedmath package.

## Estimation

The derivation of the maximum-likelihood estimator of the covariance matrix of a multivariate normal distribution is perhaps surprisingly subtle. It involves the spectral theorem and the reason why it can be better to view a scalar as the trace of a 1 × 1 matrix than as a mere scalar. See estimation of covariance matrices.