Information geometry

In mathematics, and especially in statistical inference, information geometry is the study of probability and information by way of differential geometry. It reached maturity through the work of Shun'ichi Amari in the 1980s; his book with Hiroshi Nagaoka, Methods of information geometry, is currently the canonical reference.

Introduction

The main tenet of information geometry is that many important structures in probability theory, information theory and statistics can be treated as structures in differential geometry by regarding a space of probabilities as a differentiable manifold endowed with a Riemannian metric and a family of affine connections distinct from the canonical affine connection. The e-affine (exponential) connection and the m-affine (mixture) connection give a geometric reading of the expectation and maximization steps of the expectation-maximization algorithm.

For example,

The importance of studying statistical structures as geometrical structures lies in the fact that geometric structures are invariant under coordinate transforms. For example, a family of probability distributions, such as the Gaussian distributions, may be transformed into another family, such as the log-normal distributions, by a change of variables. The property of being an exponential family is not changed, however, because it is a geometric property. The distance between two distributions in the family, as defined by the Fisher metric, is likewise preserved.
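The invariance of the Fisher metric under a change of variables can be checked numerically. The sketch below is an illustration under stated assumptions, not a method from the literature cited here: it uses the parameters $(\mu, \sigma)$, the map $y = e^x$ from Gaussian to log-normal, and helper names (`fisher_matrix`, `trapezoid`) invented for this example. It approximates the integral defining the metric with finite differences and the trapezoid rule.

```python
import numpy as np

def gaussian_log_pdf(x, theta):
    mu, sigma = theta
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - (x - mu) ** 2 / (2 * sigma ** 2)

def lognormal_log_pdf(y, theta):
    # Density of y = exp(x) with x ~ N(mu, sigma^2): p_Y(y) = p_X(log y) / y.
    return gaussian_log_pdf(np.log(y), theta) - np.log(y)

def trapezoid(values, grid):
    # Trapezoid rule that also works on a non-uniform grid.
    return np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid))

def fisher_matrix(log_pdf, theta, grid, eps=1e-5):
    """Approximate g_jk = integral of (d_j log p)(d_k log p) p over the grid."""
    theta = np.asarray(theta, dtype=float)
    n = theta.size
    scores = []
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        # Central difference of log p with respect to parameter j.
        scores.append((log_pdf(grid, theta + step) - log_pdf(grid, theta - step)) / (2 * eps))
    density = np.exp(log_pdf(grid, theta))
    return np.array([[trapezoid(scores[j] * scores[k] * density, grid)
                      for k in range(n)] for j in range(n)])

theta = np.array([0.5, 1.3])  # assumed example values for (mu, sigma)
x_grid = np.linspace(theta[0] - 12 * theta[1], theta[0] + 12 * theta[1], 20001)
y_grid = np.exp(x_grid)       # the same grid pushed through y = exp(x)

g_gaussian = fisher_matrix(gaussian_log_pdf, theta, x_grid)
g_lognormal = fisher_matrix(lognormal_log_pdf, theta, y_grid)
g_closed_form = np.diag([1 / theta[1] ** 2, 2 / theta[1] ** 2])

print(np.round(g_gaussian, 4))
print(np.round(g_lognormal, 4))
```

Both matrices should agree with each other, and with the closed form for the Gaussian family in these coordinates, up to discretization error. This reflects the fact that the extra factor $1/y$ in the log-normal density does not depend on the parameters and so drops out of the score.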

The statistician Fisher recognized in the 1920s that there is an intrinsic measure of the amount of information for statistical estimators. The Fisher information matrix was shown by Cramér and Rao to be a Riemannian metric on the space of probabilities, and became known as the Fisher information metric.

The mathematician Čencov (Chentsov) proved in the 1960s and 1970s that on the space of probability distributions on a sample space containing at least three points,

• There exists a unique intrinsic metric. It is the Fisher information metric.
• There exists a unique one-parameter family of affine connections. It is the family of $\alpha$-affine connections later popularized by Amari.

Both of these uniqueness results hold, of course, up to multiplication by a constant.

Amari and Nagaoka's work in the 1980s brought all these results together by introducing the concept of dual affine connections and the interplay among metric, affine connection and divergence. In particular,

• Given a Riemannian metric g and a family of dual affine connections $\Gamma_\alpha$, there exists a unique set of dual divergences $D_\alpha$ defined by them.
• Given the family of dual divergences $D_\alpha$, the metric and affine connections can be uniquely determined by second-order and third-order differentiation.
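The second-order statement can be illustrated on the simplest case. The sketch below assumes the Bernoulli family with parameter $p$ and the Kullback-Leibler divergence as the divergence; both are choices made for this example, and the function names are hypothetical. It compares a finite-difference second derivative of the divergence, taken in its second argument at the diagonal, with the Fisher information $1/(p(1-p))$.

```python
import numpy as np

def kl_bernoulli(p, q):
    """Kullback-Leibler divergence D(p || q) between Bernoulli(p) and Bernoulli(q)."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def metric_from_divergence(p, h=1e-4):
    # Central second difference of q -> D(p || q), evaluated at q = p.
    return (kl_bernoulli(p, p + h) - 2 * kl_bernoulli(p, p) + kl_bernoulli(p, p - h)) / h ** 2

def fisher_bernoulli(p):
    # Closed-form Fisher information of the Bernoulli family in the parameter p.
    return 1.0 / (p * (1 - p))

p = 0.3  # assumed example value
g_from_divergence = metric_from_divergence(p)
g_fisher = fisher_bernoulli(p)
print(g_from_divergence, g_fisher)
```

The two numbers should agree up to the finite-difference error. Recovering the affine connections would require third-order derivatives of the divergence, which this sketch does not attempt.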

Amari and Kumon also showed that the asymptotic efficiency of estimates and tests can be represented by geometrical quantities.

Basic concepts

• Statistical manifold: space of probability distributions, i.e. a statistical model.
• Point on the manifold: probability distribution.
• Coordinates: parameters in the statistical model.
• Tangent vector: Fisher score function.
• Riemannian metric: Fisher information metric.
• Affine connections.
• Curvatures: associated with information loss.
• Information divergence.

Fisher information metric as a Riemannian metric

Main article: Fisher information metric

Information geometry is based primarily on the Fisher information metric:

$g_{jk}=\int \frac{\partial \log p(x,\theta)}{\partial \theta_j} \frac{\partial \log p(x,\theta)}{\partial \theta_k} p(x,\theta)\, dx.$

Substituting the information content $i = -\log p$ from information theory, the formula keeps the same form, since the two minus signs cancel:

$g_{jk}=\int \frac{\partial i(x,\theta)}{\partial \theta_j} \frac{\partial i(x,\theta)}{\partial \theta_k} p(x,\theta)\, dx.$
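As a worked instance of the formula, consider the univariate Gaussian family with coordinates $\theta = (\mu, \sigma)$. This family is chosen here only for illustration. Its log-density and score functions are

$\log p(x,\theta) = -\log\sigma - \tfrac{1}{2}\log(2\pi) - \frac{(x-\mu)^2}{2\sigma^2}$

$\frac{\partial \log p}{\partial \mu} = \frac{x-\mu}{\sigma^2}, \qquad \frac{\partial \log p}{\partial \sigma} = \frac{(x-\mu)^2-\sigma^2}{\sigma^3}.$

Using the central moments $E[(x-\mu)^2]=\sigma^2$, $E[(x-\mu)^3]=0$ and $E[(x-\mu)^4]=3\sigma^4$, the integrals evaluate to

$g_{\mu\mu} = \frac{1}{\sigma^2}, \qquad g_{\mu\sigma} = 0, \qquad g_{\sigma\sigma} = \frac{3\sigma^4 - 2\sigma^4 + \sigma^4}{\sigma^6} = \frac{2}{\sigma^2}.$

The resulting line element is $ds^2 = (d\mu^2 + 2\,d\sigma^2)/\sigma^2$, which is the metric of the hyperbolic half-plane up to a rescaling of the coordinates.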

History

The history of information geometry is associated with the discoveries of at least the following people, and many others

Some applications

An important concept in information geometry is the natural gradient, which suggests an adjustment to the energy function of a learning rule. This adjustment takes into account the curvature of the (prior) statistical manifold by way of the Fisher information metric.

This concept has many important applications in blind signal separation, neural networks, artificial intelligence, and other engineering problems that deal with information. Experimental results have shown that application of the concept leads to substantial performance gains.
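A common way to write the natural gradient update is $\theta \leftarrow \theta - \eta\, G(\theta)^{-1} \nabla L(\theta)$, where $G$ is the Fisher information matrix. The sketch below is a toy illustration under assumptions made for this example, not an implementation from any cited work: it fits a univariate Gaussian with parameters $(\mu, \sigma)$ by minimizing the average negative log-likelihood, and it uses the closed-form Fisher matrix of that family. All function names and the step size are hypothetical.

```python
import numpy as np

def nll_gradient(theta, data):
    """Gradient of the average negative log-likelihood of N(mu, sigma^2)."""
    mu, sigma = theta
    d_mu = -(data.mean() - mu) / sigma ** 2
    d_sigma = 1.0 / sigma - np.mean((data - mu) ** 2) / sigma ** 3
    return np.array([d_mu, d_sigma])

def fisher_matrix(theta):
    # Closed-form Fisher information of the Gaussian family in (mu, sigma).
    _, sigma = theta
    return np.diag([1.0 / sigma ** 2, 2.0 / sigma ** 2])

def natural_gradient_descent(data, theta0, learning_rate=0.5, steps=100):
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        # Precondition the ordinary gradient with the inverse Fisher matrix.
        natural_grad = np.linalg.solve(fisher_matrix(theta), nll_gradient(theta, data))
        theta = theta - learning_rate * natural_grad
    return theta

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.7, size=1000)  # synthetic data for the demo
mu_hat, sigma_hat = natural_gradient_descent(data, theta0=[-3.0, 5.0])
print(mu_hat, sigma_hat)
```

In this toy problem the preconditioning removes the $1/\sigma^2$ factor from the $\mu$ update, so the mean estimate moves toward the sample mean at a rate that does not depend on the current $\sigma$. The iterates should converge to the sample mean and the (biased) sample standard deviation.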

Nonlinear filtering

Other applications concern the statistics of stochastic processes and approximate finite-dimensional solutions of the filtering problem for stochastic processes. Because the nonlinear filtering problem in general admits only an infinite-dimensional solution, one can use a geometric structure on the space of probability distributions to project the infinite-dimensional filter onto an approximate finite-dimensional one. This leads to the projection filters introduced in 1987 by Bernard Hanzon.

References

• Shun'ichi Amari - Differential-geometrical methods in statistics, Lecture notes in statistics, Springer-Verlag, Berlin, 1985
• Shun'ichi Amari, Hiroshi Nagaoka - Methods of information geometry, Translations of mathematical monographs; v. 191, American Mathematical Society, 2000 (ISBN 978-0821805312)
• M. Murray and J. Rice - Differential geometry and statistics, Monographs on Statistics and Applied Probability 48, Chapman and Hall, 1993.
• R. E. Kass and P. W. Vos - Geometrical Foundations of Asymptotic Inference, Series in Probability and Statistics, Wiley, 1997.
• N. N. Čencov - Statistical Decision Rules and Optimal Inference, Translations of Mathematical Monographs; v. 53, American Mathematical Society, 1982
• Giovanni Pistone and C. Sempi - An infinite-dimensional geometric structure on the space of all the probability measures equivalent to a given one, Ann. Statist. 23, no. 5, 1995, pp. 1543–1561
• D. Brigo, B. Hanzon and F. Le Gland - Approximate nonlinear filtering by projection on exponential manifolds of densities, Bernoulli, vol. 5, 1999, pp. 495–534 (ISSN 1350-7265)
• D. Brigo - Diffusion Processes, Manifolds of Exponential Densities, and Nonlinear Filtering, in: Ole E. Barndorff-Nielsen and Eva B. Vedel Jensen (eds.), Geometry in Present Day Science, World Scientific, 1999