Psychology Wiki

Empirical process



The study of empirical processes is a branch of mathematical statistics and a sub-area of probability theory.

The motivation for studying empirical processes is that the true underlying probability measure P is usually unknown. Instead, we collect observations X_1, X_2, \dots , X_n and compute relative frequencies. We can then estimate P, or a related distribution function F, by means of the empirical measure or the empirical distribution function, respectively. Theorems in the area of empirical processes confirm that these estimates are uniformly good, or quantify the accuracy of the estimation.

Suppose X is a sample space of observations. X can be quite general: for example, the real line, a Euclidean space, a space of functions, a Riemannian manifold, or any other space of interest. Let X_1, X_2, \dots , X_n be independent identically distributed (iid) random variables taking values in X, with common probability measure P on X. For a measurable set A, the empirical measure P_n is defined as

P_n(A) = {1 \over n} \operatorname{card}\{\, j \in \{\,1,\dots,n\,\} :  X_j \in  A\,\}.

If C is a collection of subsets of X, then the collection

\{P_n(c): c \in C\}

is the empirical measure indexed by C. The empirical process B_n is defined as

B_n = \sqrt n(P_n-P).


The collection

\{B_n(c): c \in C\}

is the empirical process indexed by C.
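
The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: P is taken to be the uniform distribution on [0, 1] (so that P([a, b]) = b - a is known), the class C is a small hand-picked list of intervals, and the function names empirical_measure and empirical_process are hypothetical helpers, not part of any library.

```python
import math
import random

def empirical_measure(sample, indicator):
    """P_n(A): fraction of sample points that fall in the set A.

    `indicator` is a function returning True when a point lies in A.
    """
    return sum(1 for x in sample if indicator(x)) / len(sample)

def empirical_process(sample, indicator, true_prob):
    """B_n(A) = sqrt(n) * (P_n(A) - P(A)) for a set A with known P(A)."""
    n = len(sample)
    return math.sqrt(n) * (empirical_measure(sample, indicator) - true_prob)

# Assumed setup: P is the uniform distribution on [0, 1], so P([a, b]) = b - a.
rng = random.Random(0)
sample = [rng.random() for _ in range(1000)]

# A small, illustrative class C of intervals [a, b].
intervals = [(0.0, 0.25), (0.25, 0.75), (0.1, 0.9)]
for a, b in intervals:
    in_A = lambda x, a=a, b=b: a <= x <= b
    p_n = empirical_measure(sample, in_A)
    b_n = empirical_process(sample, in_A, b - a)
    print(f"A=[{a}, {b}]  P_n(A)={p_n:.3f}  P(A)={b - a:.3f}  B_n(A)={b_n:+.3f}")
```

For each set A, P_n(A) should be close to P(A), while the rescaled difference B_n(A) stays of order one as n grows.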

A special case is the empirical process G_n associated with the empirical distribution function F_n:

G_n(x) = \sqrt n(F_n(x)-F(x)),

where X_1, X_2, \dots , X_n are real-valued random variables with distribution function F and F_n is defined by

F_n(x) =  {1 \over n} \operatorname{card}\{\,j \in \{\,1,\dots,n\,\} : X_j \leq  x \,\}.

In this case,

C = \{(-\infty, x]: x \in R\}.
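
As a minimal sketch, the empirical distribution function can be computed by sorting the observations once and counting how many are at most x. The helper name make_ecdf and the data values below are made up for illustration.

```python
from bisect import bisect_right

def make_ecdf(sample):
    """Return F_n, where F_n(x) is the fraction of observations <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def ecdf(x):
        # bisect_right counts how many sorted observations are <= x.
        return bisect_right(ordered, x) / n

    return ecdf

data = [2.0, 0.5, 3.5, 0.5, 1.0]   # hypothetical observations
F_n = make_ecdf(data)
print(F_n(0.0), F_n(0.5), F_n(1.0), F_n(10.0))   # prints: 0.0 0.4 0.6 1.0
```

Each observation contributes a jump of size 1/n, so the repeated value 0.5 produces a jump of 2/5 at that point.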

Major results for this special case include Kolmogorov-Smirnov statistics, the Glivenko-Cantelli theorem and Donsker's theorem. Moreover, the empirical distribution function F_n, computed from a finite sequence of realizations of a random variable, is a central object in statistical inference.

Glivenko-Cantelli theorem

By the strong law of large numbers, we know that for each fixed x,

F_n(x) {\longrightarrow} _{a.s.} F(x) .

Glivenko and Cantelli strengthened this pointwise result to convergence that is uniform in x.

The Glivenko-Cantelli theorem (1933):

\|F_n - F\|_\infty = \sup_{x\in R} |F_n(x) - F(x)| {\longrightarrow} _{a.s.} 0.

Another way to state this is as follows: the sample paths of F_n get uniformly closer to F as n increases; hence F_n, which we observe, is almost surely a good approximation for F, which becomes better as we collect more observations.
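
A small simulation can illustrate the uniform convergence. The sketch below assumes Uniform(0, 1) data, so F(x) = x on [0, 1], and relies on the fact that for a continuous F the supremum of |F_n - F| is attained at, or just before, one of the order statistics. The names sup_distance and uniform_cdf are hypothetical helpers.

```python
import random

def sup_distance(sample, cdf):
    """Compute sup_x |F_n(x) - F(x)| for a continuous distribution function F.

    F_n is a step function, so the supremum is attained at (or just before)
    one of the order statistics.
    """
    ordered = sorted(sample)
    n = len(ordered)
    worst = 0.0
    for i, x in enumerate(ordered, start=1):
        f = cdf(x)
        # i / n is F_n at x; (i - 1) / n is its value just to the left of x
        # (ties are handled correctly because the maximum runs over all i).
        worst = max(worst, i / n - f, f - (i - 1) / n)
    return worst

def uniform_cdf(x):
    """Distribution function of Uniform(0, 1), the assumed true F."""
    return min(1.0, max(0.0, x))

rng = random.Random(1)
for n in (10, 100, 1000, 10000):
    sample = [rng.random() for _ in range(n)]
    print(n, round(sup_distance(sample, uniform_cdf), 4))
```

The printed distances should shrink as n grows, typically at a rate of roughly 1/\sqrt n.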

Donsker's theorem

By the classical central limit theorem, it follows that for each fixed x,

G_n(x){\longrightarrow}_{dist} G(x),

that is, G_n(x) converges in distribution to a Gaussian (normal) random variable G(x) with mean 0 and variance F(x)[1-F(x)]. Donsker (1952) showed that the sample paths of G_n, viewed as functions on the real line R, converge in distribution to a stochastic process G in the space \ell^\infty(R) of all bounded functions f: R \rightarrow R. Working in the function space \ell^\infty(R) emphasizes that the convergence in distribution concerns entire sample paths rather than values at a single point. The limit process G is a Gaussian process with zero mean and covariance given by

\operatorname{cov}[G(s), G(t)] = E[G(s)G(t)] = F(\min(s, t)) - F(s)F(t).

The process G(x) can be written as B(F(x)) where B is a standard Brownian bridge on the unit interval.
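
The covariance formula can be checked numerically. The following Monte Carlo sketch assumes Uniform(0, 1) observations, so F(x) = x and the limiting covariance reduces to \min(s, t) - st, which is the covariance of a Brownian bridge. The function names, sample size, number of replications and seed are arbitrary illustrative choices.

```python
import math
import random

def G_n(sample, x):
    """G_n(x) = sqrt(n) * (F_n(x) - F(x)) for Uniform(0, 1) data, where F(x) = x."""
    n = len(sample)
    f_n = sum(1 for v in sample if v <= x) / n
    return math.sqrt(n) * (f_n - x)

def simulate_covariance(s, t, n=200, reps=2000, seed=0):
    """Monte Carlo estimate of cov[G_n(s), G_n(t)] over independent samples."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(reps):
        sample = [rng.random() for _ in range(n)]
        pairs.append((G_n(sample, s), G_n(sample, t)))
    mean_s = sum(a for a, _ in pairs) / reps
    mean_t = sum(b for _, b in pairs) / reps
    return sum((a - mean_s) * (b - mean_t) for a, b in pairs) / (reps - 1)

s, t = 0.3, 0.7
estimate = simulate_covariance(s, t)
limit = min(s, t) - s * t   # F(min(s, t)) - F(s) F(t) with F(x) = x
print(f"simulated: {estimate:.3f}   limiting covariance: {limit:.3f}")
```

The simulated value should be close to 0.09. This only checks second moments at two points; it does not by itself demonstrate convergence of the whole sample path.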

If the observations X_1, X_2, \dots, X_n are in a more general sample space X, we seek generalizations of the Glivenko-Cantelli theorem and Donsker's theorem. Also, we seek other theorems to determine rates of convergence and accuracy of estimation.

The classical empirical distribution function for real-valued random variables is a special case of the general theory with X = R and the class of sets C = \{(-\infty, x]: x \in R\}.

References


  • P. Billingsley, Probability and Measure, third edition, John Wiley and Sons, New York, 1995.
  • M.D. Donsker, Justification and extension of Doob's heuristic approach to the Kolmogorov-Smirnov theorems, Annals of Mathematical Statistics, 23:277–281, 1952.
  • R.M. Dudley, Central limit theorems for empirical measures, Annals of Probability, 6(6):899–929, 1978.
  • R.M. Dudley, Uniform Central Limit Theorems, Cambridge Studies in Advanced Mathematics, 63, Cambridge University Press, Cambridge, UK, 1999.
  • J. Wolfowitz, Generalization of the theorem of Glivenko-Cantelli, Annals of Mathematical Statistics, 25:131–138, 1954.

This page uses Creative Commons Licensed content from Wikipedia (view authors).
