Binomial distribution

From Psychology Wiki

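Binomial
[Figure: probability mass function (File:Binomial distribution pmf.png). The lines connecting the dots are added for clarity.]
[Figure: cumulative distribution function (File:Binomial distribution cdf.png). Colors match the image above.]
Parameters: $n \geq 0$, number of trials (integer); $0 \leq p \leq 1$, success probability (real)
Support: $k \in \{0, \dots, n\}$
Probability mass function (pmf): $\binom{n}{k} p^k (1-p)^{n-k}$
Cumulative distribution function (cdf): $I_{1-p}(n - \lfloor k \rfloor,\, 1 + \lfloor k \rfloor)$
Mean: $np$
Median: one of $\{\lfloor np \rfloor, \lceil np \rceil\}$
Mode: $\lfloor (n+1)p \rfloor$
Variance: $np(1-p)$
Skewness: $\frac{1-2p}{\sqrt{np(1-p)}}$
Kurtosis (excess): $\frac{1-6p(1-p)}{np(1-p)}$
Entropy: $\frac{1}{2} \ln\left(2\pi e\, np(1-p)\right) + O\left(\frac{1}{n}\right)$
Moment-generating function (mgf): $(1 - p + pe^{t})^n$
Characteristic function: $(1 - p + pe^{it})^n$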

In probability theory and statistics, the binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p. Such a success/failure experiment is also called a Bernoulli experiment or Bernoulli trial. In fact, when n = 1, the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the popular binomial test of statistical significance.

Examples

An elementary example is this: Roll a standard die ten times and count the number of sixes. The distribution of this random number is a binomial distribution with n = 10 and p = 1/6.

As another example, assume 5% of a very large population to be green-eyed. You pick 100 people randomly. The number of green-eyed people you pick is a random variable X which follows a binomial distribution with n = 100 and p = 0.05.

Specification

Probability mass function

In general, if the random variable K follows the binomial distribution with parameters n and p, we write K ~ B(n, p). The probability of getting exactly k successes is given by the probability mass function:

$$\Pr(K = k) = \binom{n}{k} p^k (1-p)^{n-k}$$

for k = 0, 1, 2, ..., n and where

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

is the binomial coefficient (hence the name of the distribution) "n choose k" (also denoted C(n, k) or nCk). The formula can be understood as follows: we want k successes (probability $p^k$) and n − k failures (probability $(1-p)^{n-k}$). However, the k successes can occur anywhere among the n trials, and there are C(n, k) different ways of distributing k successes in a sequence of n trials.
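As a rough illustration of this formula, the following Python sketch evaluates the probability mass function directly from the binomial coefficient and applies it to the die example above (n = 10, p = 1/6). The helper name `binom_pmf` is hypothetical and not part of the original article; `math.comb` assumes Python 3.8 or later.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    if not 0 <= k <= n:
        return 0.0
    # comb(n, k) counts the ways to place k successes among n trials;
    # p**k * (1 - p)**(n - k) is the probability of any one such placement.
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Die example: number of sixes in ten rolls (n = 10, p = 1/6).
die_pmf = [binom_pmf(k, 10, 1 / 6) for k in range(11)]
for k, prob in enumerate(die_pmf):
    print(f"Pr(K = {k:2d}) = {prob:.6f}")
```

The probabilities over k = 0, ..., n should add up to 1, which is a quick sanity check on any implementation.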

In reference tables of binomial probabilities, the table is usually filled in only for k up to n/2. For k > n/2, the probability can be obtained by swapping the roles of success and failure:

$$\Pr(K = k;\, n, p) = \Pr(K = n - k;\, n, 1 - p).$$

That is, one looks up n − k in place of k and 1 − p in place of p (the binomial distribution is not symmetric in general).
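The complement relation can be checked numerically. The sketch below compares the probability of k successes under (n, p) with the probability of n − k successes under (n, 1 − p); the values n = 10 and p = 0.3 are illustrative assumptions, and `binom_pmf` is a hypothetical helper.

```python
from math import comb, isclose


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.3  # illustrative values, not from the article

# k successes with probability p is the same event as
# n - k failures, i.e. n - k "successes" with probability 1 - p.
relation_holds = all(
    isclose(binom_pmf(k, n, p), binom_pmf(n - k, n, 1 - p))
    for k in range(n + 1)
)
print(relation_holds)
```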

Cumulative distribution function

The cumulative distribution function can be expressed in terms of the regularized incomplete beta function, as follows:

$$F(k; n, p) = \Pr(X \leq k) = I_{1-p}(n - k,\, k + 1)$$

provided k is an integer and 0 ≤ k ≤ n. If x is not necessarily an integer or not necessarily positive, one can express it thus:

$$F(x; n, p) = \Pr(X \leq x) = \sum_{j=0}^{\lfloor x \rfloor} \binom{n}{j} p^j (1-p)^{n-j}$$
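A minimal sketch of the cumulative distribution function, assuming it is computed by summing the probability mass function up to the floor of x (rather than through the incomplete beta function). The name `binom_cdf` is hypothetical, and the call at the end reuses the green-eyed example from above (n = 100, p = 0.05).

```python
from math import comb, floor


def binom_cdf(x: float, n: int, p: float) -> float:
    """Pr(X <= x) for X ~ B(n, p); x may be any real number."""
    if x < 0:
        return 0.0  # no outcomes lie below zero
    upper = min(floor(x), n)  # outcomes above n have probability zero
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(upper + 1))


# Probability of at most 5 green-eyed people among 100 picked at random.
print(f"Pr(X <= 5) = {binom_cdf(5, 100, 0.05):.4f}")
```

Direct summation is adequate for moderate n; for very large n a library routine based on the incomplete beta function is likely to be faster and more accurate.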

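For k ≤ np, upper bounds for the lower tail of the distribution function can be derived. In particular, Hoeffding's inequality yields the bound

$$F(k; n, p) \leq \exp\left(-2\, \frac{(np - k)^2}{n}\right)$$

and Chernoff's inequality can be used to derive the bound

$$F(k; n, p) \leq \exp\left(-\frac{1}{2p}\, \frac{(np - k)^2}{n}\right)$$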
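To get a feel for how tight these bounds are, the sketch below compares the exact lower tail with the Hoeffding bound exp(−2(np − k)²/n) and the Chernoff bound exp(−(np − k)²/(2pn)). The parameter values n = 100 and p = 0.1 are illustrative assumptions, and all function names are hypothetical.

```python
from math import comb, exp


def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact Pr(X <= k) by direct summation."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def hoeffding_bound(k: int, n: int, p: float) -> float:
    """Hoeffding upper bound on Pr(X <= k), intended for k <= n*p."""
    return exp(-2 * (n * p - k) ** 2 / n)


def chernoff_bound(k: int, n: int, p: float) -> float:
    """Chernoff upper bound on Pr(X <= k), intended for k <= n*p."""
    return exp(-((n * p - k) ** 2) / (2 * p * n))


n, p = 100, 0.1  # illustrative values; here n*p = 10
for k in (0, 5, 10):
    print(
        f"k={k:2d}  exact={binom_cdf(k, n, p):.5f}  "
        f"Hoeffding={hoeffding_bound(k, n, p):.5f}  "
        f"Chernoff={chernoff_bound(k, n, p):.5f}"
    )
```

With the formulas as written, the Chernoff bound is the smaller of the two whenever p < 1/4, and the Hoeffding bound is smaller for p > 1/4.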

Mean, variance, and mode

If X ~ B(n, p) (that is, X is a binomially distributed random variable), then the expected value of X is

$$\operatorname{E}(X) = np$$

and the variance is

$$\operatorname{Var}(X) = np(1 - p).$$

This fact is easily proven as follows. Suppose first that we have exactly one Bernoulli trial. We have two possible outcomes, 1 and 0, with the first having probability p and the second having probability 1 − p; the mean for this trial is given by μ = p. Using the definition of variance, we have

$$\sigma^2 = (1 - p)^2 \cdot p + (0 - p)^2 \cdot (1 - p) = p(1 - p).$$

Now suppose that we want the variance for n such trials (i.e. for the general binomial distribution). Since the trials are independent, we may add the variances for each trial, giving

$$\sigma_n^2 = \sum_{k=1}^{n} \sigma^2 = n\sigma^2 = np(1 - p).$$

The mode of X is the greatest integer less than or equal to (n + 1)p; if m = (n + 1)p is an integer, then m − 1 and m are both modes.
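The closed forms np, np(1 − p) and ⌊(n + 1)p⌋ can be compared with values computed directly from the probability mass function. This is only a numerical sketch; the function names are hypothetical and the parameters reuse the die example (n = 10, p = 1/6).

```python
from math import comb, floor


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def moments_and_mode(n: int, p: float):
    """Mean, variance and (smallest) mode computed from the pmf itself."""
    pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(pmf))
    variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
    # max() returns the first maximiser, so ties resolve to the smaller mode.
    mode = max(range(n + 1), key=pmf.__getitem__)
    return mean, variance, mode


n, p = 10, 1 / 6
mean, variance, mode = moments_and_mode(n, p)
print(mean, n * p)                    # numerical mean vs. np
print(variance, n * p * (1 - p))      # numerical variance vs. np(1 - p)
print(mode, floor((n + 1) * p))       # numerical mode vs. floor((n + 1)p)
```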

Explicit derivations of mean and variance

We derive these quantities from first principles. Certain particular sums occur in these two derivations. We rearrange the sums and terms so that sums solely over complete binomial probability mass functions (pmf) arise, which are always unity

$$\sum_{k=0}^{n} \Pr(X = k) = \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} = 1.$$

Mean

We apply the definition of the expected value of a discrete random variable to the binomial distribution

$$\operatorname{E}(X) = \sum_{k=0}^{n} k \cdot \Pr(X = k) = \sum_{k=0}^{n} k \cdot \binom{n}{k} p^k (1-p)^{n-k}$$

The first term of the series (with index k = 0) has value 0 since the first factor, k, is zero. It may thus be discarded, i.e. we can change the lower limit to k = 1:

$$\operatorname{E}(X) = \sum_{k=1}^{n} k \cdot \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k} = \sum_{k=1}^{n} k \cdot \frac{n \cdot (n-1)!}{k \cdot (k-1)!\,(n-k)!} \cdot p \cdot p^{k-1} (1-p)^{n-k}$$

We've pulled factors of n and k out of the factorials, and one power of p has been split off. We are preparing to redefine the indices.

$$\operatorname{E}(X) = np \cdot \sum_{k=1}^{n} \frac{(n-1)!}{(k-1)!\,(n-k)!}\, p^{k-1} (1-p)^{n-k}$$

We rename m = n - 1 and s = k - 1. The value of the sum is not changed by this, but it now becomes readily recognizable

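$$\operatorname{E}(X) = np \cdot \sum_{s=0}^{m} \frac{m!}{s!\,(m-s)!}\, p^{s} (1-p)^{m-s} = np \cdot \sum_{s=0}^{m} \binom{m}{s} p^{s} (1-p)^{m-s}$$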

The ensuing sum is a sum over a complete binomial pmf (of one order lower than the initial sum, as it happens). Thus

$$\operatorname{E}(X) = np \cdot 1 = np.$$
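The index shift in this derivation can be spot-checked numerically: the sum over k from 1 to n of k times the pmf of B(n, p) should equal np times the total mass of B(m, p) with m = n − 1. The values n = 12 and p = 0.3 below are illustrative assumptions, and `binom_pmf` is a hypothetical helper.

```python
from math import comb, isclose


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 12, 0.3  # illustrative values
m = n - 1       # renamed upper limit, as in the derivation

# Definition of the mean, with the vanishing k = 0 term dropped.
mean_by_definition = sum(k * binom_pmf(k, n, p) for k in range(1, n + 1))

# After pulling out n*p and substituting s = k - 1, the remaining sum
# runs over the complete pmf of B(m, p) and should therefore be 1.
shifted_sum = sum(binom_pmf(s, m, p) for s in range(m + 1))
mean_after_shift = n * p * shifted_sum

print(mean_by_definition, mean_after_shift, n * p)
```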

Variance

It can be shown that the variance is equal to (see the computational formula for variance):

$$\operatorname{Var}(X) = \operatorname{E}(X^2) - (\operatorname{E}(X))^2.$$

In using this formula we see that we now also need the expected value of X², which is

$$\operatorname{E}(X^2) = \sum_{k=0}^{n} k^2 \cdot \Pr(X = k) = \sum_{k=0}^{n} k^2 \cdot \binom{n}{k} p^k (1-p)^{n-k}.$$

We can use our experience gained above in deriving the mean. We know how to process one factor of k. This gets us as far as

$$\operatorname{E}(X^2) = np \cdot \sum_{s=0}^{m} (s + 1) \binom{m}{s} p^{s} (1-p)^{m-s}$$

(again, with m = n - 1 and s = k - 1). We split the sum into two separate sums and we recognize each one

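$$\operatorname{E}(X^2) = np \cdot \left( \sum_{s=0}^{m} s \binom{m}{s} p^{s} (1-p)^{m-s} + \sum_{s=0}^{m} \binom{m}{s} p^{s} (1-p)^{m-s} \right)$$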

The first sum is identical in form to the one we calculated in the Mean (above). It sums to mp. The second sum is unity.

$$\operatorname{E}(X^2) = np \cdot (mp + 1) = np\,((n-1)p + 1) = np\,(np - p + 1).$$

Using this result in the expression for the variance, along with the Mean (E(X) = np), we get

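$$\operatorname{Var}(X) = \operatorname{E}(X^2) - (\operatorname{E}(X))^2 = np\,(np - p + 1) - (np)^2 = np(1 - p).$$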
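The second-moment steps can be checked in the same spirit. The sketch below evaluates E(X²) from its definition, from the split into two sums over B(m, p) with m = n − 1, and from the closed form np(np − p + 1), then forms the variance. The values n = 12 and p = 0.3 are illustrative assumptions, and `binom_pmf` is a hypothetical helper.

```python
from math import comb, isclose


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 12, 0.3  # illustrative values
m = n - 1

# E(X^2) straight from the definition.
second_moment = sum(k**2 * binom_pmf(k, n, p) for k in range(n + 1))

# E(X^2) after the index shift: n*p times (mean of B(m, p) + total mass).
first_sum = sum(s * binom_pmf(s, m, p) for s in range(m + 1))   # equals m*p
second_sum = sum(binom_pmf(s, m, p) for s in range(m + 1))      # equals 1
second_moment_split = n * p * (first_sum + second_sum)

closed_form = n * p * (n * p - p + 1)
variance = second_moment - (n * p) ** 2

print(second_moment, second_moment_split, closed_form)
print(variance, n * p * (1 - p))
```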

Relationship to other distributions

Sums of binomials

If X ~ B(n, p) and Y ~ B(m, p) are independent binomial variables, then X + Y is again a binomial variable; its distribution is

$$X + Y \sim B(n + m,\, p).$$
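One way to see this numerically is to convolve the two probability mass functions, since the pmf of a sum of independent variables is the convolution of their pmfs, and compare the result with the pmf of B(n + m, p). The values n = 4, m = 6 and p = 0.3 are illustrative assumptions, and the helper names are hypothetical.

```python
from math import comb, isclose


def binom_pmf_list(n: int, p: float) -> list:
    """pmf of B(n, p) as a list indexed by the number of successes."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def convolve(a: list, b: list) -> list:
    """Discrete convolution: pmf of the sum of two independent variables."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


n, m, p = 4, 6, 0.3  # illustrative values; both variables share the same p
pmf_of_sum = convolve(binom_pmf_list(n, p), binom_pmf_list(m, p))
pmf_direct = binom_pmf_list(n + m, p)

matches = all(
    isclose(a, b, abs_tol=1e-12) for a, b in zip(pmf_of_sum, pmf_direct)
)
print(matches)
```

The shared success probability matters: if X and Y had different values of p, the sum would in general not be binomial.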

Normal approximation

[Figure: binomial pmf and normal approximation for n = 6 and p = 0.5.]

If n is large enough, the skew of the distribution is not too great, and a suitable continuity correction is used, then an excellent approximation to B(n, p) is given by the normal distribution

$$\mathcal{N}(np,\; np(1 - p)).$$

Various rules of thumb may be used to decide whether n is large enough. One rule is that both np and n(1 − p) must be greater than 5. However, the specific number varies from source to source, and depends on how good an approximation one wants; some sources give 10. Another commonly used rule holds that the above normal approximation is appropriate only if

$$\mu \pm 3\sigma = np \pm 3\sqrt{np(1 - p)} \in [0, n].$$

The following is an example of applying a continuity correction: Suppose one wishes to calculate Pr(X ≤ 8) for a binomial random variable X. If Y has a distribution given by the normal approximation, then Pr(X ≤ 8) is approximated by Pr(Y ≤ 8.5). The addition of 0.5 is the continuity correction; the uncorrected normal approximation gives considerably less accurate results.
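The effect of the continuity correction can be illustrated numerically. The text does not fix n and p for this example, so the sketch below assumes n = 20 and p = 0.5 purely for illustration; the function names are hypothetical, and the normal distribution function is built from `math.erf`.

```python
from math import comb, erf, sqrt


def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact Pr(X <= k) by direct summation."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Pr(Y <= x) for a normal variable with the given mean and sd."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


n, p = 20, 0.5  # assumed values, not given in the article
mean, sd = n * p, sqrt(n * p * (1 - p))

exact = binom_cdf(8, n, p)
corrected = normal_cdf(8.5, mean, sd)    # with continuity correction
uncorrected = normal_cdf(8.0, mean, sd)  # without it

print(f"exact       Pr(X <= 8)   = {exact:.4f}")
print(f"corrected   Pr(Y <= 8.5) = {corrected:.4f}")
print(f"uncorrected Pr(Y <= 8)   = {uncorrected:.4f}")
```

For these assumed parameters the corrected value lands much closer to the exact probability than the uncorrected one.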

This approximation is a huge time-saver (exact calculations with large n are very onerous); historically, it was the first use of the normal distribution, introduced by Abraham de Moivre in 1733 and included in the 1738 edition of his book The Doctrine of Chances. Nowadays, it can be seen as a consequence of the central limit theorem since B(n, p) is a sum of n independent, identically distributed 0-1 indicator variables.

For example, suppose you randomly sample n people out of a large population and ask them whether they agree with a certain statement. The proportion of people who agree will of course depend on the sample. If you sampled groups of n people repeatedly and truly randomly, the proportions would follow an approximate normal distribution with mean equal to the true proportion p of agreement in the population and with standard deviation σ = (p(1 − p)/n)^(1/2). Large sample sizes n are good because the standard deviation gets smaller, which allows a more precise estimate of the unknown parameter p.
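A short sketch of how the standard deviation of the sample proportion shrinks with n, assuming the formula σ = sqrt(p(1 − p)/n). The true proportion p = 0.5 and the sample sizes are hypothetical values chosen for illustration, and `proportion_sd` is a hypothetical helper.

```python
from math import sqrt


def proportion_sd(p: float, n: int) -> float:
    """Standard deviation of the sample proportion for sample size n."""
    return sqrt(p * (1 - p) / n)


p = 0.5  # assumed true proportion, for illustration only
for n in (100, 400, 1600):
    print(f"n = {n:4d}  sd = {proportion_sd(p, n):.4f}")
```

Because n sits under a square root, the sample size has to be quadrupled to halve the standard deviation.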

Poisson approximation

The binomial distribution converges towards the Poisson distribution as the number of trials goes to infinity while the product np remains fixed. Therefore the Poisson distribution with parameter λ = np can be used as an approximation to the binomial distribution B(n, p) if n is sufficiently large and p is sufficiently small. According to two rules of thumb, this approximation is good if n ≥ 20 and p ≤ 0.05, or if n ≥ 100 and np ≤ 10.[1]
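As an illustration, the sketch below compares the binomial and Poisson probability mass functions for the green-eyed example (n = 100, p = 0.05, so λ = np = 5), which satisfies both rules of thumb. The function names are hypothetical, and the largest pointwise difference is just one possible way to measure closeness.

```python
from math import comb, exp, factorial


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events for a Poisson variable with mean lam."""
    return exp(-lam) * lam**k / factorial(k)


n, p = 100, 0.05
lam = n * p  # Poisson parameter matched to the binomial mean

for k in range(0, 11, 2):
    print(f"k={k:2d}  binomial={binom_pmf(k, n, p):.4f}  "
          f"Poisson={poisson_pmf(k, lam):.4f}")

# Largest pointwise gap between the two pmfs over the binomial support.
max_diff = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(n + 1))
print(f"largest pointwise difference: {max_diff:.4f}")
```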

Limits of binomial distributions

  • As n approaches ∞ and p approaches 0 while np remains fixed at λ > 0, or at least np approaches λ > 0, the Binomial(n, p) distribution approaches the Poisson distribution with expected value λ.
  • As n approaches ∞ while p remains fixed, the distribution of
$$\frac{X - np}{\sqrt{np(1 - p)}}$$
approaches the normal distribution with expected value 0 and variance 1 (this is just a specific case of the central limit theorem).

References

  1. NIST/SEMATECH, '6.3.3.1. Counts Control Charts', e-Handbook of Statistical Methods, <http://www.itl.nist.gov/div898/handbook/pmc/section3/pmc331.htm> [accessed 25 October 2006]

This page uses content from the English-language version of Wikipedia. The original article was at Binomial distribution. The list of authors can be seen in the page history. As with Psychology Wiki, the text of Wikipedia is available under the GNU Free Documentation License.