# Bayesian probability



In the philosophy of mathematics, Bayesianism is the tenet that the mathematical theory of probability is applicable to the degree to which a person believes a proposition. Bayesians also hold that Bayes' theorem can be used as the basis for a rule for updating beliefs in the light of new information; such updating is known as Bayesian inference. In this sense, Bayesianism is an application of the probability calculus and an interpretation of the term probable or, as it is usually put, an interpretation of probability.
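
For reference, Bayes' theorem in its usual form is given here, writing $H$ for a hypothesis and $E$ for a piece of new evidence (both symbols are introduced only for this illustration):

$P(H|E) = \frac{P(E|H)\,P(H)}{P(E)}$

Read as an updating rule, $P(H)$ is the degree of belief in $H$ before $E$ is learned (the prior) and $P(H|E)$ is the degree of belief afterwards (the posterior).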

## Controversy

A quite different interpretation of the term probable has been developed by frequentists. In this interpretation, what are probable are not propositions entertained by believers, but events considered as members of collectives to which the tools of statistical analysis can be applied.

The Bayesian interpretation of probability allows probabilities to be assigned to all propositions (or, in some formulations, to the events signified by those propositions) independently of any reference class within which purported facts can be thought to have a relative frequency. Although Bayesian probability is not relative to a reference class, it is relative to the subject: it is not inconsistent for different persons to assign different Bayesian probabilities to the same proposition. For this reason Bayesian probabilities are sometimes called personal probabilities (although there are theories of personal probability which lack some features that have come to be identified with Bayesianism).

Although there is no reason why different interpretations (senses) of a word cannot be used in different contexts, there is a history of antagonism between Bayesians and frequentists, with the latter often rejecting the Bayesian interpretation as ill-grounded. The groups have also disagreed about which of the two senses reflects what is commonly meant by the term 'probable'.

To illustrate, whereas both a frequency probability and a Bayesian probability (of, e.g., 0.5) could be assigned to the proposition that the next tossed coin will land heads, only a Bayesian probability could be assigned to the proposition, entertained by a particular person, that there was life on Mars a billion years ago—because this assertion is made without reference to any population relative to which the relative frequency could be defined.

## History of Bayesian probability

"Bayesian" probability or "Bayesian" theory is named after Thomas Bayes (c. 1701–1761), who proved a special case of what is now called Bayes' theorem. The term Bayesian, however, came into use only around 1950, and it is not clear that Bayes would have endorsed the very broad interpretation of probability now called "Bayesian." Laplace independently proved a more general version of Bayes' theorem and put it to good use in solving problems in celestial mechanics, medical statistics and, by some accounts, even jurisprudence. Laplace, however, did not consider this theorem to be of fundamental philosophical importance for probability theory. He endorsed the classical interpretation of probability, as did everyone else in his time.

The application of probability calculus to subjective belief, which later became an important aspect of the "Bayesian" approach, was first proposed by the philosopher Frank P. Ramsey in his 1931 book The Foundations of Mathematics. Ramsey himself saw this interpretation as merely a complement to a frequency interpretation of probability. The first to take this interpretation seriously was the statistician Bruno de Finetti in 1937. The first detailed theory came in 1954 in the book The Foundations of Statistics by the mathematician and statistician L. J. Savage.

Bayesian probability is a measure of the degree of belief a person has in some proposition. Several attempts have been made to operationalize the intuitive notion of a "degree of belief". The most common approach is based on betting: a degree of belief is reflected in the odds and stakes that the subject is willing to bet on the proposition in question.

When beliefs have degrees, the theorems of the probability calculus become criteria for the rationality of sets of beliefs in the same way that the theorems of first order logic are criteria for the rationality of sets of beliefs. Many authors regard degrees of belief as extensions of the classical truth values (true and false).
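
The link between betting and the probability calculus can be illustrated with a small sketch. It assumes a simplified betting model that is not spelled out in this article: an agent with degree of belief `price` in a proposition is willing to pay `price * stake` for a bet that returns `stake` if the proposition is true. The function names are hypothetical. If the agent's beliefs in A and in not-A do not sum to 1, buying both bets yields the same non-zero result whichever way A turns out, which is the core of a Dutch book argument.

```python
from fractions import Fraction


def bet_payoff(price, stake, outcome_true):
    """Net payoff to the buyer of a bet that pays `stake` if the proposition is true.

    The buyer pays price * stake up front, where `price` is the buyer's
    degree of belief (betting quotient) in the proposition.
    """
    winnings = stake if outcome_true else 0
    return winnings - price * stake


def payoffs_in_both_worlds(belief_a, belief_not_a, stake=1):
    """Agent buys one bet on A and one bet on not-A at prices equal to its beliefs.

    Returns the agent's total net payoff (if A is true, if A is false).
    """
    if_a = bet_payoff(belief_a, stake, True) + bet_payoff(belief_not_a, stake, False)
    if_not_a = bet_payoff(belief_a, stake, False) + bet_payoff(belief_not_a, stake, True)
    return if_a, if_not_a


# Beliefs that sum to 1: no guaranteed gain or loss.
coherent = payoffs_in_both_worlds(Fraction(3, 5), Fraction(2, 5))

# Beliefs that sum to more than 1: the agent loses in every possible world.
incoherent = payoffs_in_both_worlds(Fraction(3, 5), Fraction(3, 5))

print(coherent)
print(incoherent)
```

Exact fractions are used so that the comparison with zero is not blurred by floating-point rounding.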

The Bayesian approach has been explored by Harold Jeffreys, Richard T. Cox, Edwin Jaynes and I. J. Good. Other well-known proponents of Bayesian probability have included John Maynard Keynes and B.O. Koopman.

## Varieties of Bayesian probability

The terms subjective probability, personal probability, epistemic probability and logical probability describe some of the schools of thought which are customarily called "Bayesian". These overlap but there are differences of emphasis. Some of the people mentioned here would not call themselves Bayesians.

Bayesian probability is supposed to measure the degree of belief an individual has in an uncertain proposition, and is in that respect subjective. Some people who call themselves Bayesians do not accept this subjectivity. The chief exponents of this objectivist school were Edwin Thompson Jaynes and Harold Jeffreys. Perhaps the main objectivist Bayesian now living is James Berger of Duke University. Jose Bernardo and others accept some degree of subjectivity but believe a need exists for "reference priors" in many practical situations.

Advocates of logical (or objective epistemic) probability, such as Harold Jeffreys, Rudolf Carnap, Richard Threlkeld Cox and Edwin Jaynes, hope to codify techniques whereby any two persons having the same information relevant to the truth of an uncertain proposition would calculate the same probability. Such probabilities are not relative to the person but to the epistemic situation, and thus lie somewhere between subjective and objective. However, the methods proposed are controversial. Critics challenge the claim that there are grounds for preferring one degree of belief over another in the absence of information about the facts to which those beliefs refer. Another problem is that the techniques developed so far are inadequate for dealing with realistic cases.

## Bayesian and frequentist probability

The Bayesian approach contrasts with the concept of frequency probability, in which probability is held to be derived from observed or defined frequency distributions or proportions of populations, and the usefulness of probability is narrowly limited to such scenarios. The difference has many implications for the methods by which statistics is practiced under one model or the other, and also for the way in which conclusions are expressed.

For example, Laplace estimated the mass of Saturn using Bayesian methods. On the frequency interpretation of probability, however, the laws of probability cannot be applied to this problem, because the mass of Saturn is not the outcome of a well-defined random experiment or sample. From what population is the mass of Saturn taken? In what sense is Saturn picked at random from that population? Similarly, when comparing two hypotheses and using the same information, frequency methods would typically result in the rejection or non-rejection of the original hypothesis with a particular degree of confidence, while Bayesian methods would yield statements that one hypothesis was more probable than the other or that the expected loss associated with one was less than the expected loss of the other.
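
The kind of Bayesian statement described above can be sketched with a toy comparison. Everything specific here is an assumption made for illustration: the two hypotheses (a coin with heads probability 0.5 versus 0.7), the data (14 heads in 20 tosses), the equal priors, and the 0-1 loss table.

```python
from math import comb


def binomial_likelihood(p_heads, heads, tosses):
    """Probability of observing `heads` heads in `tosses` tosses if P(heads) = p_heads."""
    return comb(tosses, heads) * p_heads**heads * (1 - p_heads) ** (tosses - heads)


def posterior(priors, likelihoods):
    """Bayes' theorem over a finite list of hypotheses."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical setup: H0 says the coin is fair, H1 says P(heads) = 0.7.
priors = [0.5, 0.5]
heads, tosses = 14, 20
likelihoods = [
    binomial_likelihood(0.5, heads, tosses),
    binomial_likelihood(0.7, heads, tosses),
]
post = posterior(priors, likelihoods)

# loss[i][j]: loss incurred by acting on hypothesis i when hypothesis j is true.
loss = [[0.0, 1.0], [1.0, 0.0]]
expected_loss = [sum(loss[i][j] * post[j] for j in range(2)) for i in range(2)]

print("posterior probabilities:", post)
print("expected losses:", expected_loss)
```

The output is a probability for each hypothesis and an expected loss for each action, rather than a reject or do-not-reject decision.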

The rejection of the classical notion of probability and the development of a theory of statistics and probability based narrowly on the frequency interpretation were pursued by some of the most influential figures in statistics during the first half of the twentieth century, including R.A. Fisher, Egon Pearson and Jerzy Neyman. At the same time, the mathematical foundation of probability in measure theory via the Lebesgue integral was elucidated by A. N. Kolmogorov in the book Foundations of the Theory of Probability in 1933. In the years to 1950 these two approaches almost completely eclipsed the previous broader classical interpretation. Since that time, however, and continuing into the present day, the work of Savage, Koopman, Abraham Wald, and others has led to renewed broader acceptance of the alternative, Bayesian point of view.

## Applications of Bayesian probability

Today, there are a variety of applications of Bayesian probability that have gained wide acceptance. Some schools of thought emphasise Cox's theorem and Jaynes' principle of maximum entropy as cornerstones of the theory; others (e.g., Ramsey, de Finetti) approach it from the point of view of a Dutch book argument; still others claim that Bayesian methods are more general and give better results in practice than frequency probability. See Bayesian inference for applications and Bayes' Theorem for the mathematics.

Some philosophers of science regard Bayesian inference as a model of the scientific method. That is, updating probabilities via Bayes' theorem is similar to the scientific method insofar as one starts with an initial set of beliefs about the relative plausibility of various hypotheses, collects new information (for example by conducting an experiment), and then adjusts the original set of beliefs in the light of the new information to produce a more refined set of beliefs. However, this view is controversial. Similarly, Bayes factors have been employed in discussions of Occam's Razor.
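
The cycle of prior beliefs, new information and revised beliefs can be sketched as a loop in which each posterior becomes the prior for the next observation. The three candidate hypotheses, the uniform starting beliefs and the sequence of coin tosses are all invented for this illustration.

```python
def update(prior, likelihood):
    """One application of Bayes' theorem over a finite set of hypotheses."""
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalised.values())
    return {h: value / evidence for h, value in unnormalised.items()}


# Hypothetical hypotheses: each is a candidate probability that the coin lands heads.
hypotheses = [0.25, 0.5, 0.75]
belief = {h: 1 / 3 for h in hypotheses}  # initial beliefs: all equally plausible

observations = "HHTHHHTH"  # made-up experimental record
for toss in observations:
    likelihood = {h: (h if toss == "H" else 1 - h) for h in hypotheses}
    belief = update(belief, likelihood)  # yesterday's posterior is today's prior

most_plausible = max(belief, key=belief.get)

# With equal priors, the ratio of posteriors equals the Bayes factor
# (the ratio of the likelihoods of the data under the two hypotheses).
bayes_factor = belief[0.75] / belief[0.5]

print(belief)
print(most_plausible, bayes_factor)
```

Processing the observations one at a time gives the same final beliefs as a single update on all the data at once, because the likelihoods simply multiply.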

Bayesian techniques have recently been applied to filter out e-mail spam with good success. After a selection of known spam has been submitted to the filter, the filter uses the word occurrences in those messages to help it discriminate between spam and legitimate email.

See Bayesian inference and Bayesian filtering for more information in this regard.
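
A minimal sketch of word-based spam filtering follows. It uses a naive Bayes classifier with add-one smoothing, which is one common way to build such a filter and is assumed here rather than taken from this article; the function names and the tiny training messages are made up.

```python
import math
from collections import Counter


def train(messages):
    """Count word occurrences over a list of tokenised messages."""
    counts = Counter()
    for words in messages:
        counts.update(words)
    return counts


def log_likelihood(words, counts, vocabulary_size):
    """Log-probability of the words under one class, with add-one smoothing."""
    total = sum(counts.values())
    return sum(math.log((counts[w] + 1) / (total + vocabulary_size)) for w in words)


def spam_probability(words, spam_counts, ham_counts, prior_spam=0.5):
    """Posterior probability that a message is spam, treating words as independent."""
    vocabulary_size = len(set(spam_counts) | set(ham_counts))
    log_spam = math.log(prior_spam) + log_likelihood(words, spam_counts, vocabulary_size)
    log_ham = math.log(1 - prior_spam) + log_likelihood(words, ham_counts, vocabulary_size)
    # Normalise the two joint probabilities without leaving log space too early.
    return 1 / (1 + math.exp(log_ham - log_spam))


# Made-up training data: known spam and known legitimate mail.
spam_counts = train([["win", "money", "now"], ["cheap", "money", "offer"]])
ham_counts = train([["meeting", "at", "noon"], ["project", "report", "attached"]])

p_spammy = spam_probability(["money", "offer"], spam_counts, ham_counts)
p_legit = spam_probability(["project", "meeting"], spam_counts, ham_counts)
print(p_spammy, p_legit)
```

Working with logarithms avoids numerical underflow when messages contain many words.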

## Probabilities of probabilities

One criticism levelled at the Bayesian probability interpretation has been that a single probability assignment cannot convey how well grounded the belief is—i.e., how much evidence one has. Consider the following situations:

1. You have a box with white and black balls, but no knowledge as to the quantities
2. You have a box from which you have drawn N balls, half black and the rest white
3. You have a box and you know that there are the same number of white and black balls

The Bayesian probability that the next ball drawn is black is 0.5 in all three cases. To reflect the difference in evidential support, one can assign probabilities to these probabilities (so-called metaprobabilities) in the following manner:

1. You have a box with white and black balls, but no knowledge as to the quantities
Letting $\theta = p$ represent the statement that the probability that the next ball is black is $p$, a Bayesian might assign a uniform Beta prior distribution:
$\forall \theta \in [0,1]:\quad P(\theta) = \mathrm{Beta}(\alpha_B=1,\alpha_W=1) = \frac{\Gamma(\alpha_B + \alpha_W)}{\Gamma(\alpha_B)\Gamma(\alpha_W)}\theta^{\alpha_B-1}(1-\theta)^{\alpha_W-1} = \frac{\Gamma(2)}{\Gamma(1)\Gamma(1)}\theta^0(1-\theta)^0=1$
Assuming that the ball drawing is modelled as a binomial sampling distribution, the posterior distribution, $P(\theta|m,n)$, after drawing m additional black balls and n white balls is still a Beta distribution, with parameters $\alpha_B=1+m$, $\alpha_W=1+n$. An intuitive interpretation of the parameters of a Beta distribution is that of imagined counts for the two events. For more information, see Beta distribution.
2. You have a box from which you have drawn N balls, half black and the rest white
Letting $\theta = p$ represent the statement that the probability that the next ball is black is $p$, a Bayesian who starts from a uniform prior and updates on the N draws arrives at the Beta distribution $\mathrm{Beta}(N/2+1,N/2+1)$. Its mean is $E[\theta]=\frac{N/2+1}{N+2}=\frac{1}{2}$, precisely the value given by Laplace's rule of succession. The maximum a posteriori (MAP) estimate, $\theta_{MAP}=\frac{N/2}{N}$, is also $\frac{1}{2}$.
3. You have a box and you know that there are the same number of white and black balls
In this case a Bayesian would define the prior as a point mass at one half, $P(\theta)=\delta(\theta-\frac{1}{2})$.
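
The three situations can be compared numerically with a short sketch. It relies on the standard formulas for the mean and variance of a Beta distribution; the value N = 10 and the function names are assumptions chosen for illustration. All three distributions have the same mean, but their spread shrinks as the evidential support grows.

```python
def beta_mean(alpha_b, alpha_w):
    """Mean of a Beta(alpha_b, alpha_w) distribution."""
    return alpha_b / (alpha_b + alpha_w)


def beta_variance(alpha_b, alpha_w):
    """Variance of a Beta(alpha_b, alpha_w) distribution."""
    total = alpha_b + alpha_w
    return alpha_b * alpha_w / (total**2 * (total + 1))


N = 10  # hypothetical number of balls drawn in situation 2

# Each entry is (mean of theta, variance of theta).
situations = {
    "1: no knowledge": (beta_mean(1, 1), beta_variance(1, 1)),
    "2: N draws, half black": (
        beta_mean(N / 2 + 1, N / 2 + 1),
        beta_variance(N / 2 + 1, N / 2 + 1),
    ),
    "3: known equal numbers": (0.5, 0.0),  # point mass at 1/2 has zero variance
}

for name, (mean, variance) in situations.items():
    print(f"{name}: mean={mean}, variance={variance:.4f}")
```

The variance is only one possible summary of how well grounded the belief is; it is used here because it can be computed in closed form.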

Because there is no room for metaprobabilities on the frequency interpretation, frequentists have had to find different ways of representing difference of evidential support. Cedric Smith and Arthur Dempster each developed a theory of upper and lower probabilities. Glenn Shafer developed Dempster's theory further, and it is now known as Dempster-Shafer theory.