# Self-information

Within the context of information theory, **self-information** is the amount of information that learning the outcome of an event adds to one's overall knowledge. It is most commonly expressed in bits, the unit that results from using the base-2 logarithm.

By definition, the self-information of a probabilistic event depends only on the probability of that event: the smaller the probability, the larger the self-information associated with learning that the event has indeed occurred.

Further, by definition, the measure of self-information is additive. If an event *C* consists of two mutually independent events *A* and *B* both occurring, then the amount of information conveyed by the proclamation that *C* has happened equals the **sum** of the amounts of information conveyed by the proclamations of event *A* and event *B* respectively.

Taking these properties into account, the self-information *I*(*x*) (measured in bits) associated with an outcome *x* that has probability *P*(*x*) is defined as:

$$I(x) = \log_2\left(\frac{1}{P(x)}\right) = -\log_2 P(x)$$

This definition, using the binary logarithm function, complies with the above conditions.
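As a worked check of that claim, assume the definition takes the form $I(E) = -\log_2 P(E)$ for an event $E$ with probability $P(E)$ (the symbols are illustrative notation). Additivity for two independent events *A* and *B*, with *C* the event that both occur, then follows from the product rule for probabilities:

```latex
\begin{align*}
I(C) &= -\log_2 P(C) \\
     &= -\log_2\bigl(P(A)\,P(B)\bigr) && \text{since } A \text{ and } B \text{ are independent} \\
     &= -\log_2 P(A) - \log_2 P(B) \\
     &= I(A) + I(B)
\end{align*}
```

The first condition holds as well: because the logarithm is strictly increasing, $P(A) < P(B)$ implies $I(A) > I(B)$, so less probable events carry more self-information.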

This measure has also been called **surprisal**, as it represents the "surprise" of seeing the outcome (an outcome that is certain to occur is not surprising at all). The term was coined by Myron Tribus in his 1961 book *Thermostatics and Thermodynamics*. Some claim it is more accurate than "self-information", but it has not been widely used.

## Information Entropy

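The concept is related to that of information entropy: the information entropy of a random variable is the expected value of the self-information of its outcomes. For a discrete random variable *X* whose outcomes *x* occur with probability *P*(*x*):

$$H(X) = \operatorname{E}[I(X)] = \sum_{x} P(x)\,\log_2\frac{1}{P(x)} = -\sum_{x} P(x)\,\log_2 P(x)$$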
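The expected-value relationship can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `entropy` and the choice to skip zero-probability outcomes (which contribute nothing to the expectation) are assumptions made for this example.

```python
import math


def entropy(probs):
    """Entropy in bits: the probability-weighted average of log2(1/p).

    `probs` is assumed to be the full list of outcome probabilities of a
    discrete distribution, so it should sum to 1.
    """
    probs = list(probs)
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Outcomes with p == 0 never occur, so they are left out of the sum.
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


print(entropy([0.5, 0.5]))  # fair coin: 1.0
```

For a uniform distribution every outcome has the same self-information, so the entropy equals that common value; a fair die gives `entropy([1/6] * 6)`, which is log2(6) bits.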

## Examples

- On tossing a fair coin, the chance of 'tail' is 0.5. When it is proclaimed that 'tail' has indeed occurred, this amounts to

*I*('tail') = log_{2}(1/0.5) = log_{2}2 = 1 bit of information.

- When throwing a fair die, the probability of 'four' is 1/6. When it is proclaimed that 'four' has been thrown, the amount of self-information is

*I*('four') = log_{2}(1/(1/6)) = log_{2}(6) ≈ 2.585 bits.

- When two fair dice are thrown independently, the amount of information associated with {throw 1 = 'two' & throw 2 = 'four'} equals

*I*('throw 1 is two & throw 2 is four') = log_{2}(1/Pr(throw 1 = 'two' & throw 2 = 'four')) = log_{2}(1/(1/36)) = log_{2}(36) ≈ 5.170 bits.

This result equals the sum of the individual amounts of self-information associated with {throw 1 = 'two'} and {throw 2 = 'four'}, namely 2.585 + 2.585 = 5.170 bits.
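The three examples above can be reproduced numerically. The sketch below assumes a fair coin and fair dice; the helper name `self_information` is chosen for illustration and simply evaluates log2(1/p).

```python
import math


def self_information(p):
    """Self-information in bits of an outcome with probability p (0 < p <= 1)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return math.log2(1.0 / p)


coin_tail = self_information(1 / 2)             # 'tail' on a fair coin
die_four = self_information(1 / 6)              # 'four' on a fair die
two_dice = self_information((1 / 6) * (1 / 6))  # two independent throws

print(f"{coin_tail:.3f}")  # 1.000
print(f"{die_four:.3f}")   # 2.585
print(f"{two_dice:.3f}")   # 5.170

# Additivity for independent events: the joint value is the sum of the parts.
print(math.isclose(two_dice, die_four + die_four))  # True
```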