Psychology Wiki

Word frequency lists


In computational linguistics, a frequency list is a sorted list of words (word types) together with their frequencies, where frequency usually means the number of occurrences in a given corpus. A short example could be:

the 3789654
he 2098762
king 57897
boy 56975
outragious 76
stringyfy 5
transducionalify 1
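A list of this kind can be produced by counting tokens and sorting by count. The following is a minimal sketch, not a standard tool: the function name `frequency_list`, the lowercasing step, and the simple regular-expression tokenizer are assumptions made for illustration, and real corpora usually need more careful tokenization.

```python
import re
from collections import Counter


def frequency_list(text):
    """Return (word, count) pairs, most frequent first; ties sorted alphabetically."""
    # Assumption: a "word" is a run of ASCII letters or apostrophes, case-folded.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


sample = "the king and the boy saw the king"
for word, count in frequency_list(sample):
    print(word, count)
```

Sorting on the negated count with the word as a tie-breaker keeps the output deterministic when several words occur equally often.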

Zipf's law appears to hold for frequency lists drawn from longer texts in any natural language. Frequency lists are a prerequisite for building an electronic dictionary, which is itself a prerequisite for a wide range of applications in computational linguistics.
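Zipf's law states, roughly, that a word's frequency is inversely proportional to its rank in the frequency list. One rough way to check this for a given list is to fit a straight line to log(frequency) against log(rank) and see whether the slope is close to -1. The sketch below does this with an ordinary least-squares fit; the function name `zipf_exponent` is hypothetical, and this simple fit is only one of several possible estimators.

```python
import math


def zipf_exponent(frequencies):
    """Estimate s in frequency ~ 1 / rank**s from a list of positive counts."""
    ranked = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(freq) for freq in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Least-squares slope of log(frequency) on log(rank).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var


# Synthetic counts that follow frequency = 1000 / rank exactly.
ideal = [1000 / rank for rank in range(1, 51)]
print(round(zipf_exponent(ideal), 3))
```

For the synthetic data the estimate is 1.0; counts from a real corpus typically deviate from this, especially at the very top and in the long tail of rare words.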

German linguists define the Häufigkeitsklasse (frequency class) N of an item in the list using the base-2 logarithm of the ratio between its frequency and the frequency of the most frequent item. The most common item belongs to frequency class 0 (zero), and any item that is approximately half as frequent belongs to class 1. In the example list above, the misspelled word outragious has a ratio of 76/3789654 and belongs to class 16.

N=\left\lfloor0.5-\log_2\left(\frac{\text{Frequency of this item}}{\text{Frequency of most common item}}\right)\right\rfloor

where \lfloor\ldots\rfloor is the floor function.
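The formula can be checked directly against the figures quoted in the text. The following is a small sketch; the function name `frequency_class` and its parameter names are chosen here for illustration.

```python
import math


def frequency_class(item_frequency, top_frequency):
    """Frequency class N = floor(0.5 - log2(item_frequency / top_frequency))."""
    return math.floor(0.5 - math.log2(item_frequency / top_frequency))


print(frequency_class(3789654, 3789654))  # most common item: class 0
print(frequency_class(1, 2))              # half as frequent: class 1
print(frequency_class(76, 3789654))       # ratio 76/3789654: class 16
```

Adding 0.5 before taking the floor rounds the negated logarithm to the nearest integer, which is why an item only approximately half as frequent as the top item still lands in class 1.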


This page uses Creative Commons Licensed content from Wikipedia (view authors).
