In linguistics, prosody describes all the acoustic properties of speech that cannot be predicted from a local window on the orthographic (or similar) transcription. Prosody is therefore relative to a default pronunciation of a phoneme, feature bundle, segment, or syllable. It does not include coarticulation, because coarticulation is predictable from the immediate phonological or orthographic neighborhood. Qualitatively, prosody can be understood as the difference between a well-performed play and a first reading of the script.

The term generally covers intonation, rhythm, and focus in speech. Acoustically, prosody describes changes in syllable length, loudness, pitch, and certain details of the formant structure of speech sounds. In terms of the speech articulators, it describes changes in the velocity and range of motion of articulators such as the jaw and tongue, along with quantities such as the air pressure in the trachea and the tension in the laryngeal muscles. Phonologically, prosody is described by tone, intonation, rhythm, and lexical stress.

A precise definition of prosody and its effects depends on the language. For instance, some languages make lexical distinctions based on vowel duration. In such languages, syllable length is at least partly predictable from a transcription and is therefore not entirely prosodic. Likewise, in tone languages such as Mandarin, pitch and intonation are at least partly predictable from the lexical tone of a word, and so are not entirely prosodic.

Similarly, the formant structure of vowels is determined primarily, but not entirely, by a phonological or orthographic transcription. Vowels are generally more fully realized in accented or focused syllables. Acoustically, this means that their formant structure lies farther from that of a neutral vowel (typically the schwa) and closer to the vowels one might find in the stressed syllables of a carefully spoken word. The precise formant structure of a vowel therefore normally contains a mixture of prosodic and lexical information.

The prosodic features of a unit of speech, whether a syllable, word, phrase, or clause, are usually called suprasegmental features because they typically affect all the segments of the unit.

Prosodic units do not always correspond to grammatical units, although both may reflect how the brain processes speech. Phrases and clauses are grammatical concepts, but they may have prosodic equivalents, commonly called prosodic units, intonation units, or declination units, which are the actual phonetic spurts or chunks of speech. These units are often thought to form a hierarchy of levels. They are characterized by several phonetic cues, such as a coherent pitch contour and a gradual decline in pitch and lengthening of vowels over the course of the unit, until pitch and speed are reset at the start of the next unit. Breathing, both inhalation and exhalation, seems to occur only at these boundaries.

Different schools of linguistics describe somewhat different prosodic units. One common distinction is between continuing prosody, which English orthography might mark with a comma, and final prosody, which it might mark with a full stop (period). The IPA symbols for "minor" and "major" prosodic breaks are commonly used as follows (American English pronunciation):

Jack, preparing the way, went on.
[ˈdʒæk | pɹəˌpɛəɹɪŋ ðə ˈweɪ | wɛnt ˈɒn ‖ ]
Jacques, préparant le sol, tomba.
[ˈʒak | pʁepaʁɑ̃ lɵ ˈsɔl | tɔ̃ˈba ‖ ]
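To make concrete how the two break symbols partition an utterance, here is a minimal Python sketch. The helper name `prosodic_units` is hypothetical, and it assumes the transcription is bracketed as in the examples above and that | and ‖ are used only as minor and major breaks; it ignores every other detail of the transcription.

```python
# Hypothetical helper: split a bracketed IPA transcription into prosodic units,
# recording whether each unit is closed by a minor (|) or major (‖) break.
MINOR, MAJOR = "|", "‖"


def prosodic_units(transcription: str) -> list[tuple[str, str]]:
    """Return (unit, break_type) pairs; break_type is 'minor', 'major', or 'none'."""
    body = transcription.strip().strip("[]")
    units: list[tuple[str, str]] = []
    current: list[str] = []
    for ch in body:
        if ch in (MINOR, MAJOR):
            # A break symbol closes the unit accumulated so far.
            units.append(("".join(current).strip(), "minor" if ch == MINOR else "major"))
            current = []
        else:
            current.append(ch)
    # Any text after the last break forms a unit with no closing break symbol.
    tail = "".join(current).strip()
    if tail:
        units.append((tail, "none"))
    return units


if __name__ == "__main__":
    for unit, kind in prosodic_units("[ˈdʒæk | pɹəˌpɛəɹɪŋ ðə ˈweɪ | wɛnt ˈɒn ‖ ]"):
        print(f"{kind:>5}: {unit}")
```

Run on the English example, this yields two units closed by minor breaks ("ˈdʒæk" and "pɹəˌpɛəɹɪŋ ðə ˈweɪ") and a final unit closed by a major break ("wɛnt ˈɒn"), mirroring the commas and the full stop in the orthography.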

Note that the last syllable with a full vowel in a French prosodic unit is stressed, and that the last stressed syllable in an English prosodic unit has primary stress. This shows that stress is not phonemic in French, and that the difference between primary and secondary stress is not phonemic in English; they are both elements of prosody rather than inherent in the words.

The pipe symbols used above (the vertical bars | and ‖) are phonetic, so they will often disagree with English punctuation, which correlates only partially with prosody.

However, the pipes may also be used for metrical breaks: a single pipe marks metrical feet, and a double pipe marks both continuing and final prosody, as their alternate names "foot group" and "intonation group" suggest. In this usage, each foot group includes one and only one heavy syllable. In English, this means one and only one stressed syllable:

Jack, preparing the way, went on.
[ˈdʒæk ‖ pɹəˌpɛəɹɪŋ | ðə ˈweɪ ‖ wɛnt ˈɒn ‖ ]
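The one-stress-per-foot rule can be checked mechanically. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that every stressed syllable in the transcription carries an explicit stress mark, ˈ (primary) or ˌ (secondary), and that both | and ‖ close a foot group.

```python
# Hypothetical helpers for the foot-group convention: | and ‖ both close a foot,
# and each foot should contain one and only one stressed syllable.
# Assumption: every stressed syllable is marked with ˈ (primary) or ˌ (secondary).
PRIMARY, SECONDARY = "ˈ", "ˌ"


def foot_groups(transcription: str) -> list[str]:
    """Split a bracketed transcription into foot groups at | or ‖."""
    body = transcription.strip().strip("[]")
    # Normalize the double pipe to a single pipe so one split handles both.
    body = body.replace("‖", "|")
    return [foot.strip() for foot in body.split("|") if foot.strip()]


def stresses_per_foot(transcription: str) -> list[int]:
    """Count stress marks (primary or secondary) in each foot group."""
    return [foot.count(PRIMARY) + foot.count(SECONDARY) for foot in foot_groups(transcription)]


def is_well_formed(transcription: str) -> bool:
    """True if every foot group carries exactly one stress mark."""
    counts = stresses_per_foot(transcription)
    return bool(counts) and all(c == 1 for c in counts)


if __name__ == "__main__":
    example = "[ˈdʒæk ‖ pɹəˌpɛəɹɪŋ | ðə ˈweɪ ‖ wɛnt ˈɒn ‖ ]"
    print(foot_groups(example))
    print(stresses_per_foot(example))  # [1, 1, 1, 1]
    print(is_well_formed(example))     # True
```

Note that the foot "pɹəˌpɛəɹɪŋ" passes only because the secondary stress mark is counted; a check restricted to primary stress would reject it.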

In many tone languages with downdrift, such as Hausa, [ | ] is often used to represent a minor prosodic break that does not interrupt the overall decline in pitch of the utterance, while [ ‖ ] marks either continuing or final prosody that creates a pitch reset. In such cases, some linguists use only the single pipe, with continuing and final prosody marked by a comma and period, respectively.

In transcriptions of non-tonal languages, the three symbols pipe, comma, and period may also be used, with the pipe representing a break more minor than the comma. This is the so-called list prosody, often used to separate items when reading lists, spelling words, or giving out telephone numbers.

Many people can convey and interpret emotions and subtle nuances in conversation through slight variations in voice quality, intonation, and rhythm. However, not everyone is able to fully perceive or even notice such prosodic cues, even in their native language. See Sociolinguistics.

Prosody and emotion

Emotional prosody describes the perception of feelings expressed in speech. Charles Darwin, in The Descent of Man, recognized it as predating the evolution of human language: "Even monkeys express strong feelings in different tones — anger and impatience by low, fear and pain by high notes."[1] In trials, native speakers listening to actors who read neutral text while projecting emotions correctly recognized happiness 62% of the time, anger 95%, surprise 91%, sadness 81%, and neutral tone 76%. When a database of this speech was processed by computer, segmental features allowed over 90% recognition of happiness and anger, while suprasegmental prosodic features allowed only 44–49% recognition. The reverse held for surprise, which was recognized only 69% of the time from segmental features but 96% of the time from suprasegmental prosody.[2]

  1. Charles Darwin (1871). The Descent of Man, citing Johann Rudolph Rengger, Natural History of the Mammals of Paraguay, p. 49.
  2. R. Barra, J.M. Montero, J. Macías-Guarasa, L.F. D’Haro, R. San-Segundo, R. Córdoba. Prosodic and segmental rubrics in emotion identification.
