Authors: Einat Liebenthal, David A. Silbersweig, and Emily Stern.
Article: The Language, Tone and Prosody of Emotions: Neural Substrates and Dynamics of Spoken-Word Emotion Perception.
Publication: Frontiers in Neuroscience (Frontiers). 10:506 2016 | DOI: 10.3389/fnins.2016.00506
Rapid assessment of emotions is important for detecting and prioritizing salient input. Emotions are conveyed in spoken words via verbal and non-verbal channels that are mutually informative and unveil in parallel over time, but the neural dynamics and interactions of these processes are not well understood. In this paper, we review the literature on emotion perception in faces, written words, and voices, as a basis for understanding the functional organization of emotion perception in spoken words. The characteristics of visual and auditory routes to the amygdala—a subcortical center for emotion perception—are compared across these stimulus classes in terms of neural dynamics, hemispheric lateralization, and functionality. Converging results from neuroimaging, electrophysiological, and lesion studies suggest the existence of an afferent route to the amygdala and primary visual cortex for fast and subliminal processing of coarse emotional face cues. We suggest that a fast route to the amygdala may also function for brief non-verbal vocalizations (e.g., laugh, cry), in which emotional category is conveyed effectively by voice tone and intensity. However, emotional prosody which evolves on longer time scales and is conveyed by fine-grained spectral cues appears to be processed via a slower, indirect cortical route. For verbal emotional content, the bulk of current evidence, indicating predominant left lateralization of the amygdala response and timing of emotional effects attributable to speeded lexical access, is more consistent with an indirect cortical route to the amygdala. Top-down linguistic modulation may play an important role for prioritized perception of emotions in words. Understanding the neural dynamics and interactions of emotion and language perception is important for selecting potent stimuli and devising effective training and/or treatment approaches for the alleviation of emotional dysfunction across a range of neuropsychiatric states.
Liebenthal E, Silbersweig DA and Stern E (2016) The Language, Tone and Prosody of Emotions: Neural Substrates and Dynamics of Spoken-Word Emotion Perception. Front. Neurosci. 10:506. doi: 10.3389/fnins.2016.00506
The voice is also the natural carrier of speech. The voice paralinguistic and linguistic cues are separated such that the low-frequency band primarily carries prosodic cues important for communication of emotions, whereas the high-frequency band primarily carries phonemic cues critical for verbal communication (Remez et al., 1981; Scherer, 1986). Neural processing of the spectrally slow-varying emotional prosody cues appears to involve more anterior auditory cortical areas in the superior temporal lobe than the processing of spectrally fast-varying phonemic cues (Belin et al., 2004; Liebenthal et al., 2005). Neural processing of emotional voice cues is also thought to involve auditory cortical areas predominantly in the right hemisphere, whereas that of phonemic cues predominantly auditory areas in the left hemisphere (Kotz et al., 2006; Scott and McGettigan, 2013). Which voice emotional cues confer a processing advantage, and through what neural routes, is an ongoing topic of investigation.
Compared to written words, spoken words contain additional non-verbal emotional information (i.e., emotional prosody) that is physically and perceptually intertwined with the verbal information (Kotz and Paulmann, 2007; Pell and Kotz, 2011). The verbal and emotional cues in speech differ in their spectrotemporal properties. The phonemic cues consist primarily of relatively fast spectral changes occurring within 50 ms speech segments, whereas the prosodic cues consist of slower spectral changes occurring over more than 200 ms speech segments (syllabic and suprasegmental range).
In addition to differences in spectrotemporal response properties within auditory cortex in each hemisphere there are differences between the two hemispheres. The right hemisphere has been suggested to be more sensitive to fine spectral details over relatively long time scales and the left hemisphere more sensitive to brief spectral changes (Zatorre and Belin, 2001; Boemio et al., 2005; Poeppel et al., 2008). A related theory is that resting state oscillatory properties of neurons predispose the left auditory cortex for processing at short time scales relevant to the rate of phonemes (gamma band) and the right auditory cortex for processing at longer time scales relevant to the rate of syllables (theta band) (Giraud et al., 2007; Giraud and Poeppel, 2012). Such differences in auditory cortex spectrotemporal sensitivity have been suggested as the basis for the common fMRI finding of right hemisphere dominance for emotional prosody perception, and left hemisphere dominance for speech comprehension (Mitchell et al., 2003; Grandjean et al., 2005).
The emotional content of written words can systematically and continuously be deconstructed along several primary dimensions, and in particular valence (degree of positive or negative emotional association) and arousal (degree of emotional salience), that are separable but interact (Bradley and Lang, 1999; Warriner et al., 2013). The question of whether an expedited subcortical route exists for visual processing of symbolic, detailed emotional input such as written words is contentious (Naccache and Dehaene, 2001; Gaillard et al., 2006). Semantic processing of words is associated with activity across extensive cortical networks (Binder and Desai, 2011), but it is unclear whether some level of analysis related to emotional content is accomplished subcortically. Observations that compared to neutral words, emotional words are more likely to be attended (Williams et al., 1996; Mogg et al., 1997; Anderson and Phelps, 2001), are better remembered (Kensinger and Corkin, 2004; Krolak-Salmon et al., 2004; Strange and Dolan, 2004; Vuilleumier et al., 2004; Kissler et al., 2006), and are also more quickly detected in a lexical decision task (Kanske and Kotz, 2007; Kousta et al., 2009; Scott et al., 2009; Vigliocco et al., 2014), have led to the suggestion that analysis of some emotional linguistic content (in particular, salience and emotional category) could be facilitated at a subcortical level. Connections from the amygdala to visual cortex (Amaral et al., 2003) and to the orbitofrontal cortex (Timbie and Barbas, 2015) could mediate the enhanced cortical processing of emotional words detected subliminally in the amygdala.