Audiovisual Phonetic Processing

Lynne E. Bernstein, Communication Neuroscience Department, House Ear Institute, Los Angeles, California, USA

Abstract
Audiovisual speech stimuli are well known to produce multisensory interactions. These stimuli, which are typically natural recordings, afford multiple and complex attributes. Not all of the observed multisensory interactions are necessarily due to linguistically relevant (phonetic) stimulus attributes. Understanding multisensory phonetic processing requires isolating effects of the physical stimulus attributes that are due to the production of speech segments (consonants and vowels) and supra-segmental prosodic features. We suggest that quantitative measures of AV phonetic stimulus attributes can be used to isolate cortical-level AV phonetic processing interactions. We examined the hypothesis that during perception of incongruent auditory and visual speech stimuli, as in the McGurk effect, bottom-up stimulus information mismatches with stored representations, leading to distinct neural responses. Those responses were predicted to vary as a function of AV stimulus congruity, which was estimated using distance metrics between optical and acoustic signals. Subjects susceptible to the McGurk effect were enrolled in an fMRI study and were imaged at 3T. AV nonsense stimuli were selected to have three quantified levels of congruity: high (HC, matched), medium (MC, mismatched), and low (LC, mismatched). To screen for areas sensitive to AV speech interactions in the fMRI data, overall activity was tested with the conjunction [(HC – REST) ∩ (MC – REST) ∩ (LC – REST)]. Then, within that conjunction, the contrasts (LC – HC), (MC – HC), and (LC – MC) were formed to find those areas that were specifically sensitive to differences in AV stimulus congruity.
Many areas were active in the contrast between LC (highly mismatched) and HC (matched), but the areas that were also active in the contrast between the medium-mismatch and matched stimuli, (MC – HC), were in left STS and right occipitotemporal sulcus. The contrast between low and medium congruity, (LC – MC), resulted in activation in right inferior frontal gyrus. Thus, our results showed sensitivity to AV phonetic congruity levels that had been estimated quantitatively. The demonstration of this type of second-order isomorphism, that is, between stimulus congruity relationships and differential cortical activity, opens up an avenue to more sensitive studies of AV phonetic processing. The differential levels of activity across congruity levels support the hypothesis that AV phonetic perception involves registering the correspondence between modality-specific phonetic representations.
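
The conjunction-masked contrasts described in the abstract can be illustrated with a minimal sketch. It is not the study's analysis pipeline: the voxel values, the threshold, and the function names conjunction_mask and masked_contrast are all hypothetical, and a simple difference of effect estimates stands in for a proper statistical contrast. It only shows the logic of first keeping voxels where every condition exceeds REST, then comparing congruity levels inside that mask.

```python
from typing import Dict, List


def conjunction_mask(effects: Dict[str, List[float]], threshold: float) -> List[bool]:
    """Return True for voxels where every (condition - REST) effect exceeds threshold.

    This is the intersection (HC - REST) ∩ (MC - REST) ∩ (LC - REST) applied
    to already-computed per-voxel effect values.
    """
    n_voxels = len(next(iter(effects.values())))
    return [
        all(values[v] > threshold for values in effects.values())
        for v in range(n_voxels)
    ]


def masked_contrast(a: List[float], b: List[float], mask: List[bool]) -> Dict[int, float]:
    """Return a - b for voxels inside the mask, keyed by voxel index."""
    return {v: a[v] - b[v] for v, inside in enumerate(mask) if inside}


# Hypothetical (condition - REST) effects for five voxels; not real data.
effects = {
    "HC": [2.0, 0.5, 3.0, 4.0, 1.0],
    "MC": [3.0, 2.5, 3.5, 0.2, 1.5],
    "LC": [4.5, 3.0, 3.25, 0.1, 2.0],
}
THRESHOLD = 1.5  # illustrative cutoff, an assumption of this sketch

mask = conjunction_mask(effects, THRESHOLD)

# The three congruity contrasts, restricted to the conjunction mask.
lc_minus_hc = masked_contrast(effects["LC"], effects["HC"], mask)
mc_minus_hc = masked_contrast(effects["MC"], effects["HC"], mask)
lc_minus_mc = masked_contrast(effects["LC"], effects["MC"], mask)

print("mask:", mask)
print("LC - HC:", lc_minus_hc)
print("MC - HC:", mc_minus_hc)
print("LC - MC:", lc_minus_mc)
```

In this toy data only voxels 0 and 2 survive the conjunction, so the congruity contrasts are evaluated only there; voxel 3, which responds strongly to HC alone, is excluded even though its (LC – HC) difference is large in magnitude.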
