Visual influences on voice-selective neurons in the anterior superior-temporal plane

Catherine Perrodin, Christoph Kayser, Nikos K. Logothetis, Christopher I. Petkov
Time: 2009-06-30  01:45 PM – 02:00 PM
Last modified: 2009-06-04


For social interaction and survival, primates rely heavily on vocal and facial communication signals from their conspecifics. To date, many studies have evaluated the unisensory representations of either vocal or facial information in regions thought to be “voice” or “face” selective. Other studies have directly evaluated the multisensory interactions of voices and faces but have focused on posterior auditory regions closer to the primary auditory cortex. This work investigates multisensory interactions at the neuronal level in an auditory region in the anterior superior temporal plane, which contains one of the important regions for processing “voice”-related information.

Extracellular recordings were obtained from the auditory cortex of macaque monkeys, targeting an anterior “voice” region that we have previously described with functional magnetic resonance imaging (fMRI). For stimulation we used movies of vocalizing monkeys and humans, which we matched in their low-level auditory and visual features. These dynamic face and voice stimuli allowed us to evaluate how neurons responded to auditory, visual or audio-visual components of the stimuli. Our experiments also contained control conditions consisting of several mismatched audiovisual stimulus combinations, such as 1) a voice paired with a face from a different species, 2) a temporal delay added to the visual component of the stimulus, or 3) an acoustically manipulated voice paired with the original facial stimulus.

Our neuronal recordings identified a clustered population of voice-selective sites in the anterior superior temporal plane, ~5 mm anterior to field RT. A significant visual influence of the dynamic faces on the corresponding (“matched”) vocalizations was observed in both the local-field potential (LFP) and the spiking activity (analog multiunit activity, AMUA): 38% of the sites showed audiovisual interactions in the LFP signals, and 60% in the AMUA. In addition, the multisensory influence was significantly stronger for the matching voice and face stimuli than for any of the incongruent (“mismatched”) control conditions, confirming the specificity of the cross-sensory influence on the neuronal activity.

Our results provide evidence for visual influences in what has been characterized as an auditory ‘voice’ area. This visual modulation was specific for behaviorally relevant voice-face associations and demonstrates that the processing of voice-related information in higher auditory regions can be influenced by multisensory input.
