Link to original content: https://pubmed.ncbi.nlm.nih.gov/21975452/
J Neurophysiol. 2012 Jan;107(1):78-89.
doi: 10.1152/jn.00297.2011. Epub 2011 Oct 5.

Neural coding of continuous speech in auditory cortex during monaural and dichotic listening


Nai Ding et al. J Neurophysiol. 2012 Jan.

Abstract

The cortical representation of the acoustic features of continuous speech is the foundation of speech perception. In this study, noninvasive magnetoencephalography (MEG) recordings are obtained from human subjects actively listening to spoken narratives, in both simple and cocktail party-like auditory scenes. By modeling how acoustic features of speech are encoded in ongoing MEG activity as a spectrotemporal response function, we demonstrate that the slow temporal modulations of speech in a broad spectral region are represented bilaterally in auditory cortex by a phase-locked temporal code. For speech presented monaurally to either ear, this phase-locked response is always more faithful in the right hemisphere, but with a shorter latency in the hemisphere contralateral to the stimulated ear. When different spoken narratives are presented to each ear simultaneously (dichotic listening), the resulting cortical neural activity precisely encodes the acoustic features of both of the spoken narratives, but slightly weakened and delayed compared with the monaural response. Critically, the early sensory response to the attended speech is considerably stronger than that to the unattended speech, demonstrating top-down attentional gain control. This attentional gain is substantial even during the subjects' very first exposure to the speech mixture and therefore largely independent of knowledge of the speech content. Together, these findings characterize how the spectrotemporal features of speech are encoded in human auditory cortex and establish a single-trial-based paradigm to study the neural basis underlying the cocktail party phenomenon.
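For orientation, the forward model referred to in the abstract (and illustrated in Fig. 1 below) can be written in the standard STRF form; this uses the conventional notation rather than the paper's own symbols, so treat it as a sketch:

$$ r(t) \approx \sum_{f} \sum_{\tau \ge 0} \mathrm{STRF}(f, \tau)\, S(f, t - \tau), $$

where $S(f, t)$ is the auditory spectrogram of the speech stimulus (power in spectral band $f$ at time $t$), $\mathrm{STRF}(f, \tau)$ is the kernel estimated from the data, and $r(t)$ is the MEG response.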


Figures

Fig. 1.
Illustration of the working principle of the spectrotemporal response function (STRF). The STRF models which stimulus features drive a neural response most powerfully. When an acoustic feature strongly (weakly) resembling the time-reversed STRF appears in the stimulus, the model predicts a strong (weak) neural response. The STRF is optimized iteratively using cross-validation to predict the magnetoencephalography (MEG) response as accurately as possible (David et al. 2007).
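A minimal sketch of how a fitted STRF turns a stimulus spectrogram into a predicted response, as described in the Fig. 1 caption. Array shapes, sampling, and the function name are illustrative assumptions; the actual estimation of the kernel (boosting with cross-validation, David et al. 2007) is not shown here.

```python
import numpy as np

def predict_response(strf, spectrogram):
    """Predict a neural response by convolving an STRF with a stimulus
    spectrogram (a sketch of the linear model illustrated in Fig. 1).

    strf        : (n_freq, n_lags) kernel, time lags in samples
    spectrogram : (n_freq, n_times) stimulus representation
    returns     : (n_times,) predicted response
    """
    n_freq, n_lags = strf.shape
    _, n_times = spectrogram.shape
    pred = np.zeros(n_times)
    for lag in range(n_lags):
        # each time lag of the kernel weights the spectrogram shifted by that lag
        pred[lag:] += strf[:, lag] @ spectrogram[:, : n_times - lag]
    return pred
```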
Fig. 2.
Predictive power of the STRF model by frequency band. The grand-averaged predictive power is shown as the black line, with error bars representing 1 SE on each side. The gray-shaded area covers the 5th to 95th percentiles of chance-level predictive power, estimated from pseudo-STRFs. The predictive power of the STRF for the MEG speech response was significantly above chance level below 8 Hz.
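A sketch of how the quantities in Fig. 2 might be computed: predictive power as the correlation between predicted and measured responses, and a chance band from a null distribution. How the pseudo-STRFs are constructed (e.g., fits to mismatched stimulus/response pairs) is an assumption here, not stated in the caption.

```python
import numpy as np

def predictive_power(predicted, measured):
    """Predictive power as plotted in Fig. 2: the correlation between the
    STRF-predicted and the measured MEG response (the paper's exact
    definition is in its Methods)."""
    return np.corrcoef(predicted, measured)[0, 1]

def chance_band(null_powers, lo=5, hi=95):
    """Chance-level band (gray area in Fig. 2): the 5th-95th percentiles of
    predictive power obtained from pseudo-STRFs (null fits -- assumed here
    to come from mismatched stimulus/response pairings)."""
    return np.percentile(null_powers, [lo, hi])
```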
Fig. 3.
STRF derived from the MEG speech response to monaurally presented speech (A) and dichotically presented simultaneous speech signals (B). The most salient feature of the STRF was a negative peak (same polarity as M100/N1) at ∼100 ms poststimulus, sometimes followed by a later peak of opposite polarity. In the dichotic listening condition, the amplitude of the STRF was higher for the attended speech than for the interfering (unattended) speech. All examples are from the right hemisphere for speech presented contralaterally. The STRF was smoothed using a 2-dimensional (2-D) Gaussian function with SD of 5 semitones and 25 ms. Data are from representative subject R1474.
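The caption specifies the display smoothing exactly (a 2-D Gaussian with SD of 5 semitones and 25 ms), so a small sketch follows; the bin widths assumed below (1 semitone per frequency bin, 5 ms per time sample) are illustrative, not the paper's parameters.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

SEMITONES_PER_BIN = 1.0   # assumed frequency-axis resolution
MS_PER_SAMPLE = 5.0       # assumed time-axis resolution

def smooth_strf(strf):
    """Smooth an STRF for display as in Fig. 3: 2-D Gaussian with SD of
    5 semitones (frequency axis) and 25 ms (time axis)."""
    sigma_freq = 5.0 / SEMITONES_PER_BIN   # 5 semitones expressed in bins
    sigma_time = 25.0 / MS_PER_SAMPLE      # 25 ms expressed in samples
    return gaussian_filter(strf, sigma=(sigma_freq, sigma_time))
```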
Fig. 4.
Predictive power and separability of the STRF. Each point is the result from an individual subject in 1 condition. STRFs with any substantial predictive power are skewed toward high separability. Circles and squares are the results from monaural and binaural listening conditions, respectively; filled and open symbols are results from left and right hemispheres, respectively. The background contour map shows the joint probability distribution density of predictive power and STRF separability. The probability distribution density was obtained by smoothing the 2-D histogram using a Gaussian function (SD = 0.1 in both directions).
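The caption does not define separability; a common SVD-based index is sketched below (fraction of STRF power captured by the best rank-1 time-frequency approximation). Whether this matches the paper's exact definition is an assumption.

```python
import numpy as np

def separability(strf):
    """SVD-based separability index of an STRF: 1.0 means the kernel is an
    exact outer product of a spectral and a temporal profile."""
    s = np.linalg.svd(strf, compute_uv=False)
    return s[0] ** 2 / np.sum(s ** 2)
```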
Fig. 5.
Temporal response function and spectral sensitivity functions. A: grand average of the temporal response functions to speech stimuli under 3 different listening conditions. The amplitude of the temporal response function was higher in the monaural speech condition and was strongly modulated by attention in the dichotic listening condition. B: the normalized spectral sensitivity function (grand average over subjects) had a peak between 400 and 2,000 Hz in both hemispheres and all listening conditions. Normalized spectral sensitivity functions to contralateral and ipsilateral stimuli were not significantly different and therefore were averaged. The spectral sensitivity function was smoothed using a Gaussian function with an SD of 5 semitones.
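If the STRF is close to separable (Fig. 4), its best rank-1 approximation factors into the spectral sensitivity function and temporal response function plotted in Fig. 5. The sketch below uses the leading singular vectors; the normalization and sign convention are assumptions.

```python
import numpy as np

def decompose_strf(strf):
    """Factor an STRF into a spectral sensitivity function and a temporal
    response function via its leading singular vectors (a sketch)."""
    u, s, vt = np.linalg.svd(strf, full_matrices=False)
    spectral = u[:, 0] * np.sqrt(s[0])   # sensitivity across frequency bands
    temporal = vt[0] * np.sqrt(s[0])     # response profile across time lags
    # fix the arbitrary SVD sign so the dominant temporal peak is negative
    # (M100-like polarity, as in Fig. 3) -- an assumed convention
    if temporal[np.argmax(np.abs(temporal))] > 0:
        spectral, temporal = -spectral, -temporal
    return spectral, temporal
```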
Fig. 6.
Amplitude and latency of the M100-like response (grand average). Error bars represent SE. The response amplitude was universally larger and the response latency was universally shorter for monaurally presented speech. In the dichotic condition, the response was stronger for the attended speech than for the unattended speech.
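A hedged sketch of how the M100-like amplitude and latency in Fig. 6 might be read off the temporal response function: take the most negative point within a post-stimulus search window around 100 ms. The window and sampling step below are illustrative assumptions.

```python
import numpy as np

def m100_peak(temporal, dt_ms=5.0, window=(50, 200)):
    """Return (amplitude, latency in ms) of the most negative deflection of
    the temporal response function within an assumed 50-200 ms window."""
    t = np.arange(len(temporal)) * dt_ms
    mask = (t >= window[0]) & (t <= window[1])
    idx = np.argmin(np.where(mask, temporal, np.inf))
    return temporal[idx], t[idx]
```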
Fig. 7.
Stimulus information encoded in the MEG response. A: the correlation (grayscale intensity) between the stimulus speech envelope and the envelope reconstructed from the right hemisphere MEG response. The stimulus envelope most correlated with each reconstructed envelope is marked by a square. B: stimulus decoding accuracy as a function of the number of stimulus segments per second for monaural speech. The black and gray curves are the results from the left and right hemispheres, respectively; solid and dashed curves are based on the left- and right-side stimuli, respectively. The information decoded from the right and left hemispheres was roughly 4 and 1 bit/s, respectively, for a monaural speech stimulus and is a conservative estimate of the stimulus information available in the MEG response.
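A sketch of the single-trial decoding in Fig. 7A: each envelope reconstructed from the MEG response is assigned to the stimulus segment it correlates with most strongly. Matrix shapes and the helper name are assumptions.

```python
import numpy as np

def decode_segments(reconstructed, stimuli):
    """Fraction of segments decoded correctly by maximum correlation.

    reconstructed : (n_segments, n_times) envelopes reconstructed from MEG
    stimuli       : (n_segments, n_times) actual stimulus envelopes
    """
    correct = 0
    for i, rec in enumerate(reconstructed):
        r = [np.corrcoef(rec, stim)[0, 1] for stim in stimuli]
        correct += int(np.argmax(r) == i)
    return correct / len(stimuli)
```

With N equiprobable segments, a correct decision conveys roughly log2(N) bits, so decoding accuracy combined with the number of segments per second gives a lower-bound bit rate of the kind quoted in the caption (the paper's exact information estimate may differ).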
Fig. 8.
Correlation between the MEG response to dichotic speech stimuli and the MEG responses to the 2 speech components presented monaurally. Each symbol is the result from 1 subject. The responses in the right and left hemispheres are plotted as stars and squares, respectively. For each hemisphere, if the attended ear in the dichotic condition was the contralateral ear, the result is plotted as a filled symbol, but otherwise it is plotted as an open symbol. The response to dichotic stimuli was more correlated with the response to the attended speech component, especially in the contralateral hemisphere.
Fig. 9.
Predictive power of the STRF derived from each denoising source separation (DSS) component. The first DSS component resulted in significantly higher predictive power than other components and therefore was the only one used to localize the source of the MEG response.
