Neural Tracking of Speech Acoustics in Noise Is Coupled with Lexical Predictability as Estimated by Large Language Models.

eNeuro | IF 2.7 | Zone 3 (Medicine) | JCR Q3 (Neurosciences) | Pub Date: 2024-08-20 | Print Date: 2024-08-01 | DOI: 10.1523/ENEURO.0507-23.2024

Paul Iverson, Jieun Song


Citations: 0

Significance Statement

In challenging listening conditions, people use focused attention to help them understand individual talkers and ignore others, which changes their neural processing of speech at auditory and lexical levels. However, lexical processing of natural materials (e.g., conversations, audiobooks) has been difficult to measure because of limitations in the tools for estimating the predictability of individual words in longer discourse. The present study used a contemporary large language model, GPT-4, to estimate word predictability, and demonstrated that listeners make online adjustments to their auditory neural processing in accord with these predictions; neural activity tracks the acoustics of the target talker more closely when words are less predictable from context.


Neural Tracking of Speech Acoustics in Noise Is Coupled with Lexical Predictability as Estimated by Large Language Models.

Adults heard recordings of two spatially separated speakers reading newspaper and magazine articles. They were asked to listen to one of them and ignore the other, and EEG was recorded to assess their neural processing. Machine learning extracted neural sources that tracked the target and distractor speakers at three levels: the acoustic envelope of speech (delta- and theta-band modulations), lexical frequency for individual words, and the contextual predictability of individual words estimated by GPT-4 and earlier lexical models. To provide a broader view of speech perception, half of the subjects completed a simultaneous visual task, and the listeners included both native and non-native English speakers. Distinct neural components were extracted for these levels of auditory and lexical processing, demonstrating that native English speakers had greater target-distractor separation compared with non-native English speakers on most measures, and that lexical processing was reduced by the visual task. Moreover, there was a novel interaction of lexical predictability and frequency with auditory processing; acoustic tracking was stronger for lexically harder words, suggesting that people listened harder to the acoustics when needed for lexical selection. This demonstrates that speech perception is not simply a feedforward process from acoustic processing to the lexicon. Rather, the adaptable context-sensitive processing long known to occur at a lexical level has broader consequences for perception, coupling with the acoustic tracking of individual speakers in noise.
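The abstract's "lexically harder words" are words with low contextual predictability. One common way to express this is surprisal, the negative log probability of a word given its context. The sketch below is illustrative only: the abstract does not describe the authors' actual pipeline, the function names are hypothetical, and the word probabilities and tracking scores are made-up toy values, not data or results from the study. It converts per-word probabilities (such as a language model might supply) to surprisal in bits, then compares a per-word acoustic-tracking score for words above and below the median surprisal.

```python
import math
from statistics import mean, median


def surprisal_bits(probability: float) -> float:
    """Surprisal in bits: -log2 p(word | context)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


def split_by_surprisal(words):
    """Split (word, probability, tracking) records at the median surprisal.

    Returns the mean tracking score of the higher-surprisal ("harder") words
    and of the remaining ("easier") words.
    """
    scored = [(w, surprisal_bits(p), t) for w, p, t in words]
    cut = median(s for _, s, _ in scored)
    harder = [t for _, s, t in scored if s > cut]
    easier = [t for _, s, t in scored if s <= cut]
    return mean(harder), mean(easier)


# Hypothetical toy records, NOT from the paper:
# (word, assumed p(word | context), assumed acoustic-tracking score)
toy_words = [
    ("the", 0.5, 0.10),
    ("cat", 0.25, 0.12),
    ("sat", 0.125, 0.15),
    ("quietly", 0.0625, 0.18),
]

harder_mean, easier_mean = split_by_surprisal(toy_words)
print(f"harder words: {harder_mean:.3f}, easier words: {easier_mean:.3f}")
```

A median split is only one simple way to contrast the two groups; a regression on continuous surprisal would use the same per-word quantities without discarding information.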


Source journal: eNeuro (Neuroscience: General Neuroscience)
CiteScore: 5.00
Self-citation rate: 2.90%
Articles published: 486
Review time: 16 weeks

About the journal: An open-access journal from the Society for Neuroscience, eNeuro publishes high-quality, broad-based, peer-reviewed research focused solely on the field of neuroscience. eNeuro embodies an emerging scientific vision that offers a new experience for authors and readers, all in support of the Society’s mission to advance understanding of the brain and nervous system.