Comparing accounts of formant normalization against US English listeners' vowel perception.

Journal of the Acoustical Society of America | Pub Date: 2025-02-01 | DOI: 10.1121/10.0035476 | IF 2.3 | CAS Zone 2 (Physics and Astrophysics) | JCR Q2 (Acoustics)

Anna Persson, Santiago Barreda, T Florian Jaeger

Journal of the Acoustical Society of America, volume 157, issue 2, pages 1458-1482. Publication type: Journal Article.

Citations: 0

Abstract

Human speech recognition tends to be robust, despite substantial cross-talker variability. Believed to be critical to this ability are auditory normalization mechanisms whereby listeners adapt to individual differences in vocal tract physiology. This study investigates the computations involved in such normalization. Two 8-way alternative forced-choice experiments assessed L1 listeners' categorizations across the entire US English vowel space, for both unaltered and synthesized stimuli. Listeners' responses in these experiments were compared against the predictions of 20 influential normalization accounts that differ starkly in the inference and memory capacities they imply for speech perception. These include variants of estimation-free transformations into psycho-acoustic spaces, intrinsic normalizations relative to concurrent acoustic properties, and extrinsic normalizations relative to talker-specific statistics. Listeners' responses were best explained by extrinsic normalization, suggesting that listeners learn and store distributional properties of talkers' speech. Specifically, computationally simple (single-parameter) extrinsic normalization best fit listeners' responses. This simple extrinsic normalization also clearly outperformed Lobanov normalization, a computationally more complex account that remains popular in research on phonetics and phonology, sociolinguistics, typology, and language acquisition.
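As a rough illustration of the contrast drawn in the abstract, the sketch below implements Lobanov normalization (per-talker z-scoring of each formant) next to one possible single-parameter extrinsic normalization: subtracting a talker's grand mean log formant, in the style of uniform scaling. The abstract does not name which single-parameter account fit best, so that choice, the function names, the use of the sample standard deviation, and the toy formant values are all assumptions made for illustration, not the authors' implementation.

```python
import math
from statistics import mean, stdev


def lobanov(tokens):
    """Lobanov-style normalization: z-score each formant within one talker.

    tokens: list of (F1, F2) tuples in Hz, all from the same talker.
    Estimates a mean and a standard deviation per formant.
    Using the sample SD here is an assumption of this sketch.
    """
    columns = list(zip(*tokens))
    mus = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]
    return [
        tuple((f - m) / s for f, m, s in zip(tok, mus, sds))
        for tok in tokens
    ]


def log_mean_scaling(tokens):
    """Hypothetical single-parameter extrinsic normalization.

    Subtracts one talker-level statistic, the mean of all log formant
    values pooled across formants, from every log formant.
    """
    logs = [[math.log(f) for f in tok] for tok in tokens]
    grand_mean = mean(v for tok in logs for v in tok)
    return [tuple(v - grand_mean for v in tok) for tok in logs]


# Toy data (invented): talker B's formants are talker A's scaled by 1.2,
# mimicking a uniformly shorter vocal tract.
talker_a = [(300.0, 2300.0), (700.0, 1200.0), (350.0, 800.0)]
talker_b = [(f1 * 1.2, f2 * 1.2) for f1, f2 in talker_a]

norm_a = log_mean_scaling(talker_a)
norm_b = log_mean_scaling(talker_b)
```

Under this toy setup, both functions map the two talkers onto the same normalized values, because the talkers differ only by a uniform scale factor. The difference between the accounts lies in what a listener would have to estimate and store: a single value per talker for the log-mean version, versus a mean and a standard deviation per formant for Lobanov normalization.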




Source journal

Journal of the Acoustical Society of America (Physics: Acoustics)

CiteScore: 4.60
Self-citation rate: 16.70%
Articles published: 1433
Review time: 4.7 months

Journal description: Since 1929, The Journal of the Acoustical Society of America has been the leading source of theoretical and experimental research results in the broad interdisciplinary study of sound. Subject coverage includes: linear and nonlinear acoustics; aeroacoustics, underwater sound and acoustical oceanography; ultrasonics and quantum acoustics; architectural and structural acoustics and vibration; speech, music and noise; psychology and physiology of hearing; engineering acoustics, transduction; bioacoustics, animal bioacoustics.