一个消声、高保真、多向语音语料库。

IF 2.2 2区医学 Q1 AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY Journal of Speech Language and Hearing Research Pub Date : 2025-01-02 Epub Date: 2024-12-02 DOI:10.1044/2024_JSLHR-24-00296

Margaret K Miller, Vahid Delaram, Allison Trine, Rohit M Ananthanarayana, Emily Buss, Brian B Monson, G Christopher Stecker

{"title":"一个消声、高保真、多向语音语料库。","authors":"Margaret K Miller, Vahid Delaram, Allison Trine, Rohit M Ananthanarayana, Emily Buss, Brian B Monson, G Christopher Stecker","doi":"10.1044/2024_JSLHR-24-00296","DOIUrl":null,"url":null,"abstract":"Introduction: We currently lack speech testing materials faithful to broader aspects of real-world auditory scenes such as speech directivity and extended high frequency (EHF; > 8 kHz) content that have demonstrable effects on speech perception. Here, we describe the development of a multidirectional, high-fidelity speech corpus using multichannel anechoic recordings that can be used for future studies of speech perception in complex environments by diverse listeners.Design: Fifteen male and 15 female talkers (21.3-60.5 years) recorded Bamford-Kowal-Bench (BKB) Standard Sentence Test lists, digits 0-10, and a 2.5-min unscripted narrative. Recordings were made in an anechoic chamber with 17 free-field condenser microphones spanning 0°-180° azimuth angle around the talker using a 48 kHz sampling rate.Results: Recordings resulted in a large corpus containing four BKB lists, 10 digits, and narratives produced by 30 talkers, and an additional 17 BKB lists (21 total) produced by a subset of six talkers.Conclusions: The goal of this study was to create an anechoic, high-fidelity, multidirectional speech corpus using standard speech materials. More naturalistic narratives, useful for the creation of babble noise and speech maskers, were also recorded. A large group of 30 talkers permits testers to select speech materials based on talker characteristics relevant to a specific task. The resulting speech corpus allows for more diverse and precise speech recognition testing, including testing effects of speech directivity and EHF content. Recordings are publicly available.","PeriodicalId":51254,"journal":{"name":"Journal of Speech Language and Hearing Research","volume":" ","pages":"411-418"},"PeriodicalIF":2.2000,"publicationDate":"2025-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"An Anechoic, High-Fidelity, Multidirectional Speech Corpus.\",\"authors\":\"Margaret K Miller, Vahid Delaram, Allison Trine, Rohit M Ananthanarayana, Emily Buss, Brian B Monson, G Christopher Stecker\",\"doi\":\"10.1044/2024_JSLHR-24-00296\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Introduction: We currently lack speech testing materials faithful to broader aspects of real-world auditory scenes such as speech directivity and extended high frequency (EHF; > 8 kHz) content that have demonstrable effects on speech perception. Here, we describe the development of a multidirectional, high-fidelity speech corpus using multichannel anechoic recordings that can be used for future studies of speech perception in complex environments by diverse listeners.Design: Fifteen male and 15 female talkers (21.3-60.5 years) recorded Bamford-Kowal-Bench (BKB) Standard Sentence Test lists, digits 0-10, and a 2.5-min unscripted narrative. Recordings were made in an anechoic chamber with 17 free-field condenser microphones spanning 0°-180° azimuth angle around the talker using a 48 kHz sampling rate.Results: Recordings resulted in a large corpus containing four BKB lists, 10 digits, and narratives produced by 30 talkers, and an additional 17 BKB lists (21 total) produced by a subset of six talkers.Conclusions: The goal of this study was to create an anechoic, high-fidelity, multidirectional speech corpus using standard speech materials. More naturalistic narratives, useful for the creation of babble noise and speech maskers, were also recorded. A large group of 30 talkers permits testers to select speech materials based on talker characteristics relevant to a specific task. The resulting speech corpus allows for more diverse and precise speech recognition testing, including testing effects of speech directivity and EHF content. Recordings are publicly available.\",\"PeriodicalId\":51254,\"journal\":{\"name\":\"Journal of Speech Language and Hearing Research\",\"volume\":\" \",\"pages\":\"411-418\"},\"PeriodicalIF\":2.2000,\"publicationDate\":\"2025-01-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Speech Language and Hearing Research\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1044/2024_JSLHR-24-00296\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/12/2 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q1\",\"JCRName\":\"AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Speech Language and Hearing Research","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1044/2024_JSLHR-24-00296","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/12/2 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY","Score":null,"Total":0}

引用次数: 0

摘要

我们目前缺乏忠实于现实世界听觉场景更广泛方面的语音测试材料，如语音指向性和扩展高频(EHF；> 8 kHz)的内容对语音感知有明显的影响。在这里，我们描述了使用多通道消声录音的多向高保真语音语料库的开发，该语料库可用于未来不同听者在复杂环境下的语音感知研究。设计：15名男性和15名女性谈话者（21.3-60.5岁）记录Bamford-Kowal-Bench （BKB）标准句子测试列表，数字0-10，并进行2.5分钟的无脚本叙述。录音是在消声室中进行的，17个自由场电容麦克风在扬声器周围的方位角为0°-180°，采样率为48 kHz。结果：录音产生了一个大型语料库，包含4个BKB列表，10个数字，30个谈话者产生的叙述，以及6个谈话者的子集产生的另外17个BKB列表（总共21个）。结论：本研究的目的是使用标准语音材料创建一个无回声、高保真、多向的语音语料库。更自然的叙述，有助于创造牙牙学语的噪音和语音面具，也被记录下来。一个由30名说话者组成的大小组允许测试者根据与特定任务相关的说话者特征选择演讲材料。由此产生的语音语料库允许更多样化和精确的语音识别测试，包括语音指向性和EHF内容的测试效果。录音是公开的。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

An Anechoic, High-Fidelity, Multidirectional Speech Corpus.

Introduction: We currently lack speech testing materials faithful to broader aspects of real-world auditory scenes such as speech directivity and extended high frequency (EHF; > 8 kHz) content that have demonstrable effects on speech perception. Here, we describe the development of a multidirectional, high-fidelity speech corpus using multichannel anechoic recordings that can be used for future studies of speech perception in complex environments by diverse listeners.

Design: Fifteen male and 15 female talkers (21.3-60.5 years) recorded Bamford-Kowal-Bench (BKB) Standard Sentence Test lists, digits 0-10, and a 2.5-min unscripted narrative. Recordings were made in an anechoic chamber with 17 free-field condenser microphones spanning 0°-180° azimuth angle around the talker using a 48 kHz sampling rate.

Results: Recordings resulted in a large corpus containing four BKB lists, 10 digits, and narratives produced by 30 talkers, and an additional 17 BKB lists (21 total) produced by a subset of six talkers.

Conclusions: The goal of this study was to create an anechoic, high-fidelity, multidirectional speech corpus using standard speech materials. More naturalistic narratives, useful for the creation of babble noise and speech maskers, were also recorded. A large group of 30 talkers permits testers to select speech materials based on talker characteristics relevant to a specific task. The resulting speech corpus allows for more diverse and precise speech recognition testing, including testing effects of speech directivity and EHF content. Recordings are publicly available.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of Speech Language and Hearing Research AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY-REHABILITATION

CiteScore

4.10

自引率

19.20%

发文量

538

审稿时长

4-8 weeks

期刊介绍： Mission: JSLHR publishes peer-reviewed research and other scholarly articles on the normal and disordered processes in speech, language, hearing, and related areas such as cognition, oral-motor function, and swallowing. The journal is an international outlet for both basic research on communication processes and clinical research pertaining to screening, diagnosis, and management of communication disorders as well as the etiologies and characteristics of these disorders. JSLHR seeks to advance evidence-based practice by disseminating the results of new studies as well as providing a forum for critical reviews and meta-analyses of previously published work. Scope: The broad field of communication sciences and disorders, including speech production and perception; anatomy and physiology of speech and voice; genetics, biomechanics, and other basic sciences pertaining to human communication; mastication and swallowing; speech disorders; voice disorders; development of speech, language, or hearing in children; normal language processes; language disorders; disorders of hearing and balance; psychoacoustics; and anatomy and physiology of hearing.