Phoneme dependence of horizontal asymmetries in voice directivity.
Christoph Pörschmann and Johannes M. Arend. JASA Express Letters 4(2), February 2024. doi: 10.1121/10.0024878

Human voice directivity shows horizontal asymmetries caused by the shape of the lips and the position of the teeth and tongue during vocalization. This study presents and analyzes the asymmetries in voice directivity datasets for 23 different phonemes. The asymmetries were determined from datasets obtained in previous measurements with 13 subjects in a surrounding spherical microphone array. The results show that asymmetries are inherent to human voice production and that they differ between phoneme groups, with the strongest effects for [s], [l], and the nasals [m], [n], and [ŋ]. The smallest asymmetries were found for the plosives.
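The left/right comparison behind such results can be illustrated with a toy metric. The sketch below is not the paper's analysis; it computes a simple asymmetry index (mean absolute level difference between mirrored azimuths) for a horizontal directivity pattern, and the function name, sampling, and test patterns are illustrative assumptions.

```python
import math

def asymmetry_index(levels_db):
    """Mean absolute left/right level difference (dB) of a horizontal
    directivity pattern sampled in 1-degree azimuth steps (0 = front).
    Mirrored pairs are (phi, 360 - phi); a symmetric pattern scores ~0."""
    n = len(levels_db)
    diffs = [abs(levels_db[phi] - levels_db[n - phi]) for phi in range(1, n // 2)]
    return sum(diffs) / len(diffs)

# A left/right symmetric (cardioid-like) pattern: index is ~0.
sym = [20 * math.log10(0.5 + 0.5 * math.cos(math.radians(phi)) + 1e-6)
       for phi in range(360)]

# The same pattern with 1 dB extra level on the left half: index ~1 dB.
skewed = [lvl + (1.0 if 0 < phi < 180 else 0.0) for phi, lvl in enumerate(sym)]
```

Any monotone summary of the mirrored-pair differences would serve the same illustrative purpose; the mean absolute difference is just the simplest choice.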
An awareness survey on childcare facilities and acoustic environment in Japan's residential areas.
Hiroko Kataoka, Yuki Yoshitomi, Haruto Hirai, and Masayuki Takada. JASA Express Letters 4(2), February 2024. doi: 10.1121/10.0024879

A study conducted in Japan aimed to understand how childcare facilities should coexist with the local community. The researchers used a sound survey, a demographic survey, and logistic regression analysis to study residents' noise awareness in various areas. They found that higher land prices were associated with lower approval of new childcare facilities. The study also revealed that residents who were more sensitive to noise and less willing to participate in public events at childcare facilities were significantly more opposed to the establishment of new facilities.
Characterizing correlations in partial credit speech recognition scoring with beta-binomial distributions.
Adam K. Bosen. JASA Express Letters 4(2), February 2024. doi: 10.1121/10.0024633. Open access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10848658/pdf/

Partial credit scoring for speech recognition tasks can improve measurement precision. However, assessing the magnitude of this improvement with partial credit scoring is challenging because meaningful speech contains contextual cues, which create correlations between the probabilities of correctly identifying each token in a stimulus. Here, beta-binomial distributions were used to estimate recognition accuracy and intraclass correlation for phonemes in words and words in sentences in listeners with cochlear implants (N = 20). Estimates demonstrated substantial intraclass correlation in recognition accuracy within stimuli. These correlations were invariant across individuals. Intraclass correlations should be addressed in power analysis of partial credit scoring.
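The beta-binomial link between within-stimulus correlation and overdispersion can be sketched with a method-of-moments estimator. This is an illustrative stand-in, not the paper's fitting procedure; the function name, the equal-tokens-per-stimulus assumption, and the toy data are mine. The key identity is that under a beta-binomial model, Var(k) = m·mu·(1−mu)·(1 + (m−1)·rho), so rho can be recovered from the observed mean and variance of per-stimulus scores.

```python
def icc_beta_binomial(correct, m):
    """Method-of-moments estimates of per-token accuracy (mu) and
    intraclass correlation (rho) from per-stimulus scores k out of m.
    Uses Var(k) = m*mu*(1-mu)*(1 + (m-1)*rho) for the beta-binomial."""
    n = len(correct)
    mean_k = sum(correct) / n
    var_k = sum((k - mean_k) ** 2 for k in correct) / (n - 1)
    mu = mean_k / m
    binom_var = m * mu * (1 - mu)   # variance if tokens were independent
    rho = (var_k / binom_var - 1) / (m - 1)
    return mu, rho

# All-or-nothing stimuli (strong within-stimulus correlation): rho near 1.
mu_hi, rho_hi = icc_beta_binomial([0] * 10 + [3] * 10, m=3)

# Alternating 1s and 2s out of 3 (little clustering): rho near or below 0.
mu_lo, rho_lo = icc_beta_binomial([1, 2] * 10, m=3)
```

Both toy datasets have the same mean accuracy (mu = 0.5); only the clustering of correct tokens within stimuli differs, which is exactly what rho captures.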
Confidence intervals of location for marine mammal calls via time-differences-of-arrival: Sensitivity analysis.
Maya Mathur, John L. Spiesberger, and Devin Pascoe. JASA Express Letters 4(2), February 2024. doi: 10.1121/10.0024634

Confidence intervals of location (CIL) of calling marine mammals, derived from time-differences-of-arrival (TDOA) between receivers, depend on errors in the TDOAs, receiver locations, clocks, and sound speeds. Simulations demonstrate that a TDOA beamforming locator (TDOA-BL) yields CIL in error by O(10-100) km for experimental scenarios because it is not designed to account for the relevant errors. The errors are large and sometimes exceed the detection distances. Another locator designed to account for all of these errors, sequential bound estimation, yields CIL that always contain the true location. TDOA-BL has been and is being used to understand potential effects of environmental stress on marine mammals; this use is worth reconsidering.
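To see how measured TDOAs map to a location estimate, here is a minimal 2D locator; it is a toy least-squares grid search, not TDOA-BL or sequential bound estimation, and the receiver geometry, sound speed, and grid parameters are illustrative assumptions.

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def predicted_tdoas(src, receivers, c):
    """TDOAs (s) of each receiver relative to receiver 0 for a source at src."""
    d0 = dist(src, receivers[0])
    return [(dist(src, r) - d0) / c for r in receivers[1:]]

def tdoa_locate(receivers, tdoas, c=1500.0, extent=2000.0, step=50.0):
    """Exhaustive grid search for the candidate position whose predicted
    TDOAs best match the measured ones (least-squares misfit)."""
    best, best_err = None, float("inf")
    n = int(extent / step)
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            cand = (i * step, j * step)
            err = sum((p - t) ** 2
                      for p, t in zip(predicted_tdoas(cand, receivers, c), tdoas))
            if err < best_err:
                best, best_err = cand, err
    return best

receivers = [(0.0, 0.0), (1000.0, 0.0), (0.0, 1000.0), (1000.0, 1000.0)]
true_src = (600.0, -400.0)
est = tdoa_locate(receivers, predicted_tdoas(true_src, receivers, 1500.0))
```

With exact TDOAs the search recovers the source; perturbing the TDOAs (e.g., to mimic clock or sound-speed error) shifts the estimate, which is the kind of sensitivity the letter quantifies for realistic error budgets.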
Automatic recognition of second language speech-in-noise.
Seung-Eun Kim, Bronya R. Chernyak, Olga Seleznova, Joseph Keshet, Matthew Goldrick, and Ann R. Bradlow. JASA Express Letters 4(2), February 2024. doi: 10.1121/10.0024877

Measuring how well human listeners recognize speech under varying environmental conditions (speech intelligibility) is a challenge for theoretical, technological, and clinical approaches to speech communication. The current gold standard, human transcription, is time- and resource-intensive. Recent advances in automatic speech recognition (ASR) systems raise the possibility of automating intelligibility measurement. This study tested four state-of-the-art ASR systems with second language speech-in-noise and found that one, Whisper, performed at or above human listener accuracy. However, the content of Whisper's responses diverged substantially from human responses, especially at lower signal-to-noise ratios, suggesting both opportunities and limitations for ASR-based speech intelligibility modeling.
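Comparing ASR output (or a listener's transcription) against a reference transcript requires some token-level match score. A minimal, order-sensitive word-match score using only Python's standard library is sketched below; it is an illustrative stand-in, not the scoring used in the study, and the function name is mine.

```python
import difflib

def word_accuracy(reference, hypothesis):
    """Proportion of reference words matched, in order, by the hypothesis,
    using longest matching subsequences via difflib.SequenceMatcher."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    blocks = difflib.SequenceMatcher(None, ref, hyp).get_matching_blocks()
    return sum(b.size for b in blocks) / len(ref)

# One substitution ("the" -> "a") in a six-word reference: score 5/6.
score = word_accuracy("the cat sat on the mat", "the cat sat on a mat")
```

A production pipeline would normalize punctuation and use a proper word-error-rate alignment, but the order-sensitive matching shown here captures the core idea of scoring a hypothesis against a reference transcript.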
Tongue position in Mandarin Chinese voiceless stops.
Suzy Ahn, Harim Kwon, and Matthew Faytak. JASA Express Letters 4(2), February 2024. doi: 10.1121/10.0024997

The current study explores whether Mandarin initial and medial voiceless unaspirated and voiceless aspirated stops differ in their tongue positions and post-vocalic voicing during closure. Ultrasound tongue imaging and acoustic data from five Mandarin speakers revealed (1) no consistent pattern for tongue positions among speakers, and (2) no difference in degree of voicing during closure between the two stop series. These findings suggest that tongue position is not a reliable articulatory correlate for Mandarin laryngeal contrasts. This further suggests that aspiration is not correlated with tongue position differences, unlike the reported correlation between voicing and tongue root advancement.
Reaction time for correct identification of vowels in consonant-vowel syllables and of vowel segments.
Mark Hedrick and Kristen Thornton. JASA Express Letters 4(1), January 2024. doi: 10.1121/10.0024334

Reaction times for correct vowel identification were measured to determine the effects of intertrial interval, vowel, and cue type. Thirteen adults with normal hearing, aged 20-38 years, participated. Stimuli included three naturally produced syllables (/ba/, /bi/, /bu/) presented whole or segmented to isolate the formant transition or the static formant center. Participants identified the vowel, presented via loudspeaker, by mouse click. Results showed a significant effect of intertrial interval, no significant effect of cue type, and a significant vowel effect, suggesting that feedback occurs, that vowel identification may depend on cue duration, and that vowel bias may stem from focal structure.
Shouting affects temporal properties of the speech amplitude envelope.
Kostis Dimos, Lei He, and Volker Dellwo. JASA Express Letters 4(1), January 2024. doi: 10.1121/10.0023995

Distinguishing shouted from non-shouted speech is crucial in communication. We examined how shouting affects temporal properties of the amplitude envelope (ENV) in a total of 720 sentences read by 18 Swiss German speakers in normal and shouted modes; shouting was characterised by maintaining sound pressure levels of at least 80 dB SPL (C-weighted) at a distance of 1 m from the mouth. Generalized additive models revealed significant temporal alterations of the ENV in shouted speech, marked by a steeper ascent, a delayed peak, and extended high levels. These findings offer potential cues for identifying shouting, particularly useful when fine-structure and dynamic range cues are absent, for example, in cochlear implant users.
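An amplitude envelope of the kind analyzed here can be approximated, very roughly, by rectification and smoothing. The sketch below is a simplified stand-in for envelope extraction (the study's exact processing chain is not reproduced); the window length, sampling rate, and test tone are illustrative assumptions.

```python
import math

def amplitude_envelope(signal, win):
    """Crude amplitude envelope: full-wave rectification followed by a
    centred moving average of roughly `win` samples."""
    rect = [abs(x) for x in signal]
    half = win // 2
    out = []
    for i in range(len(rect)):
        seg = rect[max(0, i - half):i + half + 1]
        out.append(sum(seg) / len(seg))
    return out

# 1 s of a 100 Hz tone at 8 kHz; over whole cycles the mean of |sin|
# is 2/pi ~ 0.637, so the mid-signal envelope should sit near that.
tone = [math.sin(2 * math.pi * 100 * i / 8000) for i in range(8000)]
env = amplitude_envelope(tone, win=400)
```

Properties such as the envelope's rise steepness or peak latency, the cues the letter reports as markers of shouting, can then be read off curves like `env`.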
Effects of background noise on autonomic arousal (skin conductance level).
Ann Alvar and Alexander L. Francis. JASA Express Letters 4(1), January 2024. doi: 10.1121/10.0024272

This study was designed to investigate the relationship between sound level and autonomic arousal using acoustic signals similar in level and acoustic properties to common sounds in the built environment. Thirty-three young adults were exposed to background sound modeled on ventilation equipment noise, presented at levels ranging from 35 to 75 dBA sound pressure level (SPL) in 2-min blocks while they sat and read quietly. Autonomic arousal was measured in terms of skin conductance level. Results suggest that there is a direct relationship between sound level and arousal, even at these realistic levels. However, the effect of habituation appears to be more important overall.
Mapping palatal shape to electromagnetic articulography data: An approach using 3D scanning and sensor matching.
Yukiko Nota, Tatsuya Kitamura, Hironori Takemoto, and Kikuo Maekawa. JASA Express Letters 4(1), January 2024. doi: 10.1121/10.0024215

A method for superimposing the shape of the palate on three-dimensional (3D) electromagnetic articulography (EMA) data is proposed. A biteplate with a dental impression tray and EMA sensors is used to obtain the palatal shape and record the sensor positions. The biteplate is then 3D scanned, and the scanned palate is mapped to the EMA data by matching the sensor positions on the scanned image with those in the EMA readings. The average distance between the mapped palate and the EMA palate traces is roughly 1 mm for nine speakers and is comparable to the measurement error of the EMA.
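The sensor-matching step amounts to registering one 3D point set onto another. The sketch below shows only the translation part of such a registration, matching centroids and reporting the RMS residual; a full rigid registration would also solve for rotation (e.g., via the Kabsch algorithm). The function name and toy coordinates are illustrative, not from the paper.

```python
import math

def align_translation(scan_pts, ema_pts):
    """Translate 3D scanned sensor positions onto EMA sensor positions by
    matching centroids; returns the moved points and the RMS residual.
    Rotation is deliberately omitted here for brevity."""
    n = len(scan_pts)
    t = tuple(sum(e[k] for e in ema_pts) / n - sum(s[k] for s in scan_pts) / n
              for k in range(3))
    moved = [tuple(p[k] + t[k] for k in range(3)) for p in scan_pts]
    sq = sum(sum((m[k] - e[k]) ** 2 for k in range(3))
             for m, e in zip(moved, ema_pts))
    return moved, math.sqrt(sq / n)

# Three sensors whose EMA readings are a pure shift of the scan: residual ~0.
scan = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
ema = [(2.0, 3.0, 4.0), (3.0, 3.0, 4.0), (2.0, 4.0, 4.0)]
moved, rms = align_translation(scan, ema)
```

In the paper's setting the residual after full registration is what gets compared against the EMA's roughly 1 mm measurement error.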