Two measures of auditory spatial resolution, the minimum audible angle and the minimum audible movement angle, have been obtained in a simulated acoustic environment using Ambisonics sound field reproduction. Trajectories were designed to provide no reliable cues for the spatial discrimination task. Larger threshold angles were found in reverberant compared to anechoic conditions, for stimuli on the side compared to the front, and for moving compared to static stimuli. The effect of reverberation appeared to be independent of the position of the sound source (same relative threshold increase) and was independently present for static and moving sound sources.
{"title":"Static and moving minimum audible angle: Independent contributions of reverberation and position.","authors":"Anna Dietze, Samuel W Clapp, Bernhard U Seeber","doi":"10.1121/10.0025992","DOIUrl":"https://doi.org/10.1121/10.0025992","url":null,"abstract":"<p><p>Two measures of auditory spatial resolution, the minimum audible angle and the minimum audible movement angle, have been obtained in a simulated acoustic environment using Ambisonics sound field reproduction. Trajectories were designed to provide no reliable cues for the spatial discrimination task. Larger threshold angles were found in reverberant compared to anechoic conditions, for stimuli on the side compared to the front, and for moving compared to static stimuli. The effect of reverberation appeared to be independent of the position of the sound source (same relative threshold increase) and was independently present for static and moving sound sources.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140923846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A long-standing quest in audition concerns understanding relations between behavioral measures and neural representations of changes in sound intensity. Here, we examined relations between aspects of intensity perception and central neural responses within the inferior colliculus of unanesthetized rabbits (by averaging the population's spike count/level functions). We found parallels between the population's neural output and: (1) how loudness grows with intensity; (2) how loudness grows with duration; (3) how discrimination of intensity improves with increasing sound level; (4) findings that intensity discrimination does not depend on duration; and (5) findings that duration discrimination is a constant fraction of base duration.
{"title":"Neural correlates of tonal loudness, intensity discrimination, and duration discrimination.","authors":"Shigeyuki Kuwada, Constantine Trahiotis","doi":"10.1121/10.0025874","DOIUrl":"https://doi.org/10.1121/10.0025874","url":null,"abstract":"<p><p>A long-standing quest in audition concerns understanding relations between behavioral measures and neural representations of changes in sound intensity. Here, we examined relations between aspects of intensity perception and central neural responses within the inferior colliculus of unanesthetized rabbits (by averaging the population's spike count/level functions). We found parallels between the population's neural output and: (1) how loudness grows with intensity; (2) how loudness grows with duration; (3) how discrimination of intensity improves with increasing sound level; (4) findings that intensity discrimination does not depend on duration; and (5) findings that duration discrimination is a constant fraction of base duration.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140878105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Predicting acoustic transmission loss in the SOFAR channel faces challenges such as the excessively complex algorithms and computationally intensive calculations of classical methods. To address these challenges, a deep learning-based underwater acoustic transmission loss prediction method is proposed. By properly training a U-net-type convolutional neural network, the method provides an accurate mapping between ray trajectories and the transmission loss over the problem domain. Verifications are performed in a SOFAR channel with Munk's sound speed profile. The results suggest that the method has potential as a fast prediction model that does not sacrifice accuracy.
{"title":"Predicting underwater acoustic transmission loss in the SOFAR channel from ray trajectories via deep learning.","authors":"Haitao Wang, Shiwei Peng, Qunyi He, Xiangyang Zeng","doi":"10.1121/10.0025976","DOIUrl":"https://doi.org/10.1121/10.0025976","url":null,"abstract":"<p><p>Predicting acoustic transmission loss in the SOFAR channel faces challenges, such as excessively complex algorithms and computationally intensive calculations in classical methods. To address these challenges, a deep learning-based underwater acoustic transmission loss prediction method is proposed. By properly training a U-net-type convolutional neural network, the method can provide an accurate mapping between ray trajectories and the transmission loss over the problem domain. Verifications are performed in a SOFAR channel with Munk's sound speed profile. The results suggest that the method has potential to be used as a fast predicting model without sacrificing accuracy.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140878106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The perceptual boundary between short and long categories depends on speech rate. We investigated the influence of speech rate on perceptual boundaries for short and long vowel and consonant contrasts by Spanish-English bilingual listeners and English monolinguals. Listeners tended to adapt their perceptual boundaries to speech rates, but the strategy differed between groups, especially for consonants. Understanding the factors that influence auditory processing in this population is essential for developing appropriate assessments of auditory comprehension. These findings have implications for the clinical care of older populations whose ability to rely on spectral and/or temporal information in the auditory signal may decline.
{"title":"Impact of speech rate on perception of vowel and consonant duration by bilinguals and monolinguals.","authors":"Miwako Hisagi, Eve Higby, Mike Zandona, Annett P Acosta, Justin Kent, Keiichi Tajima","doi":"10.1121/10.0025862","DOIUrl":"https://doi.org/10.1121/10.0025862","url":null,"abstract":"<p><p>The perceptual boundary between short and long categories depends on speech rate. We investigated the influence of speech rate on perceptual boundaries for short and long vowel and consonant contrasts by Spanish-English bilingual listeners and English monolinguals. Listeners tended to adapt their perceptual boundaries to speech rates, but the strategy differed between groups, especially for consonants. Understanding the factors that influence auditory processing in this population is essential for developing appropriate assessments of auditory comprehension. These findings have implications for the clinical care of older populations whose ability to rely on spectral and/or temporal information in the auditory signal may decline.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140878104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bimodal stimulation, a cochlear implant (CI) in one ear and a hearing aid (HA) in the other, provides highly asymmetrical inputs. To understand how asymmetry affects perception and memory, forward and backward digit spans were measured in nine bimodal listeners. Spans were unchanged from monotic to diotic presentation; there was an average two-digit decrease for dichotic presentation with some extreme cases of decreases to zero spans. Interaurally asymmetrical decreases were not predicted based on the device or better-functioning ear. Therefore, bimodal listeners can demonstrate a strong ear dominance, diminishing memory recall dichotically even when perception was intact monaurally.
{"title":"Reduced digit spans and ear dominance using dichotic digits in bimodal cochlear-implant users.","authors":"Allison Blackmon, Matthew J Goupell, Matthew Bakke, Olga Stakhovskaya","doi":"10.1121/10.0025977","DOIUrl":"https://doi.org/10.1121/10.0025977","url":null,"abstract":"<p><p>Bimodal stimulation, a cochlear implant (CI) in one ear and a hearing aid (HA) in the other, provides highly asymmetrical inputs. To understand how asymmetry affects perception and memory, forward and backward digit spans were measured in nine bimodal listeners. Spans were unchanged from monotic to diotic presentation; there was an average two-digit decrease for dichotic presentation with some extreme cases of decreases to zero spans. Interaurally asymmetrical decreases were not predicted based on the device or better-functioning ear. Therefore, bimodal listeners can demonstrate a strong ear dominance, diminishing memory recall dichotically even when perception was intact monaurally.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140899058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine-learning-enabled auscultation diagnosis can provide promising solutions, especially for prescreening purposes. The bottleneck for its potential success is that high-quality training datasets are still scarce. In this work, an open auscultation dataset consisting of samples and annotations from patients and healthy individuals is established for machine-learning-based respiratory diagnosis studies, which is of both scientific importance and practical potential. A machine learning approach is examined to showcase the use of this new dataset for classifying lung sounds across different diseases. The open dataset is available to the public online.
{"title":"An open auscultation dataset for machine learning-based respiratory diagnosis studies.","authors":"Guanyu Zhou, Chengjian Liu, Xiaoguang Li, Sicong Liang, Ruichen Wang, Xun Huang","doi":"10.1121/10.0025851","DOIUrl":"https://doi.org/10.1121/10.0025851","url":null,"abstract":"<p><p>Machine learning enabled auscultating diagnosis can provide promising solutions especially for prescreening purposes. The bottleneck for its potential success is that high-quality datasets for training are still scarce. An open auscultation dataset that consists of samples and annotations from patients and healthy individuals is established in this work for the respiratory diagnosis studies with machine learning, which is of both scientific importance and practical potential. A machine learning approach is examined to showcase the use of this new dataset for lung sound classifications with different diseases. The open dataset is available to the public online.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140878102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adding to limited research on clear speech in tone languages, productions of Mandarin lexical tones were examined in pentasyllabic sentences. Fourteen participants read sentences imagining a hard-of-hearing addressee or a friend in a casual social setting. Tones produced in clear speech had longer duration, higher intensity, and larger F0 values. This style effect was rarely modulated by tone, preceding tonal context, or syllable position, consistent with an overall signal enhancement strategy. Possible evidence for tone enhancement was observed only in one set of analyses, for F0 minimum and F0 range, contrasting tones with low targets and tones with high targets.
{"title":"Clear speech effects in production of sentence-medial Mandarin lexical tonesa).","authors":"Jack Rittenberry, Irina A Shport","doi":"10.1121/10.0025991","DOIUrl":"https://doi.org/10.1121/10.0025991","url":null,"abstract":"<p><p>Adding to limited research on clear speech in tone languages, productions of Mandarin lexical tones were examined in pentasyllabic sentences. Fourteen participants read sentences imagining a hard-of-hearing addressee or a friend in a casual social setting. Tones produced in clear speech had longer duration, higher intensity, and larger F0 values. This style effect was rarely modulated by tone, preceding tonal context, or syllable position, consistent with an overall signal enhancement strategy. Possible evidence for tone enhancement was observed only in one set of analysis for F0 minimum and F0 range, contrasting tones with low targets and tones with high targets.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141158923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This study evaluated whether adaptive training with time-compressed speech produces an age-dependent improvement in speech recognition in 14 adult cochlear-implant users. The protocol consisted of a pretest, 5 h of training, and a posttest using time-compressed speech and an adaptive procedure. There were significant improvements in time-compressed speech recognition at the posttest session following training (>5% in the average time-compressed speech recognition threshold) but no effects of age. These results are promising for the use of adaptive training in aural rehabilitation strategies for cochlear-implant users across the adult lifespan and possibly using speech signals, such as time-compressed speech, to train temporal processing.
{"title":"Cochlear-implant listeners benefit from training with time-compressed speech, even at advanced ages.","authors":"Amara C Ezenwa, Matthew J Goupell, Sandra Gordon-Salant","doi":"10.1121/10.0025431","DOIUrl":"10.1121/10.0025431","url":null,"abstract":"<p><p>This study evaluated whether adaptive training with time-compressed speech produces an age-dependent improvement in speech recognition in 14 adult cochlear-implant users. The protocol consisted of a pretest, 5 h of training, and a posttest using time-compressed speech and an adaptive procedure. There were significant improvements in time-compressed speech recognition at the posttest session following training (>5% in the average time-compressed speech recognition threshold) but no effects of age. These results are promising for the use of adaptive training in aural rehabilitation strategies for cochlear-implant users across the adult lifespan and possibly using speech signals, such as time-compressed speech, to train temporal processing.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11075136/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140878103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The present study examined English vowel recognition in multi-talker babbles (MTBs) in 20 normal-hearing, native-English-speaking adult listeners. Twelve vowels, embedded in the h-V-d structure, were presented in MTBs consisting of 1, 2, 4, 6, 8, 10, and 12 talkers (numbers of talkers [N]) and a speech-shaped noise at signal-to-noise ratios of -12, -6, and 0 dB. Results showed that vowel recognition performance was a non-monotonic function of N when signal-to-noise ratios were less favorable. The masking effects of MTBs on vowel recognition were most similar to consonant recognition but less so to word and sentence recognition reported in previous studies.
{"title":"English vowel recognition in multi-talker babbles mixed with different numbers of talkersa).","authors":"Xianhui Wang, Li Xu","doi":"10.1121/10.0025616","DOIUrl":"https://doi.org/10.1121/10.0025616","url":null,"abstract":"The present study examined English vowel recognition in multi-talker babbles (MTBs) in 20 normal-hearing, native-English-speaking adult listeners. Twelve vowels, embedded in the h-V-d structure, were presented in MTBs consisting of 1, 2, 4, 6, 8, 10, and 12 talkers (numbers of talkers [N]) and a speech-shaped noise at signal-to-noise ratios of -12, -6, and 0 dB. Results showed that vowel recognition performance was a non-monotonic function of N when signal-to-noise ratios were less favorable. The masking effects of MTBs on vowel recognition were most similar to consonant recognition but less so to word and sentence recognition reported in previous studies.","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140788246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A noise-insensitive cost function was developed for estimating the speed of a harmonic acoustic source in uniform linear motion. This function weights and integrates the energy distribution of received tones in the time-frequency plane to enhance the robustness of parameter estimation under low signal-to-noise ratio conditions; the weight values are deliberately tied to the law of observed instantaneous frequency. Because the cost function is differentiable, the parameter estimation procedure is also computationally efficient. Processing data from the SWellEx-96 experiment with real ocean noise confirmed the anti-noise capability of this cost function relative to conventional processing methods.
{"title":"Robust speed estimation for a moving harmonic acoustic source with a single stationary sensor.","authors":"Yixin Yang, Ningning Liang, Jianbo Zhou","doi":"10.1121/10.0025508","DOIUrl":"https://doi.org/10.1121/10.0025508","url":null,"abstract":"A noise-insensitive cost function was developed for estimating the speed of harmonic acoustic sources in uniform linear motion. This function weighs and integrates the energy distribution of received tones in the time-frequency plane to enhance the robustness of parameter estimation under low signal-to-noise ratio conditions, where weight values are intentionally combined with the law of observed instantaneous frequency. As the cost function is differentiable, the procedure of parameter estimations also has high computing efficiency. Processing data of SWellEx-96 experiments with real ocean noise confirmed the anti-noise capabilities of this cost function to conventional processing methods.","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140791512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}