Latest Articles in Speech Communication
MS-VBRVQ: Multi-scale variable bitrate speech residual vector quantization
IF 3.0 | CAS Tier 3, Computer Science | Q2 (Acoustics) | Pub Date: 2025-12-18 | DOI: 10.1016/j.specom.2025.103346
Yukun Qian, Shiyun Xu, Xuyi Zhuang, Zehua Zhang, Mingjiang Wang
Recent speech quantization compression models have adopted residual vector quantization (RVQ) methods. However, these models typically use fixed bitrates, allocating the same number of time frames at a constant scale across all speech segments. This approach may lead to bitrate inefficiency, particularly when the audio contains simpler segments. To address this limitation, we introduce a multi-scale variable bitrate approach by incorporating a relative importance map, adaptive threshold masks, and a gradient estimation function into the RVQ-GAN model. This method allows the allocation of time frames at varying time scales, depending on the complexity of the audio. For more complex audio, a greater number of time frames are allocated, while fewer time frames are assigned to simpler segments. Additionally, we propose both symmetric and asymmetric decoding methods. Asymmetric decoding is easier to implement and integrates seamlessly into the system, while symmetric decoding delivers superior audio quality at lower bitrates. Subjective and objective experiments demonstrate that, compared to EnCodec, both of our decoding methods deliver excellent audio quality at lower bitrates across various speech and singing datasets, with only a slight increase in computational cost. In comparison to the VRVQ method, we achieve comparable audio quality at even lower bitrates, while requiring less computational cost.
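The residual vector quantization at the core of such codecs can be sketched in a few lines of NumPy. The following is a minimal, generic greedy RVQ with random codebooks and a fixed number of stages — an illustration of the baseline technique only, not the authors' MS-VBRVQ (which adds a relative importance map, adaptive threshold masks, and a gradient estimator for variable bitrate); all names and sizes here are hypothetical.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Greedy residual VQ: each stage picks the codeword nearest
    to the residual left over by the previous stages."""
    residual = np.asarray(x, dtype=float).copy()
    indices = []
    for cb in codebooks:
        dists = np.linalg.norm(cb - residual, axis=1)
        i = int(np.argmin(dists))
        indices.append(i)
        residual -= cb[i]  # later stages refine this leftover error
    return indices

def rvq_decode(indices, codebooks):
    """The reconstruction is the sum of the chosen codewords."""
    return sum(cb[i] for cb, i in zip(codebooks, indices))

# Toy setup: 4 stages, 16 codewords each, 8-dimensional frames.
rng = np.random.default_rng(0)
dim, n_codewords, n_stages = 8, 16, 4
codebooks = [rng.normal(size=(n_codewords, dim)) for _ in range(n_stages)]
x = rng.normal(size=dim)
idx = rvq_encode(x, codebooks)
x_hat = rvq_decode(idx, codebooks)
```

Each frame costs `n_stages * log2(n_codewords)` bits here regardless of content; the paper's variable-bitrate idea is precisely to let that allocation depend on how complex each segment is.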
Citations: 0
Hand gesture realisation of contrastive focus in real-time whisper-to-speech synthesis: Investigating the transfer from implicit to explicit control of intonation
IF 3.0 | CAS Tier 3, Computer Science | Q2 (Acoustics) | Pub Date: 2025-12-15 | DOI: 10.1016/j.specom.2025.103344
Delphine Charuau, Nathalie Henrich Bernardoni, Silvain Gerber, Olivier Perrotin
The ability of speakers to externalise the control of their intonation in the context of voice substitution communication is evaluated in terms of the realisation of a contrastive focus in French. A whisper-to-speech synthesiser is used with gestural interfaces for intonation control, enabling two types of gesture: an isometric finger pressure and an isotonic wrist movement. An original experimental paradigm is designed to elicit a contrastive focus on the target syllables of nine-syllable sentences by means of a read-question-answer scenario. For all 16 participants, focus was successfully achieved in speech and in both modality transfer situations by increasing the fundamental frequency and duration of the target syllable. Coordination of the articulation of the whispered syllables and the manual intonational control was acquired quickly and easily. Focus realisation by finger pressure or wrist movement showed very similar dynamics in intonation and duration. Overall, although wrist movement was preferred in terms of ease of control, both interfaces were judged to be equal in terms of learning, performance, emotional experience, and cognitive load.
Citations: 0