
Computer Speech and Language: Latest Publications

Preserving speaker information in direct Speech-to-Speech Translation with non-autoregressive generation and pre-training
IF 3.4 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-11-06 · DOI: 10.1016/j.csl.2025.101902
Rui Zhou, Akinori Ito, Takashi Nose
Speech-to-Speech Translation (S2ST) refers to the conversion of speech in one language into semantically equivalent speech in another language, facilitating communication between speakers of different languages. Speech-to-Discrete Unit Translation (S2UT), a mainstream approach for end-to-end S2ST, addresses challenges such as error propagation across modules and slow inference speed often encountered in traditional cascade systems. However, as discrete units primarily capture content information, conventional S2UT methods fail to retain speaker-specific characteristics from the source.
Our previous work, Speaker Consistent S2UT (SC-S2UT), introduced a speaker adapter and a unit-to-mel structure, enabling the preservation of speaker information and non-autoregressive speech generation. Based on this foundation, this study proposes a self-supervised pre-training method to enrich the information extracted by both the speaker adapter and the unit-to-mel structure. Additionally, we investigate different feature fusion strategies to further improve the integration of speaker and content features.
Experiments conducted on the CVSS-T dataset for the ES–EN, FR–EN, and DE–EN tasks demonstrate that our proposed method achieves a BLEU score improvement of 1.14 compared to SC-S2UT, along with significant improvements in UTMOS and speaker similarity. Furthermore, our approach achieves translation quality comparable to traditional S2UT, with a minimal increase of 0.04 s per utterance in inference time, while maintaining high speaker similarity. These results validate the effectiveness of the proposed method.
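Not part of the paper, but as a rough illustration of the speaker-similarity metric reported above, the following minimal sketch scores cosine similarity between speaker embeddings of the source and the translated utterance. The embedding vectors here are random placeholders; in practice they would come from a pretrained speaker encoder, which is an assumption on our side rather than the authors' setup.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings; values near 1.0
    indicate well-preserved speaker identity."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

# Placeholder embeddings standing in for the output of a speaker encoder.
rng = np.random.default_rng(0)
src_embedding = rng.standard_normal(192)                         # source speaker
tgt_embedding = src_embedding + 0.1 * rng.standard_normal(192)   # translated speech
print(f"speaker similarity: {cosine_similarity(src_embedding, tgt_embedding):.3f}")
```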
Citations: 0
Feature extraction from speech signals using empirical mode decomposition for depression detection: A comparative study with machine learning models
IF 3.4 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-11-05 · DOI: 10.1016/j.csl.2025.101898
Xavier Sánchez Corrales, Jordi Solé-Casals
Depression is a prevalent mental disorder that affects quality of life, and its early detection through voice analysis could improve diagnosis. This study investigates the effectiveness of Intrinsic Mode Functions (IMFs) in differentiating depression from voice signals. We used data from the Distress Analysis Interview Corpus. Empirical Mode Decomposition was applied to extract IMFs; their statistical characteristics were analysed, and similarities between the depression and healthy groups were assessed with a Gaussian kernel, separately for each sex. The results revealed significant differences in the mean of the first IMFs in men, but not in women, with no differences in the other statistics. Gaussian kernel analysis showed variations in the probability density function of the first IMFs (up to IMF 7), with differences according to sex. Six machine learning models were trained, tuned, and tested. Gradient Boosting achieved the best accuracy for both women (94.1%) and men (88.0%). IMFs proved useful for detecting depression, suggesting their potential for developing non-invasive tools for its early detection.
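To make the general recipe concrete, here is a minimal sketch of EMD-based feature extraction followed by a Gradient Boosting classifier. It assumes the PyEMD (EMD-signal) and scikit-learn packages, and the variable names X_wave and y are invented for illustration; it is not the authors' implementation.

```python
import numpy as np
from PyEMD import EMD                      # pip install EMD-signal (assumed available)
from scipy.stats import skew, kurtosis
from sklearn.ensemble import GradientBoostingClassifier

def imf_features(signal: np.ndarray, n_imfs: int = 7) -> np.ndarray:
    """Decompose a voice signal with Empirical Mode Decomposition and
    summarise each of the first n_imfs IMFs with simple statistics."""
    imfs = EMD().emd(signal, max_imf=n_imfs)
    feats = []
    for k in range(n_imfs):
        if k < len(imfs):
            imf = imfs[k]
            feats += [imf.mean(), imf.std(), skew(imf), kurtosis(imf)]
        else:                               # pad if the signal yields fewer IMFs
            feats += [0.0, 0.0, 0.0, 0.0]
    return np.asarray(feats)

# Hypothetical usage: X_wave is a list of voice recordings, y their labels
# (0 = healthy, 1 = depressed); the train/test split and tuning are omitted.
# X = np.stack([imf_features(s) for s in X_wave])
# clf = GradientBoostingClassifier().fit(X, y)
```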
Citations: 0
Recent trends in distant conversational speech recognition: A review of CHiME-7 and 8 DASR challenges
IF 3.4 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-11-01 · DOI: 10.1016/j.csl.2025.101901
Samuele Cornell, Christoph Boeddeker, Taejin Park, He Huang, Desh Raj, Matthew Wiesner, Yoshiki Masuyama, Xuankai Chang, Zhong-Qiu Wang, Stefano Squartini, Paola Garcia, Shinji Watanabe
The CHiME-7 and 8 distant speech recognition (DASR) challenges focus on multi-channel, generalizable, joint automatic speech recognition (ASR) and diarization of conversational speech. With participation from 9 teams submitting 32 diverse systems, these challenges have contributed to state-of-the-art research in the field. This paper outlines the challenges’ design, evaluation metrics, datasets, and baseline systems while analyzing key trends from participant submissions. From this analysis it emerges that: (1) Most participants use end-to-end (e2e) ASR systems, whereas hybrid systems were prevalent in previous CHiME challenges. This transition is mainly due to the availability of robust large-scale pre-trained models, which lowers the data burden for e2e-ASR. (2) Despite recent advances in neural speech separation and enhancement (SSE), all teams still heavily rely on guided source separation, suggesting that current neural SSE techniques are still unable to reliably deal with complex scenarios and different recording setups. (3) All best systems employ diarization refinement via target-speaker diarization techniques. Accurate speaker counting in the first diarization pass is thus crucial to avoid compounding errors, and CHiME-8 DASR participants especially focused on this part. (4) Downstream evaluation via meeting summarization can correlate weakly with transcription quality due to the remarkable effectiveness of large language models in handling errors. On the NOTSOFAR-1 scenario, even systems with over 50% time-constrained minimum permutation WER can perform roughly on par with the most effective ones (around 11%). (5) Despite recent progress, accurately transcribing spontaneous speech in challenging acoustic environments remains difficult, even when using computationally intensive system ensembles.
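As a point of reference for the error rates quoted above, the sketch below computes plain word error rate via Levenshtein distance over words. The challenges actually score a stricter time-constrained minimum-permutation WER over multi-speaker transcripts; this single-stream version is only the core of that metric.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Plain word error rate: edit distance over words divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit-distance table.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the meeting starts at noon", "the meeting start at noon"))  # 0.2
```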
Citations: 0
Compress, Align, and Transfer: A new method for transferring pre-trained language models knowledge to CTC-based speech recognition
IF 3.4 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-10-21 · DOI: 10.1016/j.csl.2025.101900
Jieun Choi, Dohee Kim, Joon-Hyuk Chang
The connectionist temporal classification (CTC) model is a leading approach for end-to-end (E2E) automatic speech recognition (ASR), known for its simplicity and speed, enabled by non-autoregressive decoding and conditional independence assumptions. However, because of these assumptions, CTC models often struggle to model token sequence relationships accurately, yielding lower recognition performance than attention-based encoder–decoder (AED) and transducer models. This issue becomes particularly pronounced when training data is limited or the model is small, leading to frequent spelling errors and reduced overall accuracy. In this study, we propose a new distillation approach named “Compress, Align, and Transfer” (COMAT) aimed at enhancing CTC-based ASR systems. COMAT addresses these challenges by integrating knowledge from pre-trained language models (PLMs) into CTC-based ASR systems. Our method involves a compressing module that condenses speech embeddings to match the length of the PLM embeddings, enabling more effective and direct knowledge transfer, and a monotonic alignment search (MAS) that aligns the two embedding sequences. COMAT not only preserves the rapid decoding of CTC-based models but also significantly enhances their ability to model complex token sequences by linking them to the linguistic depth of PLMs.
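For readers unfamiliar with the CTC objective that COMAT builds on, here is a minimal PyTorch sketch of a CTC loss computation on dummy tensors. It only illustrates the standard torch.nn.CTCLoss interface; the compressing module, MAS alignment, and distillation loss of COMAT are not shown, and all shapes and values are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Toy CTC setup: batch of 2 utterances, 50 encoder frames, vocabulary of 30
# output tokens with index 0 reserved for the CTC blank.
vocab_size, T, N = 30, 50, 2
encoder_out = torch.randn(T, N, vocab_size, requires_grad=True)  # stand-in for encoder output
log_probs = encoder_out.log_softmax(dim=-1)                      # (T, N, C) as CTCLoss expects

targets = torch.randint(1, vocab_size, (N, 12))                  # token ids, blank excluded
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 12, dtype=torch.long)

ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()  # in COMAT this CTC term would presumably be combined with a distillation term
```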
Citations: 0
IberBench: LLM evaluation on Iberian languages
IF 3.4 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-10-21 · DOI: 10.1016/j.csl.2025.101899
José Ángel González, Ian Borrego Obrador, Álvaro Romo Herrero, Areg Mikael Sarvazyan, Mara Chinea-Ríos, Angelo Basile, Marc Franco-Salvador
Despite their remarkable success, Large Language Models (LLMs) remain difficult to evaluate comprehensively, particularly for languages other than English, where high-quality data is often limited. Existing benchmarks and leaderboards are predominantly English-centric, with only a few addressing other languages. These benchmarks fall short in several key areas: they overlook the diversity of language varieties, prioritize fundamental Natural Language Processing (NLP) capabilities over tasks of industrial relevance, and are static. With these aspects in mind, we present IberBench, a comprehensive and extensible benchmark designed to assess LLM performance on both fundamental and industry-relevant NLP tasks, in languages spoken across the Iberian Peninsula and Ibero-America, including Spanish, Portuguese, Catalan, Basque, Galician, and English, along with Spanish varieties such as Mexican, Uruguayan, Peruvian, Costa Rican, and Cuban. IberBench integrates 101 datasets from evaluation campaigns and recent benchmarks, covering 22 task categories such as sentiment and emotion analysis, toxicity detection, and summarization. The benchmark addresses key limitations in current evaluation practices, such as the lack of linguistic diversity and static evaluation setups, by enabling continual updates and community-driven model and dataset submissions moderated by a committee of experts. We evaluate 23 LLMs ranging from 100 million to 14 billion parameters and provide empirical insights into their strengths and limitations. Our findings indicate that (i) LLMs perform worse on industry-relevant tasks than on fundamental ones, (ii) performance is on average lower for Galician and Basque, (iii) some tasks show results close to random, and (iv) in other tasks LLMs perform above random but below shared task systems. IberBench offers open-source implementations for the entire evaluation pipeline, including dataset normalization and hosting, incremental evaluation of LLMs, and a publicly accessible leaderboard.
Citations: 0
The use of variable length stimuli for assessing segmental distortion in TTS evaluation
IF 3.4 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-10-20 · DOI: 10.1016/j.csl.2025.101894
Ayushi Pandey, Jens Edlund, Sébastien Le Maguer, Naomi Harte
This paper presents the use of variable-length stimuli for assessing segmental distortion in Text-to-Speech synthesizers. The design is based on the well-established stimulus accumulation phenomenon in psychophysics. The length of the stimuli is varied logarithmically, in accordance with the Weber–Fechner law. User opinion is collected in a binary, two-choice format, sidestepping the vagueness of the term “naturalness”. The participants’ responses are captured using a 2-alternative forced choice task. The study found that while the length of the stimuli did not reliably affect participants’ accuracy in the task, the concentration of voiceless obstruents did have a significant effect. Participants were consistently more accurate in identifying WaveNet stimuli as machine-made when the phrases were obstruent-rich. These findings show that the deviation in obstruents reported in WaveNet voices is perceivable by human listeners. The design of the subjective listening test shows trends similar to Mean-Opinion-Score evaluation, suggesting that the design may be of use to the wider Text-to-Speech evaluation community.
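The logarithmic variation of stimulus length can be reproduced with a one-liner; the sketch below uses numpy.geomspace so that successive conditions differ by a constant ratio, in the spirit of the Weber–Fechner law. The endpoint values and the unit (phones) are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

# Logarithmically spaced stimulus durations (in phones): equal perceptual steps
# correspond to equal ratios of stimulus magnitude.
shortest, longest, n_conditions = 2, 32, 5
lengths = np.geomspace(shortest, longest, num=n_conditions)
print(np.round(lengths).astype(int))   # [ 2  4  8 16 32]
```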
Citations: 0
AFEC: A knowledge graph capturing social intelligence in casual conversations
IF 3.4 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-10-16 · DOI: 10.1016/j.csl.2025.101897
Yubo Xie, Junze Li, Fahui Miao, Pearl Pu
This paper presents AFEC, an automatically curated knowledge graph derived from everyday casual conversations. The graph’s encoded knowledge enhances conversational systems by modeling how people express acknowledgment, consolation, and empathy in social interactions. To construct a comprehensive and meaningful dataset, we curated a large-scale corpus from the r/CasualConversation subreddit. By extracting the first two turns of all conversations, we obtained 134K speaker nodes and 666K listener nodes. To illustrate the utility of AFEC, we developed a retrieval-based chatbot and compared its performance with existing empathetic dialogue models. Experimental results demonstrate that our chatbot generates significantly more diverse responses (achieving at least 15% higher diversity scores in human evaluations) while outperforming two of the four baseline models in terms of response quality. The data and code are publicly available at https://github.com/yuboxie/afec.
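A retrieval-based responder of the kind described above can be sketched in a few lines with TF-IDF similarity: given a user utterance, return the listener turn paired with the most similar speaker turn. The toy turns and the TF-IDF backbone are assumptions for illustration only; they are not AFEC's data or the authors' retrieval model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy speaker/listener pairs standing in for AFEC's 134K/666K nodes.
speaker_turns = [
    "I just failed my driving test again.",
    "I finally finished my thesis today!",
]
listener_turns = [
    "That sounds frustrating, but plenty of people pass on a later try.",
    "Congratulations, that is a huge accomplishment!",
]

vectorizer = TfidfVectorizer().fit(speaker_turns)

def respond(user_utterance: str) -> str:
    """Return the listener turn whose paired speaker turn is most similar."""
    sims = cosine_similarity(vectorizer.transform([user_utterance]),
                             vectorizer.transform(speaker_turns))[0]
    return listener_turns[int(sims.argmax())]

print(respond("I failed an exam and feel terrible."))
```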
Citations: 0
Dynamic Persona Generation through commonsense inference in dialogues
IF 3.4 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-10-15 · DOI: 10.1016/j.csl.2025.101896
Honghee Lee, Youngjoong Ko
Generating consistent responses that reflect personal information is crucial for building human-like dialogue systems. However, prior studies have primarily focused on predefined personas, overlooking real-world scenarios in which new facts emerge during a conversation and the personas need to be expanded. To tackle this issue, we propose a novel framework, Dynamic Persona Generation and Selection (DPGS), which automatically generates a speaker’s personas from previous utterances and selects those containing new information. DPGS reformulates persona generation from previous utterances as a contextualized commonsense reasoning task over those utterances. Experiments on PersonaChat demonstrate the effectiveness of our framework in both persona generation and response generation.
Citations: 0
DiffATSM: High quality adaptive time-scale modification using diffusion-based post-processing
IF 3.4 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-10-15 · DOI: 10.1016/j.csl.2025.101895
Sohee Jang, Yeon-Ju Kim, Joon-Hyuk Chang
Adaptive time-scale modification (ATSM) marked a significant evolution in audio processing: by adapting the speaking rate over time, it surpasses conventional time-scale modification (TSM) systems that apply a fixed rate. However, ATSM requires audio transcriptions and additional phoneme localization modules, which limit its applicability when such resources are unavailable. Furthermore, traditional signal processing approaches in the time domain often degrade audio quality due to artifacts resulting from phase mismatches. To overcome these limitations, we propose DiffATSM, a novel deep learning-based TSM framework that directly generates time-scaled speech from raw waveforms without requiring transcription. DiffATSM comprises two main components: an adaptive neural generator and a post-processing network using a diffusion probabilistic model. The adaptive neural generator modulates the temporal scale of the mel spectrogram by conditioning on phonetic posteriorgrams (PPG), which are extracted from a self-supervised speech model. These PPG features serve as auxiliary information to preserve phonetic structure during time scaling. The generated spectrogram is further refined by the diffusion-based post-processing network, which enhances fidelity by modeling complex speech distributions. Our experimental results demonstrate that DiffATSM significantly outperforms existing TSM algorithms, including ATSM, in subjective and objective evaluations.
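For contrast with DiffATSM's adaptive, transcription-free generation, the conventional fixed-rate baseline it improves upon can be reproduced with librosa's phase-vocoder time stretching, as in the sketch below. It assumes the librosa and soundfile packages, and the file names are placeholders; this is the kind of DSP baseline the paper moves away from, not the proposed method.

```python
import librosa
import soundfile as sf

# Conventional fixed-rate TSM via a phase vocoder: rate < 1 slows speech down,
# rate > 1 speeds it up, applied uniformly over the whole utterance.
y, sr = librosa.load("input.wav", sr=None)
y_slow = librosa.effects.time_stretch(y, rate=0.8)   # 0.8x speaking rate (longer output)
sf.write("output_slow.wav", y_slow, sr)
```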
Citations: 0
Under the hood: Phonemic Restoration in transformer-based automatic speech recognition
IF 3.4 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-10-15 · DOI: 10.1016/j.csl.2025.101893
Iona Gessinger, Erfan A. Shams, Julie Carson-Berndsen
This study investigates how the automatic speech recognition (ASR) models wav2vec 2.0 large-960h-lv60-self and Whisper large-v3 perform when segment-level signal perturbations (added noise, noisy gaps, and two types of silent gaps) are introduced in English words and pseudowords. We probed the speech embeddings throughout their encoder transformer layers to examine how they encode articulatory features (place and manner of articulation, and voicing). We found that wav2vec 2.0 was more successful than Whisper at restoring perturbed segments across conditions. For wav2vec 2.0 embeddings, classification accuracy was higher in words than in pseudowords. The articulatory feature encoding of both ASR models was least disturbed by added noise and most disturbed by noisy gaps, with silent gaps falling in between. Coarticulatory cues improved classification of articulatory features, and classification accuracy increased from early to late layers for both models. Among the examined target sounds, [n] stood out from [m], [symbol not rendered in this listing], and [l], as it was classified particularly well under all conditions. We compare ASR performance to the Phonemic Restoration Effect in human speech perception and discuss potential reasons for the performance differences between the two ASR models. This approach aims to foster a better understanding of otherwise opaque systems.
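The layer-wise probing described above can be approximated with a simple linear probe per encoder layer, as in the sketch below. The data-loading step, the label definition, and the probe choice (logistic regression with 5-fold cross-validation) are assumptions for illustration rather than the authors' exact protocol.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def probe_layers(layer_embeddings: list[np.ndarray], labels: np.ndarray) -> list[float]:
    """Fit a linear probe per encoder layer and return its mean CV accuracy.
    layer_embeddings[l] holds one embedding per segment (n_segments x dim);
    labels encode an articulatory feature such as place of articulation.
    Extracting the embeddings from wav2vec 2.0 or Whisper is assumed elsewhere."""
    scores = []
    for X in layer_embeddings:
        clf = LogisticRegression(max_iter=1000)
        scores.append(cross_val_score(clf, X, labels, cv=5).mean())
    return scores
```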
Citations: 0