
Computer Speech and Language: Latest Publications

Sentiment classification method based on BERT-CondConv multi-moment state fusion
IF 3.1 · CAS Tier 3, Computer Science · Q2 Computer Science, Artificial Intelligence · Pub Date: 2025-06-20 · DOI: 10.1016/j.csl.2025.101855
Wang Xiaoyang, Liu Wenfeng
Sentiment classification has emerged as a significant research area in the field of natural language processing, garnering considerable attention in recent years. However, obtaining feature information of text sequences for sentiment classification, especially for texts with diverse characteristics, remains a challenging task. Traditional methods for extracting text features often treat all data in a uniform manner. To address this issue, we propose a hybrid sentiment classification model called BERT-CondConv, which integrates the strengths of BERT and conditional parameter convolution networks. By applying adaptive conditional parameter convolution to the hidden-state feature information at different time steps of BERT, our model enhances feature extraction and optimization and finally fuses the resulting features, thereby improving sentiment classification. We compared various base model architectures and benchmarked our method against state-of-the-art techniques. The experimental results demonstrate the effectiveness of our approach.
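The abstract describes the architecture only at a high level. As a rough illustration of the general idea, the sketch below applies a CondConv-style, per-example mixture of expert convolution kernels to BERT hidden states across time steps and pools the result for classification; it is not the authors' implementation, and all module names, dimensions, and hyperparameters are assumptions.

```python
# Illustrative sketch only -- not the authors' code. Conditional parameter convolution
# over BERT hidden states: per-example 1-D conv kernels are built as a routed mixture
# of expert kernels, applied across time steps, then pooled for sentiment classification.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CondConv1dOverStates(nn.Module):
    def __init__(self, hidden=768, out_channels=256, kernel_size=3, num_experts=4, num_classes=2):
        super().__init__()
        self.kernel_size = kernel_size
        self.out_channels = out_channels
        # One bank of expert kernels; each example gets its own mixture of them.
        self.experts = nn.Parameter(
            torch.randn(num_experts, out_channels, hidden, kernel_size) * 0.02)
        self.router = nn.Linear(hidden, num_experts)     # routing from pooled hidden states
        self.classifier = nn.Linear(out_channels, num_classes)

    def forward(self, hidden_states):                    # (B, T, H) from BERT
        B, T, H = hidden_states.shape
        route = torch.sigmoid(self.router(hidden_states.mean(dim=1)))   # (B, E)
        # Per-example kernels: weighted sum of expert kernels.
        kernels = torch.einsum('be,eoik->boik', route, self.experts)    # (B, O, H, K)
        x = hidden_states.transpose(1, 2).reshape(1, B * H, T)          # grouped-conv trick
        w = kernels.reshape(B * self.out_channels, H, self.kernel_size)
        y = F.conv1d(x, w, padding=self.kernel_size // 2, groups=B)     # (1, B*O, T)
        y = y.reshape(B, self.out_channels, T).amax(dim=2)              # fuse over time steps
        return self.classifier(y)

# Usage with precomputed BERT hidden states (B=2 sequences, T=16 tokens):
logits = CondConv1dOverStates()(torch.randn(2, 16, 768))
print(logits.shape)   # torch.Size([2, 2])
```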
Citations: 0
AraFastQA: a transformer model for question-answering for Arabic language using few-shot learning
IF 3.1 · CAS Tier 3, Computer Science · Q2 Computer Science, Artificial Intelligence · Pub Date: 2025-06-19 · DOI: 10.1016/j.csl.2025.101857
Asmaa Alrayzah, Fawaz Alsolami, Mostafa Saleh
In recent years, numerous studies have developed pre-trained language models (PLMs) for Arabic natural language processing (NLP) tasks, including question-answering (QA), but they often overlook the challenge of data scarcity. This study introduces the Arabic Few-Shot QA (AraFastQA) pre-trained language model to confront the challenge of limited resources in Arabic QA tasks. The primary contributions of this study involve developing a PLM based on a few-shot learning (FSL) approach to address the challenge of low-resource datasets in Arabic QA. Moreover, this study contributes to the development of Arabic benchmark few-shot QA datasets. Using these few-shot datasets, we compare the AraFastQA PLM with state-of-the-art Arabic PLMs such as AraBERT, AraELECTRA, and XLM-Roberta. We evaluated AraFastQA and the state-of-the-art models on two Arabic benchmark datasets: the Arabic reading comprehension dataset (ARCD) and the typologically diverse question answering dataset (TyDiQA). The experimental results show that AraFastQA outperforms the other models across eight training sample sizes of the Arabic benchmark datasets. For instance, our proposed PLM achieves an F1-score of 73.2 on TyDiQA with only 1024 training examples, while the best of the other models (AraELECTRA) achieves 56.1. On the full training set of the ARCD dataset, AraFastQA improves accuracy by 9%, 3%, and 10% over AraBERT, AraELECTRA, and XLM-Roberta, respectively.
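The reported numbers are F1-scores on extractive QA benchmarks. As a point of reference, the snippet below implements the standard SQuAD-style token-overlap F1 commonly used for datasets such as ARCD and TyDiQA; it is generic evaluation code, not the paper's own pipeline.

```python
# Illustrative sketch: SQuAD-style token-overlap F1 between a predicted answer span
# and a reference answer. Generic reference implementation, not the paper's code.
from collections import Counter

def qa_f1(prediction: str, reference: str) -> float:
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    common = Counter(pred_tokens) & Counter(ref_tokens)   # multiset intersection
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(qa_f1("في عام 1990", "عام 1990"))   # 0.8
```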
Citations: 0
Predicting accentedness and comprehensibility through ASR scores and acoustic features
IF 3.1 · CAS Tier 3, Computer Science · Q2 Computer Science, Artificial Intelligence · Pub Date: 2025-06-18 · DOI: 10.1016/j.csl.2025.101858
Wenwei Dong, Catia Cucchiarini, Roeland van Hout, Helmer Strik
Accentedness and comprehensibility scales are widely used in measuring the oral proficiency of second language (L2) learners, including learners of English as a Second Language (ESL). In this paper, we focus on gaining a better understanding of the concepts of accentedness and comprehensibility by developing and applying automatic measures to ESL utterances produced by Indonesian learners. We extracted features at both the segmental and the suprasegmental (fundamental frequency, loudness, energy, etc.) levels to investigate which features are actually related to expert judgments of accentedness and comprehensibility. Automatic Speech Recognition (ASR) pronunciation scores based on the traditional Kaldi Time Delay Neural Network (TDNN) model and on the end-to-end Whisper model were applied, and data-driven methods were used by combining acoustic features extracted with the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS) and Praat. The experimental results showed that Whisper outperformed the Kaldi-TDNN model. The Whisper model gave the best results for predicting comprehensibility on the basis of phone distance, and the best results for predicting accentedness on the basis of grapheme distance. Combining segmental and suprasegmental features improved the results, yielding different feature rankings for comprehensibility and accentedness. In the final step of our analysis, we included differences between utterances and learners as random effects in a mixed linear regression model. Exploiting these information sources yielded a substantial improvement in predicting both comprehensibility and accentedness.
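The phone and grapheme distances mentioned above are, in essence, normalized edit distances between an ASR hypothesis and a reference transcription. The following generic sketch shows one such measure; the exact normalization and unit inventory used in the paper are not specified here, so treat this as an assumption-laden illustration.

```python
# Illustrative sketch: a normalized Levenshtein distance between an ASR output and a
# reference, computed over phones or graphemes. Generic code, not the authors' pipeline.
def normalized_edit_distance(hyp: list, ref: list) -> float:
    m, n = len(hyp), len(ref)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[m][n] / max(n, 1)                     # normalize by reference length

# Grapheme distance between a recognized word and the reference spelling:
print(normalized_edit_distance(list("komprehensibel"), list("comprehensible")))
```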
Citations: 0
Multi-turn response selection with Language Style and Topic Aware enhancement
IF 3.1 · CAS Tier 3, Computer Science · Q2 Computer Science, Artificial Intelligence · Pub Date: 2025-06-16 · DOI: 10.1016/j.csl.2025.101842
Weiwei Li, Yuzhong Chen, Junjie Xu, Jiayuan Zhong, Chen Dong
Multi-turn response selection is an important component of retrieval-based human–computer dialogue systems. Most recent models adopt pre-trained language models to acquire fine-grained semantic information within diverse dialogue contexts, thereby enhancing the precision of response selection. However, effectively leveraging the language style information of speakers along with the topic information in the dialogue context to enhance the semantic understanding capability of pre-trained language models remains a significant challenge. To address this challenge, we propose a BERT-based Language Style and Topic Aware (BERT-LSTA) model for multi-turn response selection. BERT-LSTA augments BERT with two distinctive modules: the Language Style Aware (LSA) module and the Question-oriented Topic Window Selection (QTWS) module. The LSA module introduces a contrastive learning method to learn latent language style information from distinct speakers in the dialogue. The QTWS module proposes a topic window segmentation algorithm that segments the dialogue context into topic windows, which helps BERT-LSTA refine and incorporate relevant topic information for response selection. Experimental results on two public benchmark datasets demonstrate that BERT-LSTA outperforms all state-of-the-art baseline models across various metrics. Furthermore, ablation studies reveal that the LSA module significantly improves performance by capturing speaker-specific language styles, while the QTWS module enhances topic relevance by filtering irrelevant contextual information.
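As a rough illustration of what a contrastive objective over speaker language styles can look like, the sketch below pulls together utterance embeddings from the same speaker and pushes apart those from different speakers with an InfoNCE-style loss. It is a generic stand-in for the LSA module, not the authors' code, and the temperature and batching scheme are assumptions.

```python
# Illustrative sketch of a contrastive loss over speaker-style embeddings:
# same-speaker utterances are positives, different speakers are negatives.
import torch
import torch.nn.functional as F

def speaker_style_contrastive_loss(embeddings, speaker_ids, temperature=0.1):
    """embeddings: (N, D) utterance-level style vectors; speaker_ids: (N,) int labels."""
    z = F.normalize(embeddings, dim=-1)
    sim = z @ z.t() / temperature                            # (N, N) scaled cosine similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos = (speaker_ids.unsqueeze(0) == speaker_ids.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, float('-inf'))          # never contrast with yourself
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob_pos = log_prob.masked_fill(~pos, 0.0)           # keep only same-speaker pairs
    has_pos = pos.any(dim=1)
    loss = -log_prob_pos.sum(dim=1)[has_pos] / pos.sum(dim=1)[has_pos]
    return loss.mean()

loss = speaker_style_contrastive_loss(torch.randn(8, 128),
                                      torch.tensor([0, 0, 1, 1, 2, 2, 3, 3]))
print(loss.item())
```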
Citations: 0
Minerva 2 for speech and language tasks
IF 3.1 · CAS Tier 3, Computer Science · Q2 Computer Science, Artificial Intelligence · Pub Date: 2025-06-14 · DOI: 10.1016/j.csl.2025.101843
Rhiannon Mogridge, Anton Ragni
Most artificial neural networks do not directly incorporate a memory of previous experiences, instead using training data to parameterise a model, and then discarding the training data prior to inference. While some recent models have included a memory, this has typically been added to an already highly parameterised model. An alternative option is to use a purely memory-based model, and then add parameters. This has been shown to work for Minerva 2, a simple, non-parametric, memory-based model which has been widely used in the field of human psychology. We revisit the use of Minerva 2 for speech and language tasks, drawing comparisons between Minerva 2 and other architectures, and showing that an iterative process that Minerva 2 uses for inference is a close relative of deep equilibrium models. We assess parameterised models based on Minerva 2, including a sequence model inspired by Minerva 2’s similarity to the transformer architecture, which shows promising results.
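For readers unfamiliar with Minerva 2, the sketch below shows the classic memory-based computation in its simplest form: a probe activates every stored trace in proportion to a cubed similarity, the activation-weighted sum of traces forms an "echo", and feeding the normalized echo back as the next probe gives the iterative inference alluded to above. This is a conceptual illustration, not the parameterised models evaluated in the paper.

```python
# Illustrative numpy sketch of a Minerva 2-style retrieval: cubed similarities sharpen
# the match, and the echo (activation-weighted sum of traces) is re-used as the probe.
import numpy as np

def minerva2_echo(probe, memory, iterations=3, power=3):
    """probe: (D,) feature vector; memory: (N, D) matrix of stored traces."""
    echo = probe.astype(float)
    for _ in range(iterations):
        # Cosine-style similarity of the current probe to every trace.
        sims = memory @ echo / (np.linalg.norm(memory, axis=1) * np.linalg.norm(echo) + 1e-8)
        activations = np.sign(sims) * np.abs(sims) ** power   # cubing emphasizes close matches
        echo = activations @ memory                            # activation-weighted recall
        echo = echo / (np.abs(echo).max() + 1e-8)              # normalize before re-probing
    return echo

memory = np.sign(np.random.randn(100, 64))        # 100 stored +/-1 traces
probe = memory[0] * (np.random.rand(64) > 0.3)    # a noisy, partial cue of trace 0
echo = minerva2_echo(probe, memory)
cos = echo @ memory[0] / (np.linalg.norm(echo) * np.linalg.norm(memory[0]))
print(cos)   # high similarity: the echo reconstructs the cued trace
```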
Citations: 0
DiCoW: Diarization-conditioned Whisper for target speaker automatic speech recognition
IF 3.1 · CAS Tier 3, Computer Science · Q2 Computer Science, Artificial Intelligence · Pub Date: 2025-06-13 · DOI: 10.1016/j.csl.2025.101841
Alexander Polok, Dominik Klement, Martin Kocour, Jiangyu Han, Federico Landini, Bolaji Yusuf, Matthew Wiesner, Sanjeev Khudanpur, Jan Černocký, Lukáš Burget
Speaker-attributed automatic speech recognition (ASR) in multi-speaker environments remains a significant challenge, particularly when systems conditioned on speaker embeddings fail to generalize to unseen speakers. In this work, we propose Diarization-Conditioned Whisper (DiCoW), a novel approach to target-speaker ASR that leverages speaker diarization outputs as conditioning information. DiCoW extends the pre-trained Whisper model by integrating diarization labels directly, eliminating reliance on speaker embeddings and reducing the need for extensive speaker-specific training data. Our method introduces frame-level diarization-dependent transformations (FDDT) and query-key biasing (QKb) techniques to refine the model’s focus on target speakers while effectively handling overlapping speech. By leveraging diarization outputs as conditioning signals, DiCoW simplifies the workflow for multi-speaker ASR, improves generalization to unseen speakers, and enables more reliable transcription in real-world multi-speaker recordings. Additionally, we explore the integration of a connectionist temporal classification (CTC) head into Whisper and demonstrate its ability to improve transcription efficiency through hybrid decoding. Notably, we show that our approach is not limited to Whisper; it also provides similar benefits when applied to the Branchformer model. We validate DiCoW on real-world datasets, including AMI and NOTSOFAR-1 from the CHiME-8 challenge, as well as synthetic benchmarks such as Libri2Mix and LibriCSS, enabling direct comparisons with previous methods. Results demonstrate that DiCoW enhances the model’s target-speaker ASR capabilities while maintaining Whisper’s accuracy and robustness on single-speaker data.
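The sketch below illustrates one plausible reading of frame-level diarization-dependent transformations: each encoder frame is passed through a class-specific affine transform (e.g. silence, target speech, non-target speech, overlap) and the outputs are mixed by the diarization posteriors. The class set, soft weighting, and initialization are assumptions, not the DiCoW implementation.

```python
# Illustrative sketch of the FDDT idea: diarization-dependent affine transforms,
# selected softly per frame via diarization posteriors. Conceptual stand-in only.
import torch
import torch.nn as nn

class FDDT(nn.Module):
    NUM_CLASSES = 4   # silence, target only, non-target only, overlap (assumed set)

    def __init__(self, dim=512):
        super().__init__()
        # One affine transform per diarization class, initialized at identity.
        self.weight = nn.Parameter(torch.eye(dim).repeat(self.NUM_CLASSES, 1, 1))
        self.bias = nn.Parameter(torch.zeros(self.NUM_CLASSES, dim))

    def forward(self, frames, diar_post):
        """frames: (B, T, D) encoder features; diar_post: (B, T, 4) class posteriors."""
        transformed = torch.einsum('btd,cde->btce', frames, self.weight) + self.bias
        return (diar_post.unsqueeze(-1) * transformed).sum(dim=2)   # posterior-weighted mix

frames = torch.randn(2, 100, 512)
diar_post = torch.softmax(torch.randn(2, 100, 4), dim=-1)   # from a diarization system
print(FDDT()(frames, diar_post).shape)    # torch.Size([2, 100, 512])
```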
Citations: 0
Towards explainable spoofed speech attribution and detection: A probabilistic approach for characterizing speech synthesizer components
IF 3.1 · CAS Tier 3, Computer Science · Q2 Computer Science, Artificial Intelligence · Pub Date: 2025-06-11 · DOI: 10.1016/j.csl.2025.101840
Jagabandhu Mishra, Manasi Chhibber, Hye-jin Shim, Tomi H. Kinnunen
We propose an explainable probabilistic framework for characterizing spoofed speech by decomposing it into probabilistic attribute embeddings. Unlike raw high-dimensional countermeasure embeddings, which lack interpretability, the proposed probabilistic attribute embeddings aim to detect specific speech synthesizer components, represented through high-level attributes and their corresponding values. We use these probabilistic embeddings with four classifier back-ends to address two downstream tasks: spoofing detection and spoofing attack attribution. The former is the well-known bonafide-spoof detection task, whereas the latter seeks to identify the source method (generator) of a spoofed utterance. We additionally use Shapley values, a widely used technique in machine learning, to quantify the relative contribution of each attribute value to the decision-making process in each task. Results on the ASVspoof2019 dataset demonstrate the substantial role of waveform generator, conversion model outputs, and inputs in spoofing detection; and inputs, speaker, and duration modeling in spoofing attack attribution. In the detection task, the probabilistic attribute embeddings achieve 99.7% balanced accuracy and 0.22% equal error rate (EER), closely matching the performance of raw embeddings (99.9% balanced accuracy and 0.22% EER). Similarly, in the attribution task, our embeddings achieve 90.23% balanced accuracy and 2.07% EER, compared to 90.16% and 2.11% with raw embeddings. These results demonstrate that the proposed framework is both inherently explainable by design and capable of achieving performance comparable to raw CM embeddings.
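A minimal sketch of the core idea, under assumed attribute names and cardinalities: a countermeasure embedding is mapped through one small softmax head per synthesizer attribute, and the concatenated posteriors form the interpretable probabilistic attribute embedding consumed by the back-end classifiers. This is not the paper's exact attribute set or architecture.

```python
# Illustrative sketch: probabilistic attribute embeddings built from per-attribute
# softmax heads over a countermeasure (CM) embedding. Attribute cardinalities are assumed.
import torch
import torch.nn as nn

class ProbabilisticAttributeEmbedding(nn.Module):
    def __init__(self, cm_dim=256, attribute_cardinalities=(3, 4, 5)):
        super().__init__()
        # e.g. input type, duration model, waveform generator (hypothetical attributes)
        self.heads = nn.ModuleList([nn.Linear(cm_dim, k) for k in attribute_cardinalities])

    def forward(self, cm_embedding):                   # (B, cm_dim)
        posteriors = [head(cm_embedding).softmax(dim=-1) for head in self.heads]
        return torch.cat(posteriors, dim=-1)           # (B, sum of cardinalities)

model = ProbabilisticAttributeEmbedding()
attr_emb = model(torch.randn(8, 256))
# Shape (8, 12); each attribute block sums to 1, so the row total equals the number of attributes.
print(attr_emb.shape, attr_emb[0].sum().item())
```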
Citations: 0
Raw acoustic-articulatory multimodal dysarthric speech recognition
IF 3.1 · CAS Tier 3, Computer Science · Q2 Computer Science, Artificial Intelligence · Pub Date: 2025-06-10 · DOI: 10.1016/j.csl.2025.101839
Zhengjun Yue, Erfan Loweimi, Zoran Cvetkovic, Jon Barker, Heidi Christensen
Automatic speech recognition (ASR) for dysarthric speech is challenging. The acoustic characteristics of dysarthric speech are highly variable and there are often fewer distinguishing cues between phonetic tokens. Multimodal ASR utilises the data from other modalities to facilitate the task when a single acoustic modality proves insufficient. Articulatory information, which encapsulates knowledge about the speech production process, may constitute such a complementary modality. Although multimodal acoustic-articulatory ASR has received increasing attention recently, incorporating real articulatory data is under-explored for dysarthric speech recognition. This paper investigates the effectiveness of multimodal acoustic modelling using real dysarthric speech articulatory information in combination with acoustic features, especially raw signal representations which are more informative than classic features, leading to learning representations tailored to dysarthric ASR. In particular, various raw acoustic-articulatory multimodal dysarthric speech recognition systems are developed and compared with similar systems with hand-crafted features. Furthermore, the difference between dysarthric and typical speech in terms of articulatory information is systematically analysed by using a statistical space distribution indicator called Maximum Articulator Motion Range (MAMR). Additionally, we used mutual information analysis to investigate the robustness and phonetic information content of the articulatory features, offering insights that support feature selection and the ASR results. Experimental results on the widely used TORGO dysarthric speech dataset show that combining the articulatory and raw acoustic features at the empirically found optimal fusion level achieves a notable performance gain, leading to up to 7.6% and 12.8% relative word error rate (WER) reduction for dysarthric and typical speech, respectively.
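The MAMR indicator mentioned above can be thought of as a per-channel range-of-motion statistic over articulatory trajectories. The sketch below conveys that idea under assumed conventions (EMA-style channels, range as max minus min, maximum taken over utterances); the paper's exact definition may differ.

```python
# Illustrative sketch of a Maximum Articulator Motion Range (MAMR)-style measure
# over articulatory trajectories. Conceptual only; not the paper's exact formula.
import numpy as np

def articulator_motion_range(trajectory):
    """trajectory: (T, C) articulatory channels over T frames -> per-channel range."""
    return trajectory.max(axis=0) - trajectory.min(axis=0)

def mamr(utterances):
    """Maximum of the per-utterance motion ranges across a set of utterances."""
    return np.max([articulator_motion_range(u) for u in utterances], axis=0)

# Ten synthetic utterances with 6 articulatory channels each:
utterances = [np.random.randn(np.random.randint(80, 120), 6) for _ in range(10)]
print(mamr(utterances))   # one value per articulatory channel
```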
Citations: 0
Sentiment analysis for live video comments with variational residual representations
IF 3.1 · CAS Tier 3, Computer Science · Q2 Computer Science, Artificial Intelligence · Pub Date: 2025-06-09 · DOI: 10.1016/j.csl.2025.101838
Changfan Luo, Ling Fang, Bensheng Qiu
Live video comments (LVCs) are valuable for public opinion analysis, communication, and user engagement. Analyzing the sentiment in LVCs is crucial for understanding their content, especially when strong emotions are involved. However, compared to normal text, LVCs exhibit a stronger real-time nature, as well as stronger context dependence and cross-modal misalignment. Conventional sentiment analysis methods rely solely on textual information and explicit context, yet current multi-modal sentiment analysis models are insufficient to discriminate context and align multi-modal information. To address these challenges, we propose a novel variational residual fusion network based on a variational autoencoder for sentiment analysis of LVCs. In particular, an autofilter module is introduced in the encoder to select useful surrounding comments as contextual information for the target comment. A residual fusion module is embedded between the encoder and decoder to discriminate the most relevant visual information, facilitating the alignment of multi-modal information and thereby enhancing the learning of the target comment representation. Furthermore, our method follows a multi-task learning scheme to help the model reinforce the representation of the target comments and improve the effectiveness of sentiment analysis. Extensive experiments suggest the effectiveness of the proposed framework.
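As an illustration of the two named ingredients, the sketch below combines a variational encoder with the reparameterization trick and a gated residual fusion of visual information into the comment representation; the dimensions, gating scheme, and KL term are assumptions rather than the authors' architecture.

```python
# Illustrative sketch: variational encoding of the comment representation plus a
# gated residual fusion of visual features. Not the authors' network.
import torch
import torch.nn as nn

class VariationalResidualFusion(nn.Module):
    def __init__(self, text_dim=256, visual_dim=512, latent_dim=64):
        super().__init__()
        self.to_mu = nn.Linear(text_dim, latent_dim)
        self.to_logvar = nn.Linear(text_dim, latent_dim)
        self.visual_proj = nn.Linear(visual_dim, latent_dim)
        self.gate = nn.Linear(2 * latent_dim, latent_dim)

    def forward(self, text_feat, visual_feat):
        mu, logvar = self.to_mu(text_feat), self.to_logvar(text_feat)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)    # reparameterization trick
        v = self.visual_proj(visual_feat)
        g = torch.sigmoid(self.gate(torch.cat([z, v], dim=-1)))    # relevance gate
        fused = z + g * v                                           # residual fusion
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return fused, kl

fused, kl = VariationalResidualFusion()(torch.randn(4, 256), torch.randn(4, 512))
print(fused.shape, kl.item())
```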
Citations: 0
Exploring knowledge distillation for low-resource multi-modal streaming ASR in the CHiME-8 MMCSG challenge
IF 3.1 · CAS Tier 3, Computer Science · Q2 Computer Science, Artificial Intelligence · Pub Date: 2025-06-06 · DOI: 10.1016/j.csl.2025.101837
Hongbo Lan, Ya Jiang, Jun Du, Qing Wang
In the CHiME-8 Multi-modal Conversational Speech Recognition for Smart Glasses (MMCSG) challenge, participants were tasked with achieving real-time transcription of two-person conversations recorded with smart glasses. To address the scarcity of real-world data, we propose a knowledge distillation framework where a non-streaming teacher model, trained on augmented multi-channel audio, guides a streaming student model. Leveraging simulated data with varying overlap rates, the framework employs a logit-based Kullback–Leibler divergence loss alongside mean square error losses on hidden states and attention maps of Fast-Conformer layers to transfer knowledge from the teacher to the student, significantly improving the performance of the audio-only streaming automatic speech recognition (ASR) model. Furthermore, we exploit the synergy and complementarity of inertial measurement unit and audio data by developing a novel multi-modal streaming ASR model. Meanwhile, cross-modal distillation is performed by adopting the non-streaming audio-only teacher to guide the streaming multi-modal student. Experimental results demonstrate that our proposed multi-modal fusion and teacher-student learning framework effectively enhance the performance of streaming ASR models. Notably, our approach secured the first place in the sub-track of the CHiME-8 MMCSG challenge.
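The distillation objective described above combines a logit-level KL term with feature-level MSE terms. A generic sketch of such a loss is given below; the temperature, loss weights, and tensor shapes are assumptions, not the CHiME-8 system's settings.

```python
# Illustrative sketch of a teacher-student distillation loss: temperature-scaled KL on
# logits plus MSE on hidden states and attention maps. Generic code, weights assumed.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_hidden, teacher_hidden,
                      student_attn, teacher_attn,
                      temperature=2.0, alpha=1.0, beta=0.1, gamma=0.1):
    kd = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                  F.softmax(teacher_logits / temperature, dim=-1),
                  reduction='batchmean') * temperature ** 2
    hidden_mse = F.mse_loss(student_hidden, teacher_hidden)
    attn_mse = F.mse_loss(student_attn, teacher_attn)
    return alpha * kd + beta * hidden_mse + gamma * attn_mse

loss = distillation_loss(torch.randn(4, 10, 500), torch.randn(4, 10, 500),   # logits
                         torch.randn(4, 10, 256), torch.randn(4, 10, 256),   # hidden states
                         torch.rand(4, 8, 10, 10), torch.rand(4, 8, 10, 10)) # attention maps
print(loss.item())
```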
Citations: 0