
Computer Speech and Language: Latest Articles

Design choices for PixIT-based speaker-attributed ASR: Team ToTaTo at the NOTSOFAR-1 challenge
IF 3.1 Zone 3, Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-05-22 DOI: 10.1016/j.csl.2025.101824
Joonas Kalda , Séverin Baroudi , Martin Lebourdais , Clément Pagés , Ricard Marxer , Tanel Alumäe , Hervé Bredin
PixIT is a recently proposed joint training framework that integrates Permutation Invariant Training (PIT) for speaker diarization and Mixture Invariant Training (MixIT) for speech separation. By leveraging diarization labels, PixIT addresses MixIT’s limitations, producing aligned sources and speaker activations that enable automatic long-form separation. We investigate applications of PixIT to the speaker-attributed automatic speech recognition (SA-ASR) task based on our systems for the NOTSOFAR-1 Challenge. We explore modifications to the joint ToTaToNet by integrating advanced self-supervised learning (SSL) features and masking networks. We show that fine-tuning an ASR system on PixIT-separated sources significantly boosts downstream SA-ASR performance, outperforming standard diarization-based baselines without relying on synthetic data. We explore lightweight post-processing heuristics for reducing SA-ASR timestamp errors caused by long silences and artifacts present in file-level separated sources. We also show the potential of extracting speaker embeddings for the diarization pipeline directly from separated sources, with performance rivaling standard methods without any fine-tuning of speaker embeddings. On the NOTSOFAR-1 Challenge dataset, our PixIT-based approach outperforms the CSS-based baseline by 20% in terms of tcpWER after fine-tuning the ASR system on the separated sources. Notably, even when using the same ASR model as the baseline, our system outperforms it without using any of the provided domain-specific synthetic data. These advancements position PixIT as a robust and flexible solution for real-world SA-ASR.
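The permutation-invariant part of the framework can be illustrated with a minimal sketch. It assumes per-frame speaker activations and a mean squared error criterion; the function name `pit_loss` and the toy data are hypothetical, and nothing here reproduces the PixIT or ToTaToNet implementation.

```python
from itertools import permutations


def pit_loss(reference, estimate):
    """Return (lowest loss, best permutation) over all speaker orderings.

    reference, estimate: lists of per-speaker activation sequences,
    one sequence per speaker, all of the same length.
    """
    n_speakers = len(reference)

    def mse(ref_track, est_track):
        return sum((r - e) ** 2 for r, e in zip(ref_track, est_track)) / len(ref_track)

    best = None
    for perm in permutations(range(n_speakers)):
        # perm[i] is the estimated track assigned to reference speaker i
        loss = sum(mse(reference[i], estimate[perm[i]]) for i in range(n_speakers)) / n_speakers
        if best is None or loss < best[0]:
            best = (loss, perm)
    return best


# Toy example: the estimated tracks come out in swapped order.
reference = [[1, 1, 0, 0], [0, 0, 1, 1]]
estimate = [[0.1, 0.0, 0.9, 1.0], [0.9, 1.0, 0.2, 0.0]]
loss, perm = pit_loss(reference, estimate)
print(perm)  # (1, 0)
```

The exhaustive search over permutations is only practical for a handful of speakers, which is the usual setting for short training chunks.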
Citations: 0
Towards decoupling frontend enhancement and backend recognition in monaural robust ASR
IF 3.1 Zone 3, Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-05-20 DOI: 10.1016/j.csl.2025.101821
Yufeng Yang , Ashutosh Pandey , DeLiang Wang
It has been shown that the intelligibility of noisy speech can be improved by speech enhancement (SE) algorithms. However, monaural SE has not been established as an effective frontend for automatic speech recognition (ASR) in noisy conditions compared to an ASR model trained on noisy speech directly. The divide between SE and ASR impedes the progress of robust ASR systems, especially as SE has made major advances in recent years. This paper focuses on eliminating this divide with three enhancement models: the time-domain ARN (attentive recurrent network), the time–frequency-domain TF-CrossNet, and the magnitude-phase-based MP-SENet. The proposed systems decouple frontend enhancement and backend ASR, with the latter trained only on clean speech. Results on the WSJ, CHiME-2, LibriSpeech, and CHiME-4 corpora demonstrate that speech enhanced by ARN, TF-CrossNet, and MP-SENet yields improved ASR results in noisy and reverberant environments and generalizes well to real acoustic scenarios. The proposed system outperforms the baselines trained on corrupted speech directly. Furthermore, it cuts the previous best word error rate (WER) on CHiME-2 by 28.4% relative, reaching 5.6% WER, and achieves 3.3%/4.4% WER on single-channel CHiME-4 simulated/real test data without training on CHiME-4. We also observe consistent improvements using noise-robust Whisper as the backend ASR model.
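The relative WER figure can be unpacked with a short calculation. The sketch assumes that 5.6% is the new WER on CHiME-2 and that 28.4% is the relative reduction; the previous best value it prints is implied by those two numbers and is not stated in the abstract. Both helper names are hypothetical.

```python
def relative_reduction(previous_wer, new_wer):
    """Relative reduction of WER with respect to the previous value."""
    return (previous_wer - new_wer) / previous_wer


def implied_previous(new_wer, rel_reduction):
    """Previous WER implied by a new WER and a relative reduction."""
    return new_wer / (1.0 - rel_reduction)


# Assumption: 5.6% is the new WER and 28.4% the relative reduction.
previous = implied_previous(5.6, 0.284)
print(f"{previous:.2f}")  # 7.82
```

A relative reduction of 28.4% therefore corresponds to an absolute drop of a little over 2 WER points under this reading.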
Citations: 0
BERSting at the screams: A benchmark for distanced, emotional and shouted speech recognition
IF 3.1 Zone 3, Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-05-16 DOI: 10.1016/j.csl.2025.101815
Paige Tuttösí , Mantaj Dhillon , Luna Sang , Shane Eastwood , Poorvi Bhatia , Quang Minh Dinh , Avni Kapoor , Yewon Jin , Angelica Lim
Some speech recognition tasks, such as automatic speech recognition (ASR), are approaching or have reached human performance on many reported metrics. Yet they continue to struggle in complex, real-world situations, such as with distanced speech. Previous challenges have released datasets to address distanced ASR; however, the focus remains primarily on distance, specifically relying on multi-microphone array systems. Here we present the B(asic) E(motion) R(andom phrase) S(hou)t(s) (BERSt) dataset. The dataset contains almost 4 h of English speech from 98 actors with varying regional and non-native accents. The data was collected on smartphones in the actors' homes and therefore includes at least 98 different acoustic environments. The data also includes 7 different emotion prompts and both shouted and spoken utterances. The smartphones were placed in 19 different positions, including behind obstructions and in a different room than the actor. This data is publicly available and can be used to evaluate a variety of speech recognition tasks, including ASR, shout detection, and speech emotion recognition (SER). We provide initial benchmarks for ASR and SER tasks, and find that ASR degrades with increasing distance and shout level and shows varied performance depending on the intended emotion. Our results show that the BERSt dataset is challenging for both ASR and SER tasks and that continued work is needed to improve the robustness of such systems for more accurate real-world use.
Citations: 0
An item response theory framework to evaluate automatic speech recognition systems against speech difficulty
IF 3.1 Zone 3, Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-05-16 DOI: 10.1016/j.csl.2025.101817
Chaina Santos Oliveira, Ricardo B.C. Prudêncio
Evaluating the performance of Automatic Speech Recognition (ASR) systems is important for selecting good techniques and understanding their advantages and limitations. ASR systems are usually evaluated on test sets of speech recordings, ideally with different difficulty levels. In this sense, it is important to analyse whether a system under test correctly transcribes easy test speeches while being robust to the most difficult ones. In this paper, a novel framework is proposed for evaluating ASR systems, which covers two complementary issues: (1) measuring the difficulty of each test speech; and (2) analysing each ASR system’s performance against the difficulty level. Regarding the first issue, the framework measures speech difficulty by adopting Item Response Theory (IRT). Regarding the second issue, the Recognizer Characteristic Curve (RCC) is proposed, which is a plot of the ASR system’s performance versus speech difficulty. ASR performance is further analysed by a two-dimensional plot, in which speech difficulty is decomposed by IRT into sentence difficulty and speaker quality. In the experiments, the proposed framework was applied to a test set produced with text-to-speech tools, with diverse speakers and sentences. Additionally, noise injection was applied to produce test items with even higher difficulty levels. The experiments show that noise injection does increase difficulty and generates a wide variety of speeches for assessing ASR performance. However, it is essential to note that high noise levels can lead to an unreliable evaluation. The proposed plots were helpful both for identifying robust ASR systems and for choosing the noise level that results in both diversity and reliability.
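The idea of plotting performance against IRT difficulty can be sketched as follows. The abstract does not say which IRT model is used, so the two-parameter logistic (2PL) model here is an assumption, as are the function names and the toy difficulty values; an RCC as proposed in the paper may be built from observed transcription results rather than from a model curve.

```python
import math


def irt_2pl(ability, difficulty, discrimination=1.0):
    """2PL item response function: probability that a system with the given
    ability handles an item of the given difficulty successfully."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


def characteristic_curve(ability, difficulties, discrimination=1.0):
    """Expected success as a function of item difficulty, sorted by difficulty."""
    return [(b, irt_2pl(ability, b, discrimination)) for b in sorted(difficulties)]


# Hypothetical system ability and test-speech difficulties.
curve = characteristic_curve(ability=0.5, difficulties=[-2.0, -1.0, 0.0, 0.5, 1.0, 2.0])
for b, p in curve:
    print(f"difficulty={b:+.1f}  expected success={p:.3f}")
```

Under this model, expected success is exactly 0.5 when item difficulty equals system ability and falls as difficulty grows, which is the shape a robust system would flatten.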
Citations: 0
A novel graph kernel algorithm for improving the effect of text classification
IF 3.1 Zone 3, Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-05-16 DOI: 10.1016/j.csl.2025.101818
Fan Yang , Tan Zhu , Jing Huang , Zhilin Huang , Guoqi Xie
Text classification is an important topic in natural language processing. In recent years, both graph kernel methods and deep learning methods have been widely employed in text classification tasks. However, previous graph kernel algorithms focused too much on the graph structure itself, such as the shortest-path subgraph, while paying limited attention to the information in the text itself. Previous deep learning methods have often required substantial computational resources. Therefore, we propose a new graph kernel algorithm to address these disadvantages. First, we extract the textual information of the document using a term weighting scheme. Second, we collect the structural information of the document graph. Third, a graph kernel is used as the similarity measure for text classification.
We compared eight baseline methods on three experimental datasets, including traditional deep learning methods and graph-based classification methods, and tested our algorithm on multiple indicators. The experimental results demonstrate that our algorithm outperforms the baseline methods in terms of accuracy. Furthermore, it reduces memory consumption by at least 69% and runtime by at least 23%. Moreover, as we decrease the percentage of training data, our algorithm continues to achieve superior results compared to the deep learning methods. These results show that our algorithm can improve the efficiency of text classification and reduce the use of computing resources while maintaining high accuracy.
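The three steps (term weighting, graph structure, kernel similarity) can be sketched with a simple stand-in. This is not the authors' kernel: it assumes relative term frequency as the weighting scheme, a sliding-window co-occurrence graph, and a kernel that adds a weighted node match to a count of shared edges. All function names are hypothetical.

```python
import math
from collections import Counter


def term_weights(tokens):
    """Step 1 (assumed scheme): relative term frequency per token."""
    counts = Counter(tokens)
    return {t: c / len(tokens) for t, c in counts.items()}


def cooccurrence_edges(tokens, window=2):
    """Step 2: undirected edges between distinct tokens inside a sliding window."""
    edges = set()
    for i, t in enumerate(tokens):
        for u in tokens[i + 1 : i + window]:
            if u != t:
                edges.add(frozenset((t, u)))
    return edges


def kernel(doc_a, doc_b, window=2):
    """Step 3: weighted node overlap plus number of shared edges."""
    wa, wb = term_weights(doc_a), term_weights(doc_b)
    node_score = sum(wa[t] * wb[t] for t in wa.keys() & wb.keys())
    edge_score = len(cooccurrence_edges(doc_a, window) & cooccurrence_edges(doc_b, window))
    return node_score + edge_score


def normalized_kernel(doc_a, doc_b, window=2):
    """Cosine-style normalisation so that a document matches itself with 1.0."""
    return kernel(doc_a, doc_b, window) / math.sqrt(
        kernel(doc_a, doc_a, window) * kernel(doc_b, doc_b, window)
    )


a = "graph kernels compare document graphs".split()
b = "graph kernels measure document similarity".split()
c = "speech enhancement improves noisy recognition".split()
print(normalized_kernel(a, b) > normalized_kernel(a, c))  # True
```

Both terms are inner products of feature vectors, so their sum is a valid kernel and the resulting matrix could be passed to a kernel classifier such as an SVM.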
Citations: 0
Optimization of modular multi-speaker distant conversational speech recognition
IF 3.1 Zone 3, Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-05-14 DOI: 10.1016/j.csl.2025.101816
Qinwen Hu , Tianchi Sun , Xin’an Chen , Xiaobin Rong , Jing Lu
Conducting multi-speaker distant conversational speech recognition on real meeting recordings is a challenging task and has recently become an active area of research. In this work, we focus on modular approaches to addressing this challenge, integrating continuous speech separation (CSS), automatic speech recognition (ASR), and speaker diarization in a pipeline. We explore the effective utilization of the high-performing separation model, TF-GridNet, within our system and propose integration techniques to enhance the performance of the ASR and diarization modules. Our system is evaluated on both the LibriCSS and the real-world CHiME-8 NOTSOFAR-1 dataset. Through a comprehensive analysis of the system’s generalization performance, we identify key areas for further improvement in the front-end module.
Citations: 0
An end-to-end integration of speech separation and recognition with self-supervised learning representation
IF 3.1 Zone 3, Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-05-14 DOI: 10.1016/j.csl.2025.101813
Yoshiki Masuyama , Xuankai Chang , Wangyou Zhang , Samuele Cornell , Zhong-Qiu Wang , Nobutaka Ono , Yanmin Qian , Shinji Watanabe
Multi-speaker automatic speech recognition (ASR) has gained growing attention in a wide range of applications, including conversation analysis and human–computer interaction. Speech separation and enhancement (SSE) and single-speaker ASR have witnessed remarkable performance improvements with the rapid advances in deep learning. Complex spectral mapping predicts the short-time Fourier transform (STFT) coefficients of each speaker and has achieved promising results in several SSE benchmarks. Meanwhile, self-supervised learning representation (SSLR) has demonstrated its significant advantage in single-speaker ASR. In this work, we push forward the performance of multi-speaker ASR under noisy reverberant conditions by integrating powerful SSE, SSL, and ASR models in an end-to-end manner. We systematically investigate both monaural and multi-channel SSE methods and various feature representations. Our experiments demonstrate the advantages of recently proposed complex spectral mapping and SSLRs in multi-speaker ASR. The experimental results also confirm that end-to-end fine-tuning with an ASR criterion is important to achieve state-of-the-art word error rates (WERs) even with powerful pre-trained models. Moreover, we show the performance trade-off between SSE and ASR and mitigate it with a multi-task learning framework with both SSE and ASR criteria.
Citations: 0
Exploring features for membership inference in ASR model auditing
IF 3.1 Zone 3 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-05-12 DOI: 10.1016/j.csl.2025.101812
Francisco Teixeira , Karla Pizzi , Raphaël Olivier , Alberto Abad , Bhiksha Raj , Isabel Trancoso
Membership inference (MI) poses a substantial privacy threat to the training data of automatic speech recognition (ASR) systems, while also offering an opportunity to audit these models with regard to user data. This paper explores the effectiveness of loss-based features in combination with Gaussian and adversarial perturbations to perform MI in ASR models. We compare our proposed features with commonly used error-based features for both sample-level and speaker-level MI. We find that the proposed features greatly enhance performance for sample-level MI. For speaker-level MI, these features improve results, though by a smaller margin, as error-based features already obtain a high performance for this task. Our findings emphasise the importance of considering different feature sets and levels of access to target models for effective MI in ASR systems, providing valuable insights for auditing such models.
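As a rough illustration of what "loss-based features in combination with Gaussian perturbations" could look like, the sketch below computes a model's loss on a sample and on several noisy copies of it. The abstract does not define the features, so everything here is an assumption made for illustration: the function name `perturbation_features`, the choice of three features (clean loss, mean perturbed loss, spread of perturbed losses), the noise level `sigma`, and `toy_loss`, which is a hypothetical stand-in for an ASR loss and not the authors' method.

```python
import random
import statistics
from typing import Callable, List, Sequence


def perturbation_features(
    loss_fn: Callable[[Sequence[float]], float],
    sample: Sequence[float],
    sigma: float = 0.01,
    n_perturbations: int = 8,
    seed: int = 0,
) -> List[float]:
    """Return [clean loss, mean perturbed loss, std of perturbed losses].

    Assumed feature definition, chosen only for illustration.
    """
    rng = random.Random(seed)
    clean_loss = loss_fn(sample)
    perturbed_losses = []
    for _ in range(n_perturbations):
        # Add zero-mean Gaussian noise to every element of the input.
        noisy = [x + rng.gauss(0.0, sigma) for x in sample]
        perturbed_losses.append(loss_fn(noisy))
    return [
        clean_loss,
        statistics.fmean(perturbed_losses),
        statistics.pstdev(perturbed_losses),
    ]


def toy_loss(sample: Sequence[float]) -> float:
    # Hypothetical stand-in for an ASR loss: squared distance to a
    # single "memorised" training point.
    memorised = [0.5, -0.2, 0.1]
    return sum((x - m) ** 2 for x, m in zip(sample, memorised))


member = [0.5, -0.2, 0.1]       # identical to the memorised point
non_member = [0.0, 0.3, -0.4]   # unseen point

member_features = perturbation_features(toy_loss, member)
non_member_features = perturbation_features(toy_loss, non_member)
```

In an audit setting, feature vectors like these would be fed to a separate classifier that predicts membership. In this toy setup the member sits at a minimum of the loss, so its clean loss is lower than the non-member's.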
Computer Speech and Language, vol. 95, Article 101812.
Citations: 0
Modality fusion using auxiliary tasks for dementia detection
IF 3.1 Zone 3 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-05-10 DOI: 10.1016/j.csl.2025.101814
Hangshou Shao, Yilin Pan, Yue Wang, Yijia Zhang
Alzheimer’s disease is the leading cause of dementia, which affects elderly individuals’ speech and language abilities. In this paper, a Feature Fusion Model with Guide Patterns (FFG) is designed as an acoustic- and linguistic-based dementia detection system that addresses the limited publicly available data and the inefficiency of modality fusion. Specifically, a multi-modal feature interaction module composed of multiple co-attention layers is designed to improve the interaction between the acoustic and linguistic information embedded in the audio recordings. Given the limited audio recordings available in public datasets, guide patterns are introduced as auxiliary tasks to enhance the interaction between acoustic and linguistic information. The proposed FFG model is evaluated on three publicly available datasets, namely Pitt, ADReSS, and ADReSSo. Experimental results demonstrate that the FFG model achieves superior results on all three datasets, with accuracies of 85.85% and 84.30% on the Pitt and ADReSSo datasets, respectively. The ablation study demonstrates the efficiency of the proposed model.
Computer Speech and Language, vol. 95, Article 101814.
Citations: 0
Combined generative and predictive modeling for speech super-resolution
IF 3.1 Zone 3 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-05-08 DOI: 10.1016/j.csl.2025.101808
Heming Wang , Eric W. Healy , DeLiang Wang
Speech super-resolution (SR) is the task of restoring high-resolution speech from low-resolution input. Existing models employ simulated data and constrained experimental settings, which limit generalization to real-world SR. Predictive models are known to perform well in fixed experimental settings, but can introduce artifacts in adverse conditions. Generative models, on the other hand, learn the distribution of the target data and are better able to perform well in unseen conditions. In this study, we propose a novel two-stage approach that combines the strengths of predictive and generative models. Specifically, we employ a diffusion-based model that is conditioned on the output of a predictive model. Our experiments demonstrate that the model significantly outperforms single-stage counterparts and existing strong baselines on benchmark SR datasets. Furthermore, we introduce a repainting technique during the inference of the diffusion process, enabling the proposed model to regenerate high-frequency components even in mismatched conditions. An additional contribution is the collection of, and evaluation on, real SR recordings made with the same microphone at different native sampling rates. We make this dataset freely accessible to accelerate progress towards real-world speech super-resolution.
Computer Speech and Language, vol. 94, Article 101808.
Citations: 0