
IEEE/ACM Transactions on Audio, Speech, and Language Processing: Latest Publications

ROSE: A Recognition-Oriented Speech Enhancement Framework in Air Traffic Control Using Multi-Objective Learning
IF 4.1, CAS Tier 2 (Computer Science), Q1 (ACOUSTICS). Pub Date: 2024-07-11. DOI: 10.1109/TASLP.2024.3423652
Xincheng Yu;Dongyue Guo;Jianwei Zhang;Yi Lin
Radio speech echo is a specific phenomenon in the air traffic control (ATC) domain, which degrades speech quality and further impacts automatic speech recognition (ASR) accuracy. In this work, a time-domain recognition-oriented speech enhancement (ROSE) framework is proposed to improve speech intelligibility and also advance ASR accuracy. Built on a convolutional encoder-decoder U-Net architecture, it serves as a plug-and-play tool in ATC scenarios and does not require additional retraining of the ASR model. Specifically, 1) In the U-Net architecture, an attention-based skip-fusion (ABSF) module is applied to mine shared features from encoders using an attention mask, which enables the model to effectively fuse the hierarchical features. 2) A channel and sequence attention (CSAtt) module is innovatively designed to guide the model to focus on informative features in dual parallel attention paths, aiming to enhance the effective representations and suppress interference noise. 3) Based on the handcrafted features, ASR-oriented optimization targets are designed to improve recognition performance in the ATC environment by learning robust feature representations. By incorporating both the SE-oriented and ASR-oriented losses, ROSE is implemented in a multi-objective learning manner by optimizing shared representations across the two task objectives. Experimental results show that ROSE significantly outperforms other state-of-the-art methods for both the SE and ASR tasks, with all the proposed improvements confirmed by dedicated experiments. In addition, the proposed approach achieves the desired performance improvements on public datasets.
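The abstract does not spell out the loss formulation. As a rough illustration of the multi-objective idea, the sketch below combines a time-domain SI-SNR term (a common SE objective) with an L1 distance between ASR-oriented feature representations of the enhanced and clean signals; the weighting factor alpha and the feat_extractor callable are hypothetical stand-ins, not the paper's actual components.

```python
import torch
import torch.nn.functional as F

def si_snr_loss(est, ref, eps=1e-8):
    """Negative scale-invariant SNR between estimated and reference waveforms."""
    est = est - est.mean(dim=-1, keepdim=True)
    ref = ref - ref.mean(dim=-1, keepdim=True)
    proj = (torch.sum(est * ref, dim=-1, keepdim=True) /
            (torch.sum(ref ** 2, dim=-1, keepdim=True) + eps)) * ref
    noise = est - proj
    ratio = torch.sum(proj ** 2, dim=-1) / (torch.sum(noise ** 2, dim=-1) + eps)
    return -10.0 * torch.log10(ratio + eps).mean()

def multi_objective_loss(enhanced, clean, feat_extractor, alpha=0.5):
    """SE-oriented waveform loss plus an ASR-oriented loss on derived features."""
    se_loss = si_snr_loss(enhanced, clean)
    asr_loss = F.l1_loss(feat_extractor(enhanced), feat_extractor(clean))
    return se_loss + alpha * asr_loss
```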
{"title":"ROSE: A Recognition-Oriented Speech Enhancement Framework in Air Traffic Control Using Multi-Objective Learning","authors":"Xincheng Yu;Dongyue Guo;Jianwei Zhang;Yi Lin","doi":"10.1109/TASLP.2024.3423652","DOIUrl":"10.1109/TASLP.2024.3423652","url":null,"abstract":"Radio speech echo is a specific phenomenon in the air traffic control (ATC) domain, which degrades speech quality and further impacts automatic speech recognition (ASR) accuracy. In this work, a time-domain recognition-oriented speech enhancement (ROSE) framework is proposed to improve speech intelligibility and also advance ASR accuracy based on convolutional encoder-decoder-based U-Net framework, which serves as a plug-and-play tool in ATC scenarios and does not require additional retraining of the ASR model. Specifically, 1) In the U-Net architecture, an attention-based skip-fusion (ABSF) module is applied to mine shared features from encoders using an attention mask, which enables the model to effectively fuse the hierarchical features. 2) A channel and sequence attention (CSAtt) module is innovatively designed to guide the model to focus on informative features in dual parallel attention paths, aiming to enhance the effective representations and suppress the interference noises. 3) Based on the handcrafted features, ASR-oriented optimization targets are designed to improve recognition performance in the ATC environment by learning robust feature representations. By incorporating both the SE-oriented and ASR-oriented losses, ROSE is implemented in a multi-objective learning manner by optimizing shared representations across the two task objectives. The experimental results show that the ROSE significantly outperforms other state-of-the-art methods for both the SE and ASR tasks, in which all the proposed improvements are confirmed by designed experiments. In addition, the proposed approach can contribute to the desired performance improvements on public datasets.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"3365-3378"},"PeriodicalIF":4.1,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141613405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Overview of the Ninth Dialog System Technology Challenge: DSTC9
IF 4.1, CAS Tier 2 (Computer Science), Q1 (ACOUSTICS). Pub Date: 2024-07-11. DOI: 10.1109/TASLP.2024.3426331
Chulaka Gunasekara;Seokhwan Kim;Luis Fernando D'Haro;Abhinav Rastogi;Yun-Nung Chen;Mihail Eric;Behnam Hedayatnia;Karthik Gopalakrishnan;Yang Liu;Chao-Wei Huang;Dilek Hakkani-Tür;Jinchao Li;Qi Zhu;Lingxiao Luo;Lars Liden;Kaili Huang;Shahin Shayandeh;Runze Liang;Baolin Peng;Zheng Zhang;Swadheen Shukla;Minlie Huang;Jianfeng Gao;Shikib Mehri;Yulan Feng;Carla Gordon;Seyed Hossein Alavi;David Traum;Maxine Eskenazi;Ahmad Beirami;Eunjoon Cho;Paul A. Crook;Ankita De;Alborz Geramifard;Satwik Kottur;Seungwhan Moon;Shivani Poddar;Rajen Subba
This paper introduces the Ninth Dialog System Technology Challenge (DSTC-9). This edition of the DSTC focuses on applying end-to-end dialog technologies for four distinct tasks in dialog systems, namely, 1. Task-oriented dialog Modeling with Unstructured Knowledge Access, 2. Multi-domain task-oriented dialog, 3. Interactive evaluation of dialog and 4. Situated interactive multimodal dialog. This paper describes the task definition, provided datasets, baselines, and evaluation setup for each track. We also summarize the results of the submitted systems to highlight the general trends of the state-of-the-art technologies for the tasks.
{"title":"Overview of the Ninth Dialog System Technology Challenge: DSTC9","authors":"Chulaka Gunasekara;Seokhwan Kim;Luis Fernando D'Haro;Abhinav Rastogi;Yun-Nung Chen;Mihail Eric;Behnam Hedayatnia;Karthik Gopalakrishnan;Yang Liu;Chao-Wei Huang;Dilek Hakkani-Tür;Jinchao Li;Qi Zhu;Lingxiao Luo;Lars Liden;Kaili Huang;Shahin Shayandeh;Runze Liang;Baolin Peng;Zheng Zhang;Swadheen Shukla;Minlie Huang;Jianfeng Gao;Shikib Mehri;Yulan Feng;Carla Gordon;Seyed Hossein Alavi;David Traum;Maxine Eskenazi;Ahmad Beirami;Eunjoon Cho;Paul A. Crook;Ankita De;Alborz Geramifard;Satwik Kottur;Seungwhan Moon;Shivani Poddar;Rajen Subba","doi":"10.1109/TASLP.2024.3426331","DOIUrl":"10.1109/TASLP.2024.3426331","url":null,"abstract":"This paper introduces the Ninth Dialog System Technology Challenge (DSTC-9). This edition of the DSTC focuses on applying end-to-end dialog technologies for four distinct tasks in dialog systems, namely, 1. Task-oriented dialog Modeling with Unstructured Knowledge Access, 2. Multi-domain task-oriented dialog, 3. Interactive evaluation of dialog and 4. Situated interactive multimodal dialog. This paper describes the task definition, provided datasets, baselines, and evaluation setup for each track. We also summarize the results of the submitted systems to highlight the general trends of the state-of-the-art technologies for the tasks.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"4066-4076"},"PeriodicalIF":4.1,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10595468","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141613557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
MVT: Chinese NER Using Multi-View Transformer
IF 4.1, CAS Tier 2 (Computer Science), Q1 (ACOUSTICS). Pub Date: 2024-07-10. DOI: 10.1109/TASLP.2024.3426287
Yinlong Xiao;Zongcheng Ji;Jianqiang Li;Mei Han
Integrating lexical knowledge in Chinese named entity recognition (NER) has been proven effective. Among the existing methods, Flat-LAttice Transformer (FLAT) has achieved great success in both performance and efficiency. FLAT performs lexical enhancement for each sentence by constructing a flat lattice (i.e., a sequence of tokens including the characters in a sentence and the matched words in a lexicon) and calculating self-attention with a fully-connected structure. However, the different interactions between tokens, which can bring different aspects of semantic information for Chinese NER, cannot be well captured by self-attention with a fully-connected structure. In this paper, we propose a novel Multi-View Transformer (MVT) to effectively capture the different interactions between tokens. We first define four views to capture four different token interaction structures. We then construct a view-aware visible matrix for each view according to the corresponding structure and introduce a view-aware dot-product attention for each view to limit the attention scope by incorporating the corresponding visible matrix. Finally, we design three different MVT variants to fuse the multi-view features at different levels of the Transformer architecture. Experimental results conducted on four public Chinese NER datasets show the effectiveness of the proposed method. Specifically, on the most challenging dataset Weibo, which is in an informal text style, MVT outperforms FLAT in F1 score by 2.56%, and when combined with BERT, MVT outperforms FLAT in F1 score by 3.03%.
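The exact construction of the four views is not reproduced here. Below is a minimal sketch of the view-aware dot-product attention described above, assuming a boolean visible matrix per view that marks which token pairs (characters and matched lexicon words) are allowed to interact; tensor names and shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def view_aware_attention(q, k, v, visible):
    """Scaled dot-product attention restricted by a per-view visible matrix.

    q, k, v: (batch, num_tokens, dim) representations of the flat-lattice tokens.
    visible: (batch, num_tokens, num_tokens) boolean mask; True means token i may
             attend to token j under this view's interaction structure.
    """
    d = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / d ** 0.5
    scores = scores.masked_fill(~visible, float("-inf"))
    weights = F.softmax(scores, dim=-1)  # each row should keep at least self-attention visible
    return torch.matmul(weights, v)
```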
{"title":"MVT: Chinese NER Using Multi-View Transformer","authors":"Yinlong Xiao;Zongcheng Ji;Jianqiang Li;Mei Han","doi":"10.1109/TASLP.2024.3426287","DOIUrl":"10.1109/TASLP.2024.3426287","url":null,"abstract":"Integrating lexical knowledge in Chinese named entity recognition (NER) has been proven effective. Among the existing methods, Flat-LAttice Transformer (FLAT) has achieved great success in both performance and efficiency. FLAT performs lexical enhancement for each sentence by constructing a flat lattice (i.e., a sequence of tokens including the characters in a sentence and the matched words in a lexicon) and calculating self-attention with a fully-connected structure. However, the different interactions between tokens, which can bring different aspects of semantic information for Chinese NER, cannot be well captured by self-attention with a fully-connected structure. In this paper, we propose a novel Multi-View Transformer (MVT) to effectively capture the different interactions between tokens. We first define four views to capture four different token interaction structures. We then construct a view-aware visible matrix for each view according to the corresponding structure and introduce a view-aware dot-product attention for each view to limit the attention scope by incorporating the corresponding visible matrix. Finally, we design three different MVT variants to fuse the multi-view features at different levels of the Transformer architecture. Experimental results conducted on four public Chinese NER datasets show the effectiveness of the proposed method. Specifically, on the most challenging dataset Weibo, which is in an informal text style, MVT outperforms FLAT in F1 score by 2.56%, and when combined with BERT, MVT outperforms FLAT in F1 score by 3.03%.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"3656-3668"},"PeriodicalIF":4.1,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141587308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
T5G2P: Text-to-Text Transfer Transformer Based Grapheme-to-Phoneme Conversion
IF 4.1, CAS Tier 2 (Computer Science), Q1 (ACOUSTICS). Pub Date: 2024-07-10. DOI: 10.1109/TASLP.2024.3426332
Markéta Řezáčková;Daniel Tihelka;Jindřich Matoušek
The present paper explores the use of several deep neural network architectures to carry out a grapheme-to-phoneme (G2P) conversion, aiming to find a universal and language-independent approach to the task. The models explored are trained on whole sentences in order to automatically capture cross-word context (such as voicedness assimilation) if it exists in the given language. Four different languages, English, Czech, Russian, and German, were chosen due to their different nature and requirements for the G2P task. Ultimately, the Text-to-Text Transfer Transformer (T5) based model achieved very high conversion accuracy on all the tested languages. Also, it exceeded the accuracy reached by a similar system, when trained on a public LibriSpeech database.
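A minimal sketch of sentence-level G2P inference with a fine-tuned text-to-text model through the Hugging Face transformers API; the checkpoint name "t5g2p-en" is a hypothetical placeholder, and the abstract does not specify the authors' prompt or output format.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Hypothetical fine-tuned checkpoint name; substitute a real G2P-tuned T5 model.
tokenizer = T5Tokenizer.from_pretrained("t5g2p-en")
model = T5ForConditionalGeneration.from_pretrained("t5g2p-en")

def sentence_to_phonemes(sentence: str) -> str:
    """Convert a whole sentence to phonemes so cross-word context is preserved."""
    inputs = tokenizer(sentence, return_tensors="pt")
    output_ids = model.generate(**inputs, max_length=256)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(sentence_to_phonemes("The quick brown fox jumps over the lazy dog."))
```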
{"title":"T5G2P: Text-to-Text Transfer Transformer Based Grapheme-to-Phoneme Conversion","authors":"Markéta Řezáčková;Daniel Tihelka;Jindřich Matoušek","doi":"10.1109/TASLP.2024.3426332","DOIUrl":"10.1109/TASLP.2024.3426332","url":null,"abstract":"The present paper explores the use of several deep neural network architectures to carry out a grapheme-to-phoneme (G2P) conversion, aiming to find a universal and language-independent approach to the task. The models explored are trained on whole sentences in order to automatically capture cross-word context (such as voicedness assimilation) if it exists in the given language. Four different languages, English, Czech, Russian, and German, were chosen due to their different nature and requirements for the G2P task. Ultimately, the Text-to-Text Transfer Transformer (T5) based model achieved very high conversion accuracy on all the tested languages. Also, it exceeded the accuracy reached by a similar system, when trained on a public LibriSpeech database.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"3466-3476"},"PeriodicalIF":4.1,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141587307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Handover QG: Question Generation by Decoder Fusion and Reinforcement Learning
IF 4.1, CAS Tier 2 (Computer Science), Q1 (ACOUSTICS). Pub Date: 2024-07-10. DOI: 10.1109/TASLP.2024.3426292
Ho-Lam Chung;Ying-Hong Chan;Yao-Chung Fan
In recent years, Question Generation (QG) has gained significant attention as a research topic, particularly in the context of its potential to support automatic reading comprehension assessment preparation. However, current QG models are mostly trained on factoid-type datasets, which tend to produce questions that are too simple for assessing advanced abilities. One promising alternative is to train QG models on exam-type datasets, which contain questions that require content reasoning. Unfortunately, there is a shortage of such training data compared to factoid-type questions. To address this issue and improve the quality of QG for generating advanced questions, we propose the Handover QG framework. This framework jointly trains exam-type QG and factoid-type QG, and controls the question generation process by interleaving the exam-type QG decoder and the factoid-type QG decoder. Furthermore, we employ reinforcement learning to enhance QG performance. Our experimental evaluation shows that our model significantly outperforms the compared baselines, with a BLEU-4 score increase from 5.31 to 6.48. Human evaluation also confirms that the questions generated by our model are answerable and appropriately difficult. Overall, the Handover QG framework offers a promising solution for improving QG performance in generating advanced questions for reading comprehension assessment.
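The abstract describes interleaved use of the two decoders but not the handover rule itself. The sketch below assumes a shared vocabulary and a learned gate that chooses, at each step, which decoder's next-token distribution to follow; all function names and the greedy decoding are illustrative assumptions.

```python
import torch

def interleaved_decode(exam_decoder, factoid_decoder, gate, context,
                       max_len=64, bos_id=0, eos_id=1):
    """Generate a question by handing control between two decoders step by step.

    exam_decoder / factoid_decoder: callables (context, prefix_ids) -> logits over the vocab.
    gate: callable (context, prefix_ids) -> probability of using the exam-type decoder.
    """
    prefix = [bos_id]
    for _ in range(max_len):
        use_exam = gate(context, prefix) > 0.5
        logits = exam_decoder(context, prefix) if use_exam else factoid_decoder(context, prefix)
        next_id = int(torch.argmax(logits))
        prefix.append(next_id)
        if next_id == eos_id:
            break
    return prefix
```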
{"title":"Handover QG: Question Generation by Decoder Fusion and Reinforcement Learning","authors":"Ho-Lam Chung;Ying-Hong Chan;Yao-Chung Fan","doi":"10.1109/TASLP.2024.3426292","DOIUrl":"10.1109/TASLP.2024.3426292","url":null,"abstract":"In recent years, Question Generation (QG) has gained significant attention as a research topic, particularly in the context of its potential to support automatic reading comprehension assessment preparation. However, current QG models are mostly trained on factoid-type datasets, which tend to produce questions that are too simple for assessing advanced abilities. One promising alternative is to train QG models on exam-type datasets, which contain questions that require content reasoning. Unfortunately, there is a shortage of such training data compared to factoid-type questions. To address this issue and improve the quality of QG for generating advanced questions, we propose the \u0000<italic>Handover QG</i>\u0000 framework. This framework involves the joint training of exam-type QG and factoid-type QG, and controls the question generation process by interleavingly using the exam-type QG decoder and the factoid-type QG decoder. Furthermore, we employ reinforcement learning to enhance QG performance. Our experimental evaluation shows that our model significantly outperforms the compared baselines, with a BLEU-4 score increase from 5.31 to 6.48. Human evaluation also confirms that the questions generated by our model are answerable and appropriately difficult. Overall, the \u0000<italic>Handover QG</i>\u0000 framework offers a promising solution for improving QG performance in generating advanced questions for reading comprehension assessment.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"3644-3655"},"PeriodicalIF":4.1,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141587309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Self-Supervised ASR Models and Features for Dysarthric and Elderly Speech Recognition
IF 4.1, CAS Tier 2 (Computer Science), Q1 (ACOUSTICS). Pub Date: 2024-07-03. DOI: 10.1109/TASLP.2024.3422839
Shujie Hu;Xurong Xie;Mengzhe Geng;Zengrui Jin;Jiajun Deng;Guinan Li;Yi Wang;Mingyu Cui;Tianzi Wang;Helen Meng;Xunying Liu
Self-supervised learning (SSL) based speech foundation models have been applied to a wide range of ASR tasks. However, their application to dysarthric and elderly speech via data-intensive parameter fine-tuning is confronted by in-domain data scarcity and mismatch. To this end, this paper explores a series of approaches to integrate domain fine-tuned SSL pre-trained models and their features into TDNN and Conformer ASR systems for dysarthric and elderly speech recognition. These include: a) input feature fusion between standard acoustic frontends and domain fine-tuned SSL speech representations; b) frame-level joint decoding between TDNN systems separately trained using standard acoustic features alone and those with additional domain fine-tuned SSL features; and c) multi-pass decoding involving the TDNN/Conformer system outputs to be rescored using domain fine-tuned pre-trained ASR models. In addition, fine-tuned SSL speech features are used in acoustic-to-articulatory (A2A) inversion to construct multi-modal ASR systems. Experiments are conducted on four tasks: the English UASpeech and TORGO dysarthric speech corpora; and the English DementiaBank Pitt and Cantonese JCCOCC MoCA elderly speech datasets. The TDNN systems constructed by integrating domain-adapted HuBERT, wav2vec2-conformer or multi-lingual XLSR models and their features consistently outperform the standalone fine-tuned SSL pre-trained models. These systems produced statistically significant WER or CER reductions of 6.53%, 1.90%, 2.04% and 7.97% absolute (24.10%, 23.84%, 10.14% and 31.39% relative) on the four tasks respectively. Consistent improvements in Alzheimer's Disease detection accuracy are also obtained using the DementiaBank Pitt elderly speech recognition outputs.
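A minimal sketch of the input feature fusion in approach a): frame-aligned concatenation of standard filterbank features with domain fine-tuned SSL representations before the TDNN/Conformer frontend. The linear interpolation used to match frame rates and the projection layer are assumptions, not the paper's exact fusion design.

```python
import torch
import torch.nn.functional as F

def fuse_features(fbank, ssl_feats, proj):
    """Concatenate standard acoustic features with SSL representations frame by frame.

    fbank:     (batch, T_fbank, d_fbank) e.g. 80-dim log-Mel filterbanks at 100 fps.
    ssl_feats: (batch, T_ssl, d_ssl) outputs of a domain fine-tuned SSL model (~50 fps).
    proj:      torch.nn.Linear mapping the concatenated dim to the ASR input dim.
    """
    # Resample SSL features along time so both streams have one vector per fbank frame.
    ssl_aligned = F.interpolate(
        ssl_feats.transpose(1, 2), size=fbank.size(1),
        mode="linear", align_corners=False,
    ).transpose(1, 2)
    fused = torch.cat([fbank, ssl_aligned], dim=-1)
    return proj(fused)
```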
{"title":"Self-Supervised ASR Models and Features for Dysarthric and Elderly Speech Recognition","authors":"Shujie Hu;Xurong Xie;Mengzhe Geng;Zengrui Jin;Jiajun Deng;Guinan Li;Yi Wang;Mingyu Cui;Tianzi Wang;Helen Meng;Xunying Liu","doi":"10.1109/TASLP.2024.3422839","DOIUrl":"10.1109/TASLP.2024.3422839","url":null,"abstract":"Self-supervised learning (SSL) based speech foundation models have been applied to a wide range of ASR tasks. However, their application to dysarthric and elderly speech via data-intensive parameter fine-tuning is confronted by in-domain data scarcity and mismatch. To this end, this paper explores a series of approaches to integrate domain fine-tuned SSL pre-trained models and their features into TDNN and Conformer ASR systems for dysarthric and elderly speech recognition. These include: a) input feature fusion between standard acoustic frontends and domain fine-tuned SSL speech representations; b) frame-level joint decoding between TDNN systems separately trained using standard acoustic features alone and those with additional domain fine-tuned SSL features; and c) multi-pass decoding involving the TDNN/Conformer system outputs to be rescored using domain fine-tuned pre-trained ASR models. In addition, fine-tuned SSL speech features are used in acoustic-to-articulatory (A2A) inversion to construct multi-modal ASR systems. Experiments are conducted on four tasks: the English UASpeech and TORGO dysarthric speech corpora; and the English DementiaBank Pitt and Cantonese JCCOCC MoCA elderly speech datasets. The TDNN systems constructed by integrating domain-adapted HuBERT, wav2vec2-conformer or multi-lingual XLSR models and their features consistently outperform the standalone fine-tuned SSL pre-trained models. These systems produced statistically significant WER or CER reductions of \u0000<bold>6.53%</b>\u0000, \u0000<bold>1.90%</b>\u0000, \u0000<bold>2.04%</b>\u0000 and \u0000<bold>7.97%</b>\u0000 absolute (\u0000<bold>24.10%</b>\u0000, \u0000<bold>23.84%</b>\u0000, \u0000<bold>10.14%</b>\u0000 and \u0000<bold>31.39%</b>\u0000 relative) on the four tasks respectively. Consistent improvements in Alzheimer's Disease detection accuracy are also obtained using the DementiaBank Pitt elderly speech recognition outputs.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"3561-3575"},"PeriodicalIF":4.1,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10584335","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141546544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors
IF 4.1, CAS Tier 2 (Computer Science), Q1 (ACOUSTICS). Pub Date: 2024-07-03. DOI: 10.1109/TASLP.2024.3422818
Federico Landini;Mireia Diez;Themos Stafylakis;Lukáš Burget
Until recently, the field of speaker diarization was dominated by cascaded systems. Due to their limitations, mainly regarding overlapped speech and cumbersome pipelines, end-to-end models have gained great popularity lately. One of the most successful models is end-to-end neural diarization with encoder-decoder based attractors (EEND-EDA). In this work, we replace the EDA module with a Perceiver-based one and show its advantages over EEND-EDA; namely, better performance on the widely studied Callhome dataset, more accurate estimation of the number of speakers in a conversation, and faster inference. Furthermore, when exhaustively compared with other methods, our model, DiaPer, reaches remarkable performance with a very lightweight design. We also compare against other works and a cascaded baseline across more than ten public wide-band datasets. Together with this publication, we release the code of DiaPer as well as models trained on public and free data.
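The Perceiver-based attractor module is not detailed in the abstract. As a rough sketch of the underlying idea, a fixed set of learnable latent queries cross-attends to frame embeddings to produce per-speaker attractors, which are then compared against the frames to obtain speaker activities; the dimensions, number of attractors, and sigmoid readout are assumptions.

```python
import torch
import torch.nn as nn

class PerceiverAttractors(nn.Module):
    """Toy attractor decoder: learnable latents cross-attend to frame embeddings."""

    def __init__(self, dim=256, num_attractors=8, num_heads=4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_attractors, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, frames):
        # frames: (batch, T, dim) encoder outputs for one conversation.
        batch = frames.size(0)
        queries = self.latents.unsqueeze(0).expand(batch, -1, -1)
        attractors, _ = self.cross_attn(queries, frames, frames)
        # Frame-wise speaker activities from frame/attractor similarity.
        activities = torch.sigmoid(torch.matmul(frames, attractors.transpose(1, 2)))
        return attractors, activities  # (batch, S, dim), (batch, T, S)
```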
{"title":"DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors","authors":"Federico Landini;Mireia Diez;Themos Stafylakis;Lukáš Burget","doi":"10.1109/TASLP.2024.3422818","DOIUrl":"10.1109/TASLP.2024.3422818","url":null,"abstract":"Until recently, the field of speaker diarization was dominated by cascaded systems. Due to their limitations, mainly regarding overlapped speech and cumbersome pipelines, end-to-end models have gained great popularity lately. One of the most successful models is end-to-end neural diarization with encoder-decoder based attractors (EEND-EDA). In this work, we replace the EDA module with a Perceiver-based one and show its advantages over EEND-EDA; namely obtaining better performance on the largely studied Callhome dataset, finding the quantity of speakers in a conversation more accurately, and faster inference time. Furthermore, when exhaustively compared with other methods, our model, DiaPer, reaches remarkable performance with a very lightweight design. Besides, we perform comparisons with other works and a cascaded baseline across more than ten public wide-band datasets. Together with this publication, we release the code of DiaPer as well as models trained on public and free data.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"3450-3465"},"PeriodicalIF":4.1,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141546543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Harmonic-Aware Frequency and Time Attention for Automatic Piano Transcription
IF 4.1, CAS Tier 2 (Computer Science), Q1 (ACOUSTICS). Pub Date: 2024-06-28. DOI: 10.1109/TASLP.2024.3419441
Qi Wang;Mingkuan Liu;Changchun Bao;Maoshen Jia
Automatic music transcription (AMT) converts music audio into symbolic note representations. Concurrent notes overlapping in the frequency and time domains still hinder the performance of polyphonic piano transcription in current studies. In this work, we develop an attention-based method for piano transcription, where we propose a harmonic-aware frequency attention to capture the musical frequency structure, and a local time attention to model temporal dependencies. The harmonic-aware frequency attention not only emphasizes the relationships between the prominent harmonics, but also extracts the correlation in the residual non-harmonic component. The time attention mechanism is improved using learnable attention range masks to model frame-wise short-term dependencies on different subtasks. Experiments on the MAESTRO dataset demonstrate that the proposed system achieves state-of-the-art transcription performance on both frame-wise and note-wise F1 metrics. Considering the influence of the piano pedals' dynamic behavior on note duration, a note duration modification method is also proposed. With a more accurate annotation of the offsets on MAESTRO, the transcription performance is further improved.
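The paper's attention formulation is not reproduced here. One plausible ingredient, sketched below under stated assumptions, is a boolean frequency-attention mask that lets each spectral bin attend to the bins near its integer harmonics on a log-frequency axis; the bin spacing, harmonic count, and tolerance are assumptions.

```python
import numpy as np

def harmonic_mask(num_bins, bins_per_octave=48, max_harmonic=8, tol_bins=1):
    """Boolean (num_bins, num_bins) mask: bin i may attend to bins near its harmonics.

    Assumes a log-frequency axis where harmonic k of bin i sits roughly
    bins_per_octave * log2(k) bins above i.
    """
    mask = np.zeros((num_bins, num_bins), dtype=bool)
    for i in range(num_bins):
        for k in range(1, max_harmonic + 1):
            j = i + int(round(bins_per_octave * np.log2(k)))
            lo, hi = max(0, j - tol_bins), min(num_bins, j + tol_bins + 1)
            mask[i, lo:hi] = True
    return mask

# Example: number of visible bins for the lowest bins of a 352-bin log-frequency spectrogram.
print(harmonic_mask(352).sum(axis=1)[:5])
```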
{"title":"Harmonic-Aware Frequency and Time Attention for Automatic Piano Transcription","authors":"Qi Wang;Mingkuan Liu;Changchun Bao;Maoshen Jia","doi":"10.1109/TASLP.2024.3419441","DOIUrl":"10.1109/TASLP.2024.3419441","url":null,"abstract":"Automatic music transcription (AMT) is to transcribe music audio into note symbol representations. Concurrent notes overlapping in the frequency and time domains still hinder the performance of polyphonic piano transcription in current studies. In this work, we develop an attention-based method for piano transcription, where we propose a harmonic-aware attention to capture the musical frequency structure, and a local time attention to model temporal dependencies. The harmonic-aware frequency attention not only emphasizes the relationship between the obvious harmonics, but also extracts the correlation in the residual non-harmonic component. The time attention mechanism is improved using the learnable attention range masks to model frame-wise short-term dependencies on different subtasks. Experiments on the MAESTRO dataset demonstrate that the proposed system achieves state-of-the-art transcription performance on both frame-wise and note-wise F1 metrics. Considering the influence of the piano pedals' dynamic behavior on note duration, a note duration modification method is also proposed. With a more accurate annotation of the offset on MAESTRO, the transcription performance is further improved.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"3492-3506"},"PeriodicalIF":4.1,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141505408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
IF 4.1, CAS Tier 2 (Computer Science), Q1 (ACOUSTICS). Pub Date: 2024-06-28. DOI: 10.1109/TASLP.2024.3419418
Xiaofei Wang;Manthan Thakker;Zhuo Chen;Naoyuki Kanda;Sefik Emre Eskimez;Sanyuan Chen;Min Tang;Shujie Liu;Jinyu Li;Takuya Yoshioka
Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech. However, existing models still face limitations in handling diverse audio-text speech generation tasks involving transforming input speech and processing audio captured in adverse acoustic conditions. This paper introduces SpeechX, a versatile speech generation model capable of zero-shot TTS and various speech transformation tasks, dealing with both clean and noisy signals. SpeechX combines neural codec language modeling with multi-task learning using task-dependent prompting, enabling unified and extensible modeling and providing a consistent way for leveraging textual input in speech enhancement and transformation tasks. Experimental results show SpeechX's efficacy in various tasks, including zero-shot TTS, noise suppression, target speaker extraction, speech removal, and speech editing with or without background noise, achieving comparable or superior performance to specialized models across tasks.
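The abstract describes task-dependent prompting without giving the token layout. The toy sketch below only illustrates the general idea of conditioning a codec language model on a task token, text tokens, and input acoustic (codec) tokens in one sequence; every token name and the layout are assumptions, not SpeechX's actual format.

```python
# Hypothetical token layout for a task-prompted neural codec language model.
TASK_TOKENS = {
    "zero_shot_tts": "<tts>",
    "noise_suppression": "<ns>",
    "target_speaker_extraction": "<tse>",
    "speech_removal": "<sr>",
    "speech_editing": "<edit>",
}

def build_prompt(task, text_tokens, acoustic_tokens):
    """Concatenate a task token, text tokens, and input codec tokens into one LM input."""
    return [TASK_TOKENS[task], *text_tokens, "<sep>", *acoustic_tokens]

# The model would then autoregressively continue the sequence with output codec tokens.
print(build_prompt("noise_suppression", ["hello", "world"], ["a17", "a203", "a88"]))
```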
{"title":"SpeechX: Neural Codec Language Model as a Versatile Speech Transformer","authors":"Xiaofei Wang;Manthan Thakker;Zhuo Chen;Naoyuki Kanda;Sefik Emre Eskimez;Sanyuan Chen;Min Tang;Shujie Liu;Jinyu Li;Takuya Yoshioka","doi":"10.1109/TASLP.2024.3419418","DOIUrl":"10.1109/TASLP.2024.3419418","url":null,"abstract":"Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech. However, existing models still face limitations in handling diverse audio-text speech generation tasks involving transforming input speech and processing audio captured in adverse acoustic conditions. This paper introduces SpeechX, a versatile speech generation model capable of zero-shot TTS and various speech transformation tasks, dealing with both clean and noisy signals. SpeechX combines neural codec language modeling with multi-task learning using task-dependent prompting, enabling unified and extensible modeling and providing a consistent way for leveraging textual input in speech enhancement and transformation tasks. Experimental results show SpeechX's efficacy in various tasks, including zero-shot TTS, noise suppression, target speaker extraction, speech removal, and speech editing with or without background noise, achieving comparable or superior performance to specialized models across tasks.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"3355-3364"},"PeriodicalIF":4.1,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141531010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Optimal Modal Decomposition for Directionally Biased Sound Field Recording
IF 4.1, CAS Tier 2 (Computer Science), Q1 (ACOUSTICS). Pub Date: 2024-06-28. DOI: 10.1109/TASLP.2024.3420252
Hao Gao;Junlong Ren;Jiazheng Cheng;Yong Shen
Sound field recording aims to capture and preserve the information of the sound field in a specific area. Typically, the recorded sound field is decomposed as a superposition of a set of modes. Spherical harmonic functions are often used as basis functions for the modal decomposition, and they are optimal for directionally unbiased sound field recording, but the sound field recording problems in many practical application scenarios are directionally biased. However, most conventional directionally biased modal decomposition methods are non-optimal or have limited applications for sound field recording. In this paper, an optimal modal decomposition for directionally biased sound field recording is proposed, which minimizes the least-square error of the directionally biased sound field recording. This paper formulates the optimization problem of the modal decomposition with the consideration of the sound wave distribution and the directional importance. After that, the optimization problem is discretized and then solved to obtain the optimal basis functions for modal decomposition. To estimate the modal coefficients by using the spherical microphone array, the corresponding optimal encoding matrix is also derived. Finally, several simulations and experiments are presented to verify the proposed method. The results indicate that the proposed method performs well for directionally biased sound field recording.
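A minimal sketch of the discretized optimization step described above: sample incidence directions, weight each sample by a directional importance function, and solve a weighted least-squares fit of the sampled sound field onto a set of basis functions. The variable names, the random toy data, and the specific weighting are assumptions.

```python
import numpy as np

def fit_modal_coefficients(basis, pressure, weights):
    """Weighted least-squares fit of sampled sound-field data onto basis functions.

    basis:    (num_samples, num_modes) basis functions evaluated at the sample points.
    pressure: (num_samples,) sound pressure at the sample points.
    weights:  (num_samples,) directional importance of each sample.
    """
    w = np.sqrt(weights)[:, None]
    coeffs, *_ = np.linalg.lstsq(w * basis, w[:, 0] * pressure, rcond=None)
    return coeffs

# Toy example: 200 sampled directions, 16 modes, front hemisphere weighted more heavily.
rng = np.random.default_rng(0)
basis = rng.standard_normal((200, 16))
pressure = basis @ rng.standard_normal(16) + 0.01 * rng.standard_normal(200)
weights = np.concatenate([np.full(100, 2.0), np.full(100, 0.5)])
print(fit_modal_coefficients(basis, pressure, weights).shape)  # (16,)
```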
{"title":"Optimal Modal Decomposition for Directionally Biased Sound Field Recording","authors":"Hao Gao;Junlong Ren;Jiazheng Cheng;Yong Shen","doi":"10.1109/TASLP.2024.3420252","DOIUrl":"10.1109/TASLP.2024.3420252","url":null,"abstract":"Sound field recording aims to capture and preserve the information of the sound field in a specific area. Typically, the recorded sound field is decomposed as a superposition of a set of modes. Spherical harmonic functions are often used as basis functions for the modal decomposition, and they are optimal for directionally unbiased sound field recording, but the sound field recording problems in many practical application scenarios are directionally biased. However, most conventional directionally biased modal decomposition methods are non-optimal or have limited applications for sound field recording. In this paper, an optimal modal decomposition for directionally biased sound field recording is proposed, which minimizes the least-square error of the directionally biased sound field recording. This paper formulates the optimization problem of the modal decomposition with the consideration of the sound wave distribution and the directional importance. After that, the optimization problem is discretized and then solved to obtain the optimal basis functions for modal decomposition. To estimate the modal coefficients by using the spherical microphone array, the corresponding optimal encoding matrix is also derived. Finally, several simulations and experiments are presented to verify the proposed method. The results indicate that the proposed method performs well for directionally biased sound field recording.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"3424-3436"},"PeriodicalIF":4.1,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141505409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0