Computer Speech and Language: Latest Articles

Adopting machine translation in the healthcare sector: A methodological multi-criteria review
IF 4.3 Zone 3 Computer Science Q1 Mathematics Pub Date: 2023-10-29 DOI: 10.1016/j.csl.2023.101582
Marco Zappatore, Gilda Ruggieri

Background:

Recent advances in machine translation (MT) offer an appealing, low-cost way to overcome language barriers in many contexts (e.g., travelling, cultural interaction, digital content localisation). However, highly technical domains whose texts are as long, complex, and specialised as those of the healthcare sector pose multiple challenges to the effective and safe use of MT.

Methods:

To examine how MT currently assists written and verbal health communication, and given the considerable heterogeneity in technological enablers, language pairs, user groups, training approaches, evaluation processes, and users' requirements, we propose a methodological multi-criteria literature review. The review is based on current guidelines in computer science research and grounded on a customised configuration of the PRISMA methodology, which is normally used to perform meta-analyses on clinical trials. It focuses on language-to-language medical MT, covers the period January 2015–February 2023, and only refers to articles written in English that are accessible via four scientific online digital libraries. Articles are ranked according to a meta-evaluation scoring method for MT scientific credibility, along with a score that assesses the scope of MT in healthcare. Finally, a guideline for properly designing a study about MT in healthcare is also proposed.

Results:

The review included a final set of 58 articles from journals (n=30) and conference proceedings (n=28), considering 48 different language combinations. We identified a predominance of English-to-Spanish (n=19) and English-to-Chinese (n=16) implementations, mainly tailored to medical staff only (n=14) or to medical staff along with patients (n=12). Included papers addressed clinical communication (n=21) and health education (n=37). Unidirectional real-time bilingual MT (n=24) was the most frequent configuration. MT implementations were dominated by Google Translate (n=22), often used as a baseline, OpenNMT (n=12), or Moses (n=11). Training and evaluation approaches varied considerably, while deployment and pre-/post-editing were rarely described in sufficient detail.

Conclusions:

Although a considerable number of articles report that the proposed MT solutions are effective in translating (bio)medical texts, only some of them meet strict translation quality assessment criteria (e.g., using automatic metrics that correlate better with human rankings than BLEU, or statistical significance testing). Nevertheless, MT can be an effective support or complement to health communication, provided that proper MT training and in-domain human post-editing are in place to address fluency, accuracy, unnatural translations, domain adequacy, and potential safety risks (for highly sensitive documents). The availability of in-domain training corpora also proved beneficial. Finally, the paper proposes a guideline on how to design a study about MT in healthcare, in order to attract more researchers to this field.
Citations: 0
DAE-NER: Dual-channel attention enhancement for Chinese named entity recognition
IF 4.3 Zone 3 Computer Science Q1 Mathematics Pub Date: 2023-10-26 DOI: 10.1016/j.csl.2023.101581
Jingxin Liu, Mengzhe Sun, Wenhao Zhang, Gengquan Xie, Yongxia Jing, Xiulai Li, Zhaoxin Shi

Named Entity Recognition (NER) is an important component of Natural Language Processing (NLP) and a fundamental yet challenging task in text analysis. Recently, NER models for Chinese-language characters have received considerable attention. Owing to the complexity and ambiguity of the Chinese language, the same semantic features have different levels of importance in different contexts. However, existing literature on Chinese Named Entity Recognition (CNER) does not capture this difference in importance. To tackle this problem, we propose a new method, referred to as Dual-channel Attention Enhancement for Chinese Named Entity Recognition (DAE-NER). Specifically, we design compression and decompression mechanisms to adapt Chinese language characters to different contexts. By adjusting the weight of the semantic feature vector, the semantic weight is reconstructed to alleviate the interference of contextual differences in semantics. Moreover, to enhance the semantic representation of the different granularities in Chinese text, we design attention enhancement modules at the character and sentence levels. These modules dynamically learn the differences in semantic features to enhance important semantic representations in different dimensions. Extensive experiments on four benchmark datasets, namely MSRA, People Daily, Resume, and Weibo, demonstrate that the proposed DAE-NER can effectively improve the overall performance of CNER.

Citations: 0
An effective approach for identifying keywords as high-quality filters to get emergency-implicated Twitter Spanish data
IF 4.3 Zone 3 Computer Science Q1 Mathematics Pub Date: 2023-10-26 DOI: 10.1016/j.csl.2023.101579
Joel Garcia-Arteaga, Jesús Zambrano-Zambrano, Jorge Parraga-Alava, Jorge Rodas-Silva

Twitter has become a powerful knowledge source for data mining projects because of the amount of data generated by its users, which allows researchers to find content on almost any topic in real time. The usefulness of that content depends on the quality of the keywords used; with poor keywords, the extracted data will contain a high percentage of irrelevant content. In this paper, we introduce a time-aware machine-learning-based approach to identify meaningful keywords that maximize the extraction of relevant emergency-related tweets when the Twitter API is used. We follow the CRISP-DM methodology. The first stage relies on problem understanding, where we detected the need to use meaningful keywords to filter content, extract higher-quality data, and reduce the percentage of irrelevant tweets. In the second stage, data collection, we used the official Twitter API to extract tweets and label them as “emergencia” and “no emergencia”. After that, we analyzed the collected data (data understanding) to determine preprocessing techniques and to prepare the data for the model. Finally, in the modeling and testing stages, we trained a restricted Boltzmann machine and four variations of autoencoders, including an architecture proposed by a genetic algorithm, to use them as keyword identifiers and to determine which of them performs best and should be deployed to production (deployment stage). The results show a slightly better performance of the autoencoder proposed by the genetic algorithm (GADAE), which achieves an R² score of 0.97, an MAE of 14×10⁻³, and an MSE of 4×10⁻⁴. GADAE, the best model, managed to extract 110% more relevant tweets than manual filtering in the context of emergency-implicated tweets in Ecuador.
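
As a rough illustration of the three scores quoted in this abstract, the sketch below computes R², MAE, and MSE from paired target and predicted values in plain Python. The function name `regression_metrics` and the sample numbers are assumptions made for illustration; they are not taken from the paper, and the paper's own evaluation pipeline is not described on this page.

```python
def regression_metrics(y_true, y_pred):
    """Return MAE, MSE and R^2 for two equally long sequences of numbers."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")
    r2 = 1.0 - ss_res / ss_tot

    return {"mae": mae, "mse": mse, "r2": r2}


# Hypothetical values, chosen only to exercise the function.
scores = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(scores)
```

An R² close to 1 together with small MAE and MSE values, as reported for GADAE, indicates that the predictions track the targets closely.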

Citations: 0
A lightweight approach based on prompt for few-shot relation extraction
IF 4.3 Zone 3 Computer Science Q1 Mathematics Pub Date: 2023-10-25 DOI: 10.1016/j.csl.2023.101580
Ying Zhang, Wencheng Huang, Depeng Dang

Few-shot relation extraction (FSRE) aims to predict the relation between two entities in a sentence using a few annotated samples. Many works solve the FSRE problem by training complex models with a huge number of parameters, which results in longer processing times. Some recent works focus on introducing relation information into Prototype Networks in various ways. However, most of these methods obtain entity and relation representations by fine-tuning large pre-trained language models. This implies that a copy of the complete pre-trained model needs to be saved after fine-tuning for each specific task, which strains computing and storage resources. To address this problem, we introduce a lightweight approach that uses prompt-learning to assist in fine-tuning the model while adjusting fewer parameters. To obtain a better relation prototype, we design a new enhanced fusion module that fuses relation information with the original prototype. We conduct extensive experiments on the common FSRE datasets FewRel 1.0 and FewRel 2.0 to verify the advantages of our method. The results show that our model achieves state-of-the-art performance.
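
The prototype idea mentioned in this abstract can be sketched in a few lines: each relation is represented by the mean of its support embeddings, and a query is assigned to the nearest prototype. The code below is a minimal sketch under stated assumptions, not the authors' model. The two-dimensional vectors and relation labels are hypothetical, and `fuse` is a plain weighted average used only as a placeholder for combining a prototype with relation information; it is not the enhanced fusion module proposed in the paper.

```python
import math


def mean_vector(vectors):
    """Prototype of a relation: the element-wise mean of its support embeddings."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fuse(prototype, relation_vec, alpha=0.5):
    """Placeholder fusion (assumption): weighted average of prototype and relation vector."""
    return [alpha * p + (1.0 - alpha) * r for p, r in zip(prototype, relation_vec)]


def classify(query, prototypes):
    """Return the label whose prototype is closest to the query (Euclidean distance)."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Hypothetical 2-way 2-shot episode with toy 2-D embeddings.
support = {
    "founder_of": [[1.0, 0.0], [0.8, 0.2]],
    "born_in": [[0.0, 1.0], [0.2, 0.8]],
}
prototypes = {label: mean_vector(vecs) for label, vecs in support.items()}
print(classify([0.7, 0.3], prototypes))
```

In a real system the embeddings would come from a pre-trained language model; here they are hard-coded so that the example stays self-contained.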

Citations: 0
The limits of the Mean Opinion Score for speech synthesis evaluation
IF 4.3 Zone 3 Computer Science Q1 Mathematics Pub Date: 2023-10-21 DOI: 10.1016/j.csl.2023.101577
Sébastien Le Maguer, Simon King, Naomi Harte

The release of WaveNet and Tacotron has forever transformed the speech synthesis landscape. Thanks to these game-changing innovations, the quality of synthetic speech has reached unprecedented levels. However, to measure this leap in quality, an overwhelming majority of studies still rely on the Absolute Category Rating (ACR) protocol and compare systems using its output: the Mean Opinion Score (MOS). This protocol is not without controversy, and because current state-of-the-art synthesis systems produce outputs remarkably close to human speech, it is now vital to determine how reliable this score is.

To do so, we conducted a series of four experiments replicating and following the 2013 edition of the Blizzard Challenge. With these experiments, we asked four questions about the MOS: How stable is the MOS of a system across time? How do the scores of lower quality systems influence the MOS of higher quality systems? How does the introduction of modern technologies influence the scores of past systems? How does the MOS of modern technologies evolve in isolation?

The results of our experiments are manifold. Firstly, we verify the superiority of modern technologies in comparison to historical synthesis. Then, we show that despite its origin as an absolute category rating, MOS is a relative score. While minimal variations are observed during the replication of the 2013-EH2 task, these variations can still lead to different conclusions for the intermediate systems. Our experiments also illustrate the sensitivity of MOS to the presence or absence of lower and higher anchors. Overall, our experiments suggest that we may have reached the end of a cul-de-sac by only evaluating the overall quality with MOS. We must embark on a new road and develop different evaluation protocols better suited to the analysis of modern speech synthesis technologies.
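
For readers unfamiliar with how a MOS is obtained, the sketch below averages listener ratings on the usual five-point ACR scale and attaches an approximate 95% confidence interval. It is a minimal sketch, not the analysis used in the paper: the function name, the ratings, and the normal-approximation factor of 1.96 are assumptions made for illustration.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Ratings are expected on the five-point ACR scale (integers 1 to 5).
    The interval uses a normal approximation (factor 1.96), which is an
    assumption of this sketch and is rough for small listener panels.
    """
    if len(ratings) < 2:
        raise ValueError("at least two ratings are needed for an interval")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ACR ratings must be integers between 1 and 5")

    mos = statistics.mean(ratings)
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings for one synthesis system.
mos, half_width = mean_opinion_score([4, 5, 3, 4, 4])
print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
```

Because the computation looks only at the ratings given to one system, it cannot reveal how those ratings were shaped by the other systems in the same listening test, which is the kind of context dependence the abstract reports.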

Citations: 0
M-Sim: Multi-level Semantic Inference Model for Chinese short answer scoring in low-resource scenarios
IF 4.3 Zone 3 Computer Science Q1 Mathematics Pub Date: 2023-10-20 DOI: 10.1016/j.csl.2023.101575
Peichao Lai, Feiyang Ye, Yanggeng Fu, Zhiwei Chen, Yingjie Wu, Yilei Wang

Short answer scoring is a significant task in natural language processing. On datasets comprising numerous explicit or implicit symbols and quantization entities, the existing approaches continue to perform poorly. Additionally, the majority of relevant datasets contain few-shot samples, reducing model efficacy in low-resource scenarios. To solve the above issues, we propose a Multi-level Semantic Inference Model (M-Sim), which obtains features at multiple scales to fully consider the explicit or implicit entity information contained in the data. We then design a prompt-based data augmentation to construct the simulated datasets, which effectively enhance model performance in low-resource scenarios. Our M-Sim outperforms the best competitor models by an average of 1.48 percent in the F1 score. The data augmentation significantly increases all approaches’ performance by an average of 0.036 in correlation coefficient scores.

Citations: 0
Document-level relation extraction with entity mentions deep attention
IF 4.3 Zone 3 Computer Science Q1 Mathematics Pub Date: 2023-10-20 DOI: 10.1016/j.csl.2023.101574
Yangsheng Xu, Jiaxin Tian, Mingwei Tang, Linping Tao, Liuxuan Wang

Document-level Relation Extraction (DocRE) aims to extract relations between entities from documents. In contrast to sentence-level relation extraction, it requires extracting semantic relations from multiple sentences, so sentence-level algorithms need further improvement before they can extract document-level relations. DocRE algorithms therefore have to deal with more complex entity structure relationships and must unite semantic relationships between different sentences when reasoning about relationships between entities. Previously proposed algorithms fail to infer relationships between entities when dealing with complex entity structure relationships. In this paper, we propose an entity mentions deep attention framework that efficiently infers entity relationships through entity structure and contextual information. Firstly, a structural dependency module of entities is designed to achieve interaction between different mentions of the entity. Secondly, a deep contextual attention component is proposed to enrich the semantic information between entities with entity-related contexts. Finally, we use a distance mapping component to solve the problem of entity pairs that are far away from each other. According to our experimental results, our model outperforms the state-of-the-art models on three public datasets: DocRED, DGA, and CDR.

Citations: 0
Meta adversarial learning improves low-resource speech recognition
IF 4.3 Zone 3 Computer Science Q1 Mathematics Pub Date: 2023-10-19 DOI: 10.1016/j.csl.2023.101576
Yaqi Chen, Xukui Yang, Hao Zhang, Wenlin Zhang, Dan Qu, Cong Chen

Low-resource automatic speech recognition is a challenging task. To resolve this issue, multilingual meta-learning learns a better model initialization from many source languages, allowing for rapid adaptation to target languages. However, data scale and learning difficulty vary greatly from one language to another. As a result, the model favors large-scale and simple source languages. Moreover, the shared semantic space of various languages is difficult to learn due to a lack of restrictions on multilingual pre-training. In this paper, we propose a meta adversarial learning approach to address this problem. The meta-learner is guided to learn language-independent information by an adversarial auxiliary objective of language identification, which makes the shared semantic space more compact and improves model generalization. Additionally, we optimize adversarial training using Wasserstein distance and temporal normalization, enabling more stable and simple training. Experimental results on IARPA BABEL and OpenSLR show a significant performance improvement. The approach also outperforms state-of-the-art results by a large margin in all target languages, especially in few-shot settings. Finally, we demonstrate the superiority of our method using t-SNE visualization.

Citations: 0
Pronoun use in preclinical and early stages of Alzheimer's dementia
IF 4.3 Zone 3 Computer Science Q1 Mathematics Pub Date: 2023-10-12 DOI: 10.1016/j.csl.2023.101573
Dagmar Bittner, Claudia Frankenberg, Johannes Schröder

The present study aims at improving the predictive power of the use of pronouns in computational modeling of the risk of Alzheimer's dementia (AD) by (i) further determining the onset of increased pronoun use in AD and (ii) providing insights into the linguistic contexts affected by the increase early on. Pronoun use was compared longitudinally between subjects who either stayed cognitively intact (CTR-group, n = 5) or who had developed AD upon follow-up after 10–12 years (AD-group, n = 5). Data were taken from semi-structured biographical interviews, which stem from the Interdisciplinary Longitudinal Study on Adult Development and Aging (ILSE). The first interview (baseline) was conducted when all participants were still cognitively healthy. Analyses concerned the proportional distribution of 12 pronoun types and linguistic contexts of increased use. Already at baseline, the AD-group produced a significantly higher proportion of D-pronouns (der, die, das, etc.) than the CTR-group. The increase in D-pronouns did not affect linguistic contexts favoring the use of personal pronouns. Instead, we found a significantly higher proportion of D-pronouns referring to family members and a significantly higher proportion of personal pronouns referring to non-family humans in the AD-group than in the CTR-group. Our results suggest that the predictive power of the use of pronouns can be significantly improved in computational modeling of the risk of AD by assessing language material that induces the use of pronouns in linguistic contexts affected by the increase.
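The "proportional distribution of pronoun types" mentioned in the abstract can be illustrated with a small sketch. It assumes each pronoun token has already been annotated with a type label; the function name `pronoun_proportions` and the labels used below are hypothetical and are not taken from the study or the ILSE data.

```python
from collections import Counter
from typing import Dict, Iterable


def pronoun_proportions(labels: Iterable[str]) -> Dict[str, float]:
    """Return the share of each pronoun type among all annotated pronouns."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        # No pronouns annotated: there is no distribution to report.
        return {}
    return {label: count / total for label, count in counts.items()}


# Hypothetical annotations for one interview transcript (illustrative only).
example_labels = ["personal", "d_pronoun", "d_pronoun", "personal", "reflexive"]
for label, share in sorted(pronoun_proportions(example_labels).items()):
    print(f"{label}: {share:.2f}")
```

Comparing such per-speaker proportions between groups would then be a separate statistical step, which this sketch does not attempt.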

Citations: 1
A ChannelWise weighting technique of slice-based Temporal Convolutional Network for noisy speech enhancement
IF 4.3 Zone 3 Computer Science Q1 Mathematics Pub Date: 2023-10-11 DOI: 10.1016/j.csl.2023.101572
Wei-Tyng Hong, Kuldeep Singh Rana

In recent years, Temporal Convolutional Networks (TCNs) have driven significant progress in single-channel noisy speech enhancement. However, TCN-based systems still face certain challenges, such as limited utilization of network channel depth for handling long-range dependencies and issues with weight sharing. To address these challenges, this paper proposes a novel channel-wise weighting scheme, specifically designed for the sliced TCN framework. The proposed scheme involves the element-wise multiplication of shifting weight techniques for each channel of the TCN slice. Utilizing a cyclically shifted approach, these weights capture information from neighboring channels, uncovering the dependencies between adjacent channels. By combining the channel-wise weighted TCN output and subsequently estimating a masking function, the proposed method effectively suppresses noise components, leading to enhanced speech quality. To train and evaluate our proposed method, we utilize speech datasets that consist of various noise types at different levels. To optimize the performance of the proposed end-to-end enhancement system, we adopt the Scale-Invariant Signal-to-Noise Ratio (SI-SNR) objective function. Experimental results demonstrate the effectiveness of our proposed TCN channel-wise weighting method, with a significant average improvement of approximately 9.8% in SI-SNR for the unseen noise dataset. This improvement was observed at an SNR of −3 dB for both non-channel-wise weighting schemes and the proposed channel-wise weighting schemes within the Multi-slicing TCNs framework. The main advantage of the proposed approach is its ability to address the challenges of uneven and biased output from TCN slices, particularly when dealing with highly non-stationary, noisy speech signals infused with speech-like noise. This leads to more robust performance in various real-world applications.
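For readers unfamiliar with the SI-SNR objective named in the abstract, the following is a minimal sketch of the commonly used definition: both signals are made zero-mean, the estimate is projected onto the target, and the ratio of projected energy to residual energy is reported in dB. It uses plain Python lists for self-containment; the function name `si_snr` and the `eps` stabiliser are assumptions for illustration, and the paper's exact formulation may differ.

```python
import math
from typing import Sequence


def si_snr(estimate: Sequence[float], target: Sequence[float], eps: float = 1e-8) -> float:
    """Scale-invariant signal-to-noise ratio (in dB) of an estimate against a target."""
    if len(estimate) != len(target):
        raise ValueError("signals must have the same length")
    n = len(target)
    # Remove the mean of each signal so a constant offset does not affect the score.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]
    # Project the estimate onto the target; this makes the measure scale-invariant.
    scale = sum(a * b for a, b in zip(e, t)) / (sum(b * b for b in t) + eps)
    s_target = [scale * b for b in t]
    residual = [a - s for a, s in zip(e, s_target)]
    p_target = sum(s * s for s in s_target)
    p_noise = sum(r * r for r in residual)
    return 10.0 * math.log10(p_target / (p_noise + eps) + eps)
```

When used as a training objective, the negative of this value is typically minimised, so that a higher SI-SNR corresponds to a lower loss.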

Citations: 0