Adopting machine translation in the healthcare sector: A methodological multi-criteria review
Pub Date : 2023-10-29 DOI: 10.1016/j.csl.2023.101582
Marco Zappatore , Gilda Ruggieri
Background:
The recent advances in machine translation (MT) offer an appealing, low-cost solution for overcoming language barriers in multiple contexts (e.g., travelling, cultural interaction, digital content localisation). However, highly technical domains such as the healthcare sector, whose texts are typically long, complex, and specialised, pose multiple challenges to the effective and risk-safe use of MT.
Methods:
To examine how MT currently assists written and verbal health communication, and given the considerable heterogeneity in technological enablers, language pairs, user groups, training approaches, evaluation processes, and users' requirements, we propose a methodological multi-criteria literature review based on current guidelines in computer science research and grounded on a customised configuration of the PRISMA methodology, normally used to perform meta-analyses of clinical trials. The review focuses on language-to-language medical MT, covers the period January 2015–February 2023, and only considers articles written in English that are accessible via four scientific online digital libraries. Articles are ranked according to a meta-evaluation scoring method for MT scientific credibility along with a scoring for assessing the scope of MT in healthcare. Finally, a guideline for properly designing a study about MT in healthcare is also proposed.
Results:
The review included a final set of 58 articles from journals (n=30) and conference proceedings (n=28), considering 48 different language combinations. We identified a predominance of English-to-Spanish (n=19) and English-to-Chinese (n=16) implementations, mainly tailored to medical staff only (n=14) or along with patients (n=12). Included papers addressed clinical communication (n=21) and health education (n=37). Unidirectional real-time bilingual MT (n=24) was the most frequent configuration. MT implementations were dominated by Google Translate (n=22), often used as a baseline, OpenNMT (n=12), or Moses (n=11). Training and evaluation approaches varied considerably, while deployment and pre-/post-editing were rarely described.
DAE-NER: Dual-channel attention enhancement for Chinese named entity recognition
Pub Date : 2023-10-26 DOI: 10.1016/j.csl.2023.101581
Jingxin Liu , Mengzhe Sun , Wenhao Zhang , Gengquan Xie , Yongxia Jing , Xiulai Li , Zhaoxin Shi
Named Entity Recognition (NER) is an important component of Natural Language Processing (NLP) and a fundamental yet challenging task in text analysis. Recently, NER models for Chinese text have received considerable attention. Owing to the complexity and ambiguity of the Chinese language, the same semantic features carry different levels of importance in different contexts. However, the existing literature on Chinese Named Entity Recognition (CNER) does not capture this difference in importance. To tackle this problem, we propose a new method, referred to as Dual-channel Attention Enhancement for Chinese Named Entity Recognition (DAE-NER). Specifically, we design compression and decompression mechanisms to adapt Chinese characters to different contexts. By adjusting the weights of the semantic feature vector, the semantic weighting is reconstructed to alleviate the interference of contextual differences in semantics. Moreover, to enhance the semantic representation of different granularities in Chinese text, we design attention enhancement modules at the character and sentence levels, which dynamically learn the differences in semantic features to enhance important semantic representations in different dimensions. Extensive experiments on four benchmark datasets, namely MSRA, People Daily, Resume, and Weibo, demonstrate that the proposed DAE-NER can effectively improve the overall performance of CNER.
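The compression/decompression idea above can be illustrated with a squeeze-and-excitation-style channel gate over token features: a global context vector is compressed through a bottleneck, decompressed back, and used to reweight each semantic channel. This is only a minimal sketch of that general mechanism; the shapes, reduction ratio, and weight names are illustrative assumptions, not DAE-NER's actual architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_gate(features, w_down, w_up):
    """Reweight semantic feature channels with a compress-then-decompress
    bottleneck (illustrative sketch; not the paper's exact module).

    features: (seq_len, d) token embeddings
    w_down:   (d, d // r) compression weights
    w_up:     (d // r, d) decompression weights
    """
    context = features.mean(axis=0)                          # squeeze: global context vector
    gate = sigmoid(np.maximum(context @ w_down, 0) @ w_up)   # excite: per-channel weights in (0, 1)
    return features * gate                                   # rescale each semantic channel

rng = np.random.default_rng(0)
d, r = 8, 2
x = rng.normal(size=(5, d))
out = channel_gate(x, rng.normal(size=(d, d // r)), rng.normal(size=(d // r, d)))
```

Adjusting the gate per context is what lets the same feature channel receive a different weight in different sentences.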
An effective approach for identifying keywords as high-quality filters to get emergency-implicated Twitter Spanish data
Pub Date : 2023-10-26 DOI: 10.1016/j.csl.2023.101579
Joel Garcia-Arteaga , Jesús Zambrano-Zambrano , Jorge Parraga-Alava , Jorge Rodas-Silva
Twitter has become a powerful knowledge source for data extraction in data mining projects owing to the amount of data generated by its users, which allows researchers to find content on almost any topic in real time. This, however, depends on the quality of the keywords used; otherwise, the extracted data will contain a high percentage of irrelevant content. In this paper, we introduce a time-aware machine-learning-based approach to identify meaningful keywords that maximize the extraction of relevant emergency-related tweets when the Twitter API is used. We follow the CRISP-DM methodology. The first stage relies on problem understanding, where we identified the need for meaningful keywords to filter content, extract higher-quality data, and reduce the percentage of irrelevant tweets. In the second stage, data collection, we used the official Twitter API to extract and label tweets as "emergencia" and "no emergencia". After that, we analyzed the collected data (data understanding) to determine preprocessing techniques and prepare the data for the model. Finally, in the modeling and testing stages, we trained a restricted Boltzmann machine and four variations of autoencoders, including an architecture proposed by a genetic algorithm, to use them as keyword identifiers and to determine which of them performs best for deployment to production (deployment stage). The results show a slightly better performance for the autoencoder proposed by the genetic algorithm (GADAE), achieving an R² score of 0.97, an MAE of 14×10⁻³, and an MSE of 4×10⁻⁴. GADAE, the best model, managed to extract 110% more relevant tweets than manual filtering in the context of emergency-implicated tweets in Ecuador.
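The models above are compared with three standard regression metrics (MAE, MSE, R²). For reference, these can be computed directly from predictions; this pure-Python sketch shows the definitions, with toy values rather than the paper's data.

```python
def regression_metrics(y_true, y_pred):
    """MAE, MSE, and R^2 from paired predictions (standard definitions;
    the paper's exact evaluation code is not given in the abstract)."""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    mean_t = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)             # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, mse, r2

mae, mse, r2 = regression_metrics([0.0, 0.5, 1.0, 1.5], [0.1, 0.5, 0.9, 1.5])
```

An R² near 1 with MAE and MSE near 0, as reported for GADAE, indicates predictions that track the targets almost exactly.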
A lightweight approach based on prompt for few-shot relation extraction
Pub Date : 2023-10-25 DOI: 10.1016/j.csl.2023.101580
Ying Zhang, Wencheng Huang, Depeng Dang
Few-shot relation extraction (FSRE) aims to predict the relation between two entities in a sentence using a few annotated samples. Many works solve the FSRE problem by training complex models with a huge number of parameters, which results in longer processing times to obtain results. Some recent works focus on introducing relation information into prototype networks in various ways. However, most of these methods obtain entity and relation representations by fine-tuning large pre-trained language models. This implies that a copy of the complete pre-trained model needs to be saved after fine-tuning for each specific task, leading to a shortage of computing and space resources. To address this problem, we introduce a lightweight approach that utilizes prompt-learning to assist in fine-tuning the model by adjusting fewer parameters. To obtain a better relation prototype, we design a new enhanced fusion module to fuse the relation information with the original prototype. We conduct extensive experiments on the common FSRE datasets FewRel 1.0 and FewRel 2.0 to verify the advantages of our method; the results show that our model achieves state-of-the-art performance.
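At the core of the prototype networks mentioned above, each relation class is represented by the mean of its support-set embeddings, and a query is assigned to the nearest prototype. The sketch below shows that basic mechanism with toy 2-D vectors; a real FSRE system would obtain the embeddings from a pre-trained encoder, and the paper's fusion module would further enrich the prototypes.

```python
import numpy as np

def prototype_predict(support, support_labels, query):
    """Nearest-prototype classification (generic prototypical-network step;
    embeddings here are toy vectors, not encoder outputs).

    support: (n, d) support-set embeddings
    support_labels: length-n class ids
    query: (d,) query embedding
    """
    classes = sorted(set(support_labels))
    protos = np.stack([
        support[[i for i, y in enumerate(support_labels) if y == c]].mean(axis=0)
        for c in classes
    ])
    dists = np.linalg.norm(protos - query, axis=1)  # Euclidean distance to each class prototype
    return classes[int(dists.argmin())]

support = np.array([[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [0.8, 1.0]])
labels = [0, 0, 1, 1]
pred = prototype_predict(support, labels, np.array([0.1, 0.1]))  # → 0
```

Because only the encoder produces the embeddings, improving the prototypes (e.g., by fusing relation information) does not require retraining the full model.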
The limits of the Mean Opinion Score for speech synthesis evaluation
Pub Date : 2023-10-21 DOI: 10.1016/j.csl.2023.101577
Sébastien Le Maguer , Simon King , Naomi Harte
The release of WaveNet and Tacotron has forever transformed the speech synthesis landscape. Thanks to these game-changing innovations, the quality of synthetic speech has reached unprecedented levels. However, to measure this leap in quality, an overwhelming majority of studies still rely on the Absolute Category Rating (ACR) protocol and compare systems using its output, the Mean Opinion Score (MOS). This protocol is not without controversy, and as current state-of-the-art synthesis systems produce outputs remarkably close to human speech, it is now vital to determine how reliable this score is.
To do so, we conducted a series of four experiments replicating and following the 2013 edition of the Blizzard Challenge. With these experiments, we asked four questions about the MOS: How stable is the MOS of a system across time? How do the scores of lower-quality systems influence the MOS of higher-quality systems? How does the introduction of modern technologies influence the scores of past systems? How does the MOS of modern technologies evolve in isolation?
The results of our experiments are manifold. Firstly, we verify the superiority of modern technologies over historical synthesis. Then, we show that despite its origin as an absolute category rating, the MOS is a relative score. While minimal variations are observed during the replication of the 2013-EH2 task, these variations can still lead to different conclusions for the intermediate systems. Our experiments also illustrate the sensitivity of the MOS to the presence or absence of lower and higher anchors. Overall, our experiments suggest that we may have reached the end of a cul-de-sac by evaluating only overall quality with the MOS. We must embark on a new road and develop different evaluation protocols better suited to the analysis of modern speech synthesis technologies.
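For context, the MOS under discussion is simply the arithmetic mean of 1–5 ACR ratings, conventionally reported with a confidence interval. The sketch below uses a normal-approximation 95% interval on toy ratings; the Blizzard Challenge analyses use more elaborate statistics, so this is only the baseline computation being critiqued.

```python
import statistics

def mos_with_ci(ratings):
    """Mean Opinion Score from 1-5 ACR ratings, with a normal-approximation
    95% confidence half-width (a common reporting convention)."""
    mean = statistics.fmean(ratings)
    sem = statistics.stdev(ratings) / len(ratings) ** 0.5  # standard error of the mean
    return mean, 1.96 * sem

mos, ci = mos_with_ci([4, 5, 4, 3, 4, 5, 4, 4])
```

The paper's point is that even a correctly computed MOS is only meaningful relative to the other systems and anchors in the same listening test, so comparing such numbers across experiments is unsafe.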
M-Sim: Multi-level Semantic Inference Model for Chinese short answer scoring in low-resource scenarios
Pub Date : 2023-10-20 DOI: 10.1016/j.csl.2023.101575
Peichao Lai, Feiyang Ye, Yanggeng Fu, Zhiwei Chen, Yingjie Wu, Yilei Wang
Short answer scoring is a significant task in natural language processing. On datasets comprising numerous explicit or implicit symbols and quantization entities, existing approaches continue to perform poorly. Additionally, the majority of relevant datasets contain only few-shot samples, reducing model efficacy in low-resource scenarios. To solve these issues, we propose a Multi-level Semantic Inference Model (M-Sim), which obtains features at multiple scales to fully consider the explicit or implicit entity information contained in the data. We then design prompt-based data augmentation to construct simulated datasets, which effectively enhances model performance in low-resource scenarios. Our M-Sim outperforms the best competitor models by an average of 1.48 percent in F1 score. The data augmentation significantly increases the performance of all approaches by an average of 0.036 in correlation coefficient scores.
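Prompt-based data augmentation of the kind mentioned above typically wraps each (question, answer) pair in several natural-language templates to multiply the training data. The templates and function below are hypothetical stand-ins, since the abstract does not specify the paper's actual prompts.

```python
def prompt_augment(question, answer, templates=None):
    """Build prompt-style training strings from a (question, answer) pair.
    The default templates are illustrative assumptions, not the paper's."""
    templates = templates or [
        "Question: {q} Answer: {a}",
        "{q} The student answered: {a}",
    ]
    return [t.format(q=question, a=answer) for t in templates]

samples = prompt_augment("What is H2O?", "Water")
```

Each original sample thus yields one simulated sample per template, which is how a few-shot dataset can be expanded without new annotation.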
Document-level relation extraction with entity mentions deep attention
Pub Date : 2023-10-20 DOI: 10.1016/j.csl.2023.101574
Yangsheng Xu, Jiaxin Tian, Mingwei Tang, Linping Tao, Liuxuan Wang
Document-level Relation Extraction (DocRE) aims to extract relations between entities from documents. In contrast to sentence-level relation extraction, it requires extracting semantic relations from multiple sentences, so DocRE algorithms have to deal with more complex entity structures and must unite semantic relationships across sentences when reasoning about relationships between entities. Existing algorithms fail to infer relationships between entities when dealing with complex entity structures. In this paper, we propose an entity-mentions deep attention framework that efficiently infers entity relationships through entity structure and contextual information. Firstly, a structural dependency module over entities is designed to achieve interaction between an entity's different mentions. Secondly, a deep contextual attention component is proposed to enrich the semantic information between entities with entity-related contexts. Finally, we use a distance mapping component to handle entity pairs that are far apart from each other. Our experimental results show that our model outperforms state-of-the-art models on three public datasets: DocRED, DGA, and CDR.
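A common building block behind mention-level attention of this kind is a dot-product attention that fuses an entity's scattered mention embeddings into one representation, weighted by their relevance to a context query. The sketch below shows that generic operation; the shapes and scoring are standard scaled dot-product attention, not the paper's exact components.

```python
import numpy as np

def mention_attention(mentions, context):
    """Fuse an entity's mention embeddings via scaled dot-product attention
    against a context vector (generic sketch, not the paper's architecture).

    mentions: (m, d) embeddings of the entity's mentions across the document
    context:  (d,)   entity-related context query vector
    """
    scores = mentions @ context / np.sqrt(mentions.shape[1])  # relevance of each mention
    weights = np.exp(scores - scores.max())                   # numerically stable softmax
    weights /= weights.sum()
    return weights @ mentions                                 # (d,) fused entity representation

m = np.array([[1.0, 0.0], [0.0, 1.0]])
rep = mention_attention(m, np.array([1.0, 0.0]))
```

Mentions that match the context receive higher weight, so the fused vector leans toward the mentions most informative for the relation being inferred.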
Meta adversarial learning improves low-resource speech recognition
Pub Date : 2023-10-19 DOI: 10.1016/j.csl.2023.101576
Yaqi Chen, Xukui Yang, Hao Zhang, Wenlin Zhang, Dan Qu, Cong Chen
Low-resource automatic speech recognition is a challenging task. To address this issue, multilingual meta-learning learns a better model initialization from many source languages, allowing for rapid adaptation to target languages. However, data scales and learning difficulties vary greatly from one language to another; as a result, the model favors large-scale and simple source languages. Moreover, the shared semantic space of the various languages is difficult to learn due to a lack of constraints on multilingual pre-training. In this paper, we propose a meta adversarial learning approach to address this problem. The meta-learner is guided to learn language-independent information through an adversarial auxiliary objective of language identification, which makes the shared semantic space more compact and improves model generalization. Additionally, we optimize adversarial training using the Wasserstein distance and temporal normalization, enabling more stable and simpler training. Experimental results on IARPA BABEL and OpenSLR show a significant performance improvement, outperforming state-of-the-art results by a large margin in all target languages, especially in few-shot settings. Finally, we use t-SNE visualization to demonstrate the superiority of our method.
Pub Date : 2023-10-12, DOI: 10.1016/j.csl.2023.101573
Dagmar Bittner , Claudia Frankenberg , Johannes Schröder
The present study aims at improving the predictive power of the use of pronouns in computational modeling of the risk of Alzheimer's dementia (AD) by (i) further determining the onset of increased pronoun use in AD and (ii) providing insights into the linguistic contexts affected by the increase early on. Pronoun use was compared longitudinally between subjects who either stayed cognitively intact (CTR-group, n = 5) or who had developed AD upon follow-up after 10–12 years (AD-group, n = 5). Data were taken from semi-structured biographical interviews, which stem from the Interdisciplinary Longitudinal Study on Adult Development and Aging (ILSE). The first interview (baseline) was conducted when all participants were still cognitively healthy. Analyses concerned the proportional distribution of 12 pronoun types and linguistic contexts of increased use. Already at baseline, the AD-group produced a significantly higher proportion of D-pronouns (der, die, das, etc.) than the CTR-group. The increase in D-pronouns did not affect linguistic contexts favoring the use of personal pronouns. Instead, we found a significantly higher proportion of D-pronouns referring to family members and a significantly higher proportion of personal pronouns referring to non-family humans in the AD-group than in the CTR-group. Our results suggest that the predictive power of the use of pronouns can be significantly improved in computational modeling of the risk of AD by assessing language material that induces the use of pronouns in linguistic contexts affected by the increase.
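The proportional-distribution analysis can be illustrated with a toy computation. The token list and the deliberately incomplete pronoun inventories below are our own illustrative assumptions, not the study's actual coding scheme of 12 pronoun types:

```python
from collections import Counter

# Hypothetical, partial inventories for illustration only.
D_PRONOUNS = {"der", "die", "das", "dem", "den", "dessen", "deren"}
PERSONAL_PRONOUNS = {"ich", "du", "er", "es", "wir", "ihr"}

def pronoun_proportions(tokens):
    """Proportion of each pronoun class relative to all tokens.

    Mirrors the per-type proportional distributions analysed in the study:
    counts are case-folded and divided by the utterance's token total.
    """
    counts = Counter(t.lower() for t in tokens)
    total = sum(counts.values())
    return {
        "d_pronoun": sum(counts[w] for w in D_PRONOUNS) / total,
        "personal": sum(counts[w] for w in PERSONAL_PRONOUNS) / total,
    }
```

Such per-interview proportions could then serve as longitudinal features in a risk model, which is the direction the abstract's conclusion points to.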
{"title":"Pronoun use in preclinical and early stages of Alzheimer's dementia","authors":"Dagmar Bittner , Claudia Frankenberg , Johannes Schröder","doi":"10.1016/j.csl.2023.101573","DOIUrl":"https://doi.org/10.1016/j.csl.2023.101573","url":null,"abstract":"<div><p>The present study aims at improving the predictive power of the use of pronouns in computational modeling of the risk of Alzheimer's dementia (AD) by (i) further determining the onset of increased pronoun use in AD and (ii) providing insights into the linguistic contexts affected by the increase early on. Pronoun use was compared longitudinally between subjects who either stayed cognitively intact (CTR-group, <em>n</em> = 5) or who had developed AD upon follow-up after 10–12 years (AD-group, <em>n</em> = 5). Data were taken from semi-structured biographical interviews, which stem from the Interdisciplinary Longitudinal Study on Adult Development and Aging (ILSE). The first interview (baseline) was conducted when all participants were still cognitively healthy. Analyses concerned the proportional distribution of 12 pronoun types and linguistic contexts of increased use. Already at baseline, the AD-group produced a significantly higher proportion of <em>D-pronouns</em> (<em>der, die, das</em>, etc.) than the CTR-group. The increase in <em>D-pronouns</em> did not affect linguistic contexts favoring the use of <em>personal pronouns</em>. Instead, we found a significantly higher proportion of <em>D-pronouns</em> referring to family members and a significantly higher proportion of <em>personal pronouns</em> referring to non-family <em>humans</em> in the AD-group than in the CTR-group. 
Our results suggest that the predictive power of the use of pronouns can be significantly improved in computational modeling of the risk of AD by assessing language material that induces the use of pronouns in linguistic contexts affected by the increase.</p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":null,"pages":null},"PeriodicalIF":4.3,"publicationDate":"2023-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49836129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-10-11, DOI: 10.1016/j.csl.2023.101572
Wei-Tyng Hong, Kuldeep Singh Rana
In recent years, Temporal Convolutional Networks (TCNs) have driven significant progress in single-channel noisy speech enhancement. However, TCN-based systems still face certain challenges, such as limited utilization of network channel depth for handling long-range dependencies and issues with weight sharing. To address these challenges, this paper proposes a novel channel-wise weighting scheme designed specifically for the sliced TCN framework. The scheme applies element-wise multiplication of shifted weights to each channel of a TCN slice: using a cyclic shift, these weights capture information from neighboring channels and uncover the dependencies between adjacent channels. By combining the channel-wise weighted TCN output and subsequently estimating a masking function, the proposed method effectively suppresses noise components, leading to enhanced speech quality. To train and evaluate the proposed method, we use speech datasets containing various noise types at different levels, and we optimize the end-to-end enhancement system with the Scale-Invariant Signal-to-Noise Ratio (SI-SNR) objective function. Experimental results demonstrate the effectiveness of the proposed TCN channel-wise weighting method, with a significant average improvement of approximately 9.8% in SI-SNR on the unseen noise dataset. This improvement was observed at an SNR of −3 dB for both the non-channel-wise and the proposed channel-wise weighting schemes within the multi-slicing TCN framework. The main advantage of the proposed approach is its ability to address uneven and biased output from TCN slices, particularly when dealing with highly non-stationary noisy speech infused with speech-like noise, leading to more robust performance in various real-world applications.
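The SI-SNR objective used for training is a standard metric and can be sketched as follows; the cyclic channel weighting itself depends on implementation details the abstract does not specify, so only the loss is shown, and the `eps` guard is our own addition:

```python
import numpy as np

def si_snr(est, ref, eps=1e-8):
    """Scale-Invariant Signal-to-Noise Ratio in dB.

    Projects the zero-mean estimate onto the zero-mean reference to obtain
    the target component, then compares its energy with the residual noise.
    """
    est = est - est.mean()
    ref = ref - ref.mean()
    s_target = (np.dot(est, ref) / (np.dot(ref, ref) + eps)) * ref
    e_noise = est - s_target
    return float(10 * np.log10((np.sum(s_target ** 2) + eps)
                               / (np.sum(e_noise ** 2) + eps)))
```

Training maximizes SI-SNR (i.e., minimizes its negative); because both the target and noise components scale together, simply amplifying the enhancer's output cannot inflate the score.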
{"title":"A ChannelWise weighting technique of slice-based Temporal Convolutional Network for noisy speech enhancement","authors":"Wei-Tyng Hong, Kuldeep Singh Rana","doi":"10.1016/j.csl.2023.101572","DOIUrl":"https://doi.org/10.1016/j.csl.2023.101572","url":null,"abstract":"<div><p><span>In recent years, Temporal Convolutional Networks<span> (TCNs) have driven significant progress in single-channel noisy speech enhancement. However, TCN-based systems still face certain challenges, such as limited utilization of network channel depth for handling long-range dependencies and issues with weight sharing. To address these challenges, this paper proposes a novel channel-wise weighting scheme, specifically designed for the sliced TCN framework. The proposed scheme involves the element-wise multiplication of shifting weight techniques for each channel of the TCN slice. Utilizing a cyclically shifted approach, these weights capture information from neighboring channels, uncovering the dependencies between adjacent channels. By combining the channel-wise weighted TCN output and subsequently estimating a masking function, the proposed method effectively suppresses noise components, leading to enhanced speech quality. To train and evaluate our proposed method, we utilize speech datasets that consist of various noise types at different levels. To optimize the performance of the proposed end-to-end enhancement system, we adopt the Scale-Invariant Signal-to-Noise Ratio (SI-SNR) objective function. Experimental results demonstrate the effectiveness of our proposed TCN channel-wise weighting method, with a significant average improvement of approximately 9.8% in SI-SNR for the unseen noise dataset. This improvement was observed at an SNR of </span></span><span><math><mo>−</mo></math></span>3 dB for both non-channel-wise weighting schemes and the proposed channel-wise weighting schemes within the Multi-slicing TCNs framework. 
The main advantage of the proposed approach is its ability to address the challenges of uneven and biased output from TCN slices, particularly when dealing with highly non-stationary, noisy speech signals infused with speech-like noise. This leads to more robust performance in various real-world applications.</p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":null,"pages":null},"PeriodicalIF":4.3,"publicationDate":"2023-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49844693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}