TR-Net: Token Relation Inspired Table Filling Network for Joint Entity and Relation Extraction
Pub Date : 2024-11-09 DOI: 10.1016/j.csl.2024.101749
Yongle Kong , Zhihao Yang , Zeyuan Ding , Wenfei Liu , Shiqi Zhang , Jianan Xu , Hongfei Lin
Recently, table filling models have achieved promising performance in jointly extracting relation triplets from complex sentences, leveraging their inherent structural advantage of delineating entities and relations as table cells. Nonetheless, these models predominantly concentrate on the cells corresponding to entity pairs within the predicted tables, neglecting the interrelations among other token pairs. This oversight can exclude essential token information. To address these challenges, we introduce the Token Relation-Inspired Network (TR-Net), a novel framework for the joint extraction of entities and relations. It encompasses a token relation generator that adaptively constructs a token relation table, concentrating on the prominent token cells. It also uses a structure-enhanced encoder that integrates the structural and sequential data of sentences via a highway gate mechanism. Our experimental analysis demonstrates that TR-Net delivers considerable enhancements and achieves state-of-the-art performance on four public datasets.
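A minimal sketch of a highway-gate fusion layer like the one this abstract describes, assuming the standard highway formulation (the layer shapes, names, and fusion formula below are illustrative, not taken from the paper):

```python
import torch
import torch.nn as nn

class HighwayGate(nn.Module):
    """Fuses two token representations with a learned gate:
    out = g * h_struct + (1 - g) * h_seq, with g in (0, 1)."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.gate = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, h_seq: torch.Tensor, h_struct: torch.Tensor) -> torch.Tensor:
        # The gate decides, per dimension, how much structural information
        # (e.g., from a token relation table) to mix into the sequential encoding.
        g = torch.sigmoid(self.gate(torch.cat([h_seq, h_struct], dim=-1)))
        return g * h_struct + (1.0 - g) * h_seq

# h_seq: contextual embeddings from a sequence encoder (e.g., BERT);
# h_struct: embeddings derived from structural information about the sentence.
h_seq, h_struct = torch.randn(2, 16, 768), torch.randn(2, 16, 768)
print(HighwayGate(768)(h_seq, h_struct).shape)  # torch.Size([2, 16, 768])
```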
{"title":"TR-Net: Token Relation Inspired Table Filling Network for Joint Entity and Relation Extraction","authors":"Yongle Kong , Zhihao Yang , Zeyuan Ding , Wenfei Liu , Shiqi Zhang , Jianan Xu , Hongfei Lin","doi":"10.1016/j.csl.2024.101749","DOIUrl":"10.1016/j.csl.2024.101749","url":null,"abstract":"<div><div>Recently, table filling models have achieved promising performance in jointly extracting relation triplets from complex sentences, leveraging their inherent structural advantage of delineating entities and relations as table cells. Nonetheless, these models predominantly concentrate on the cells corresponding to entity pairs within the predicted tables, neglecting the interrelations among other token pairs. This oversight can potentially lead to the exclusion of essential token information. To address these challenges, we introduce the <em>Token Relation-Inspired Network (TR-Net)</em>, a novel framework for the joint extraction of entities and relations. It encompasses a token relation generator that adaptively constructs a token relation table, concentrating on the prominent token cells. Moreover, it also uses a structure-enhanced encoder that integrates the structural and sequential data of sentences via a highway gate mechanism. Our experimental analysis demonstrates that TR-Net delivers considerable enhancements and achieves state-of-the-art performance on four public datasets.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101749"},"PeriodicalIF":3.1,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142657089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CLIPMulti: Explore the performance of multimodal enhanced CLIP for zero-shot text classification
Pub Date : 2024-11-07 DOI: 10.1016/j.csl.2024.101748
Peng Wang , Dagang Li , Xuesi Hu , Yongmei Wang , Youhua Zhang
Zero-shot text classification does not require large amounts of labeled data and is designed to handle text classification tasks that lack annotated training data. Existing zero-shot text classification uses either a text–text matching paradigm or a text–image matching paradigm, both of which show good performance on different benchmark datasets. However, the existing classification paradigms only consider a single modality for text matching, and little attention has been paid to how multimodality can help text classification. To incorporate multimodality into zero-shot text classification, we propose a multimodal enhanced CLIP framework (CLIPMulti), which employs a text–image&text matching paradigm to enhance the effectiveness of zero-shot text classification. Three different image–text combinations are tested for their effects on zero-shot text classification, and a matching method (Match-CLIPMulti) is further proposed to automatically find the corresponding text based on the classified images. We conducted experiments on seven publicly available zero-shot text classification datasets and achieved competitive performance. In addition, we analyzed the effect of different parameters on the Match-CLIPMulti experiments. We hope this work will inspire further exploration of multimodal fusion in language tasks.
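A rough sketch of one possible text–image&text matching setup with off-the-shelf CLIP (the label set, stand-in class images, and averaging scheme are assumptions for illustration; the paper tests three combination strategies):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["sports", "politics"]
label_images = [Image.new("RGB", (224, 224)) for _ in labels]  # stand-ins for per-class images

text_in = processor(text=["The team won the championship game."], return_tensors="pt", padding=True)
label_txt = processor(text=labels, return_tensors="pt", padding=True)
label_img = processor(images=label_images, return_tensors="pt")

with torch.no_grad():
    q = model.get_text_features(**text_in)      # input text embedding
    t = model.get_text_features(**label_txt)    # label-text embeddings
    v = model.get_image_features(**label_img)   # label-image embeddings

# One simple combination: average the normalized label-text and label-image
# embeddings, then classify by cosine similarity to the input text.
q, t, v = (x / x.norm(dim=-1, keepdim=True) for x in (q, t, v))
class_emb = (t + v) / 2
print(labels[(q @ class_emb.T).argmax(dim=-1).item()])
```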
{"title":"CLIPMulti: Explore the performance of multimodal enhanced CLIP for zero-shot text classification","authors":"Peng Wang , Dagang Li , Xuesi Hu , Yongmei Wang , Youhua Zhang","doi":"10.1016/j.csl.2024.101748","DOIUrl":"10.1016/j.csl.2024.101748","url":null,"abstract":"<div><div>Zero-shot text classification does not require large amounts of labeled data and is designed to handle text classification tasks that lack annotated training data. Existing zero-shot text classification uses either a text–text matching paradigm or a text–image matching paradigm, which shows good performance on different benchmark datasets. However, the existing classification paradigms only consider a single modality for text matching, and little attention is paid to the help of multimodality for text classification. In order to incorporate multimodality into zero-shot text classification, we propose a multimodal enhanced CLIP framework (CLIPMulti), which employs a text–image&text matching paradigm to enhance the effectiveness of zero-shot text classification. Three different image and text combinations are tested for their effects on zero-shot text classification, and a matching method (Match-CLIPMulti) is further proposed to find the corresponding text based on the classified images automatically. We conducted experiments on seven publicly available zero-shot text classification datasets and achieved competitive performance. In addition, we analyzed the effect of different parameters on the Match-CLIPMulti experiments. We hope this work will bring more thoughts and explorations on multimodal fusion in language tasks.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101748"},"PeriodicalIF":3.1,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142657088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
UniKDD: A Unified Generative model for Knowledge-driven Dialogue
Pub Date : 2024-10-30 DOI: 10.1016/j.csl.2024.101740
Qian Wang , Yan Chen , Yang Wang , Xu Wang
Knowledge-driven dialogue (KDD) introduces an external knowledge base to generate informative and fluent responses. However, previous works employ different models to conduct the sub-tasks of KDD, ignoring the connections between sub-tasks and making training and inference difficult. To solve these issues, we propose UniKDD, a unified generative model for KDD, which casts all sub-tasks as generation tasks, enhancing the connection between tasks and facilitating training and inference. Specifically, UniKDD simplifies the complex KDD task into three main sub-tasks, i.e., entity prediction, attribute prediction, and dialogue generation. These tasks are transformed into text generation and trained in an end-to-end way. In the inference phase, UniKDD first predicts the set of entities relevant to the current dialogue turn according to the dialogue history. Then, for each predicted entity, UniKDD predicts the corresponding attributes from the dialogue history. Finally, UniKDD generates a high-quality and informative response using the dialogue history and the predicted knowledge triplets. The experimental results show that UniKDD performs the KDD task well and outperforms the baselines on the evaluation of knowledge selection and response generation. The code is available at https://github.com/qianandfei/UniKDD.git.
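To make the unified-generation idea concrete, here is a hedged sketch of how the three sub-tasks could all be posed as text generation with one seq2seq model (the task-prefix wording and the t5-base stand-in checkpoint are illustrative; the released code defines the actual formats):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("t5-base")              # stand-in for the fine-tuned model
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

def generate(prompt: str) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    return tok.decode(model.generate(ids, max_new_tokens=64)[0], skip_special_tokens=True)

history = "user: Who directed Inception? system: Christopher Nolan."
# Every sub-task is the same operation -- text in, text out -- so a single
# model can handle the whole pipeline end to end.
entities = generate(f"predict entities | history: {history}")
attributes = generate(f"predict attributes | entity: Inception | history: {history}")
response = generate(f"generate response | knowledge: (Inception, director, Christopher Nolan) | history: {history}")
```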
{"title":"UniKDD: A Unified Generative model for Knowledge-driven Dialogue","authors":"Qian Wang , Yan Chen , Yang Wang , Xu Wang","doi":"10.1016/j.csl.2024.101740","DOIUrl":"10.1016/j.csl.2024.101740","url":null,"abstract":"<div><div>knowledge-driven dialogue (KDD) is to introduce an external knowledge base, generating an informative and fluent response. However, previous works employ different models to conduct the sub-tasks of KDD, ignoring the connection between sub-tasks and resulting in a difficulty of training and inference. To solve those issues above, we propose the UniKDD, a unified generative model for KDD, which models all sub-tasks into a generation task, enhancing the connection between tasks and facilitating the training and inference. Specifically, UniKDD simplifies the complex KDD tasks into three main sub-tasks, i.e., entity prediction, attribute prediction, and dialogue generation. These tasks are transformed into a text generation task and trained by an end-to-end way. In the inference phase, UniKDD first predicts a set of entities used for current turn dialogue according to the dialogue history. Then, for each predicted entity, UniKDD predicts the corresponding attributes by the dialogue history. Finally, UniKDD generates a high-quality and informative response using the dialogue history and predicted knowledge triplets. The experimental results show that our proposed UniKDD can perform KDD task well and outperform the baseline on the evaluation of knowledge selection and response generation. The code is available at <span><span>https://github.com/qianandfei/UniKDD.git</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101740"},"PeriodicalIF":3.1,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142657090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploring the ability of LLMs to classify written proficiency levels
Pub Date : 2024-10-29 DOI: 10.1016/j.csl.2024.101745
Susanne DeVore
This paper tests the ability of LLMs to classify language proficiency ratings of texts written by learners of English and Mandarin, taking a benchmarking research design approach. First, the impact of five variables (LLM model, prompt version, prompt language, grading scale, and temperature) on rating accuracy is tested using a basic instruction-only prompt. Second, the consistency of results is tested. Third, the top-performing consistent conditions emerging from the first and second tests are used to test the impact of adding examples and/or proficiency guidelines and of zero-, one-, and few-shot chain-of-thought prompting techniques on rating accuracy. While performance does not meet levels necessary for real-world use cases, the results can inform ongoing development of LLMs and prompting techniques to improve accuracy. This paper highlights recent research on prompt engineering outside the field of linguistics and selects prompt variables and techniques that are theoretically relevant to proficiency rating. Finally, it discusses key takeaways from these tests that can inform future development and why approaches that have been effective in other contexts were not as effective for proficiency rating.
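A condensed sketch of the kind of benchmarking grid the first test implies, crossing model, grading scale, and temperature over the same learner texts (the model names, scale wording, and prompt are placeholders, not the paper's materials):

```python
from itertools import product
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

models = ["gpt-4o-mini", "gpt-4o"]            # placeholder model list
scales = ["CEFR (A1-C2)", "numeric 1-6"]      # placeholder grading scales
temperatures = [0.0, 0.7]

def rate(model: str, scale: str, temp: float, text: str) -> str:
    r = client.chat.completions.create(
        model=model,
        temperature=temp,
        messages=[{"role": "user",
                   "content": f"Rate the {scale} proficiency level of this learner text:\n{text}"}],
    )
    return r.choices[0].message.content

# Run every condition on the same text; accuracy is later computed
# against gold proficiency labels.
for m, s, t in product(models, scales, temperatures):
    print(m, s, t, rate(m, s, t, "I goes to school yesterday."))
```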
{"title":"Exploring the ability of LLMs to classify written proficiency levels","authors":"Susanne DeVore","doi":"10.1016/j.csl.2024.101745","DOIUrl":"10.1016/j.csl.2024.101745","url":null,"abstract":"<div><div>This paper tests the ability of LLMs to classify language proficiency ratings of texts written by learners of English and Mandarin, taking a benchmarking research design approach. First, the impact of five variables (LLM model, prompt version, prompt language, grading scale, and temperature) on rating accuracy are tested using a basic instruction-only prompt. Second, the consistency of results is tested. Third, the top performing consistent conditions emerging from the first and second tests are used to test the impact of adding examples and/or proficiency guidelines and the use of zero-, one-, and few-shot chain-of-thought prompting techniques on accuracy rating. While performance does not meet levels necessary for real-world use cases, the results can inform ongoing development of LLMs and prompting techniques to improve accuracy. This paper highlights recent research on prompt engineering outside of the field of linguistics and selects prompt variables and techniques that are theoretically relevant to proficiency rating. Finally, it discusses key takeaways from these tests that can inform future development and why approaches that have been effective in other contexts were not as effective for proficiency rating.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101745"},"PeriodicalIF":3.1,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142657092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Entity and relationship extraction based on span contribution evaluation and focusing framework
Pub Date : 2024-10-29 DOI: 10.1016/j.csl.2024.101744
Qibin Li , Nianmin Yao , Nai Zhou , Jian Zhao
Entity and relationship extraction involves identifying named entities and extracting relationships between them. Existing research focuses on enhancing span representations, yet overlooks the impact of non-target spans (i.e., spans that are not entities, or span pairs that hold no relationship) on model training. In this work, we propose a span contribution evaluation and focusing framework named CEFF, which assigns each non-target span in a sentence a contribution score through pre-training; this score reflects the span's contribution to model performance improvement. To a certain extent, this method considers the impact of different spans on model training, making the training more targeted. Additionally, leveraging the contribution scores of non-target spans, we introduce a simplified variant of the model, termed CEFF_s, which achieves performance comparable to models trained with all spans while utilizing fewer spans. This approach reduces training costs and improves training efficiency. Through extensive validation, we demonstrate that our contribution scores accurately reflect span contributions and achieve state-of-the-art results on five benchmark datasets.
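One way the contribution scores might enter training is as per-span loss weights, with the simplified variant keeping only the highest-scoring non-target spans; this sketch is an assumption about the mechanics, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def ceff_loss(logits, labels, contribution, is_target):
    """Weight each span's loss: target spans keep weight 1.0, non-target
    spans are weighted by their pre-trained contribution score."""
    per_span = F.cross_entropy(logits, labels, reduction="none")
    weights = torch.where(is_target, torch.ones_like(contribution), contribution)
    return (weights * per_span).mean()

def select_spans(contribution, is_target, keep_ratio=0.5):
    """Simplified variant: drop low-contribution non-target spans entirely,
    training on fewer spans at comparable accuracy."""
    k = int(keep_ratio * contribution.numel())
    keep = torch.zeros_like(is_target)            # bool mask, all False
    keep[contribution.topk(k).indices] = True
    return keep | is_target                       # always keep target spans
```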
{"title":"Entity and relationship extraction based on span contribution evaluation and focusing framework","authors":"Qibin Li , Nianmin Yao , Nai Zhou , Jian Zhao","doi":"10.1016/j.csl.2024.101744","DOIUrl":"10.1016/j.csl.2024.101744","url":null,"abstract":"<div><div>Entity and relationship extraction involves identifying named entities and extracting relationships between them. Existing research focuses on enhancing span representations, yet overlooks the impact of non-target spans(ie, the span is non-entity or the span pair has no relationship) on model training. In this work, we propose a span contribution evaluation and focusing framework named CEFF, which assigns a contribution score to each non-target span in a sentence through pre-training, which reflects the contribution of span to model performance improvement. To a certain extent, this method considers the impact of different spans on model training, making the training more targeted. Additionally, leveraging the contribution scores of non-target spans, we introduce a simplified variant of the model, termed CEFF<span><math><msub><mrow></mrow><mrow><mi>s</mi></mrow></msub></math></span>, which achieves comparable performance to models trained with all spans while utilizing fewer spans. This approach reduces training costs and improves training efficiency. Through extensive validation, we demonstrate that our contribution scores accurately reflect span contributions and achieve state-of-the-art results on five benchmark datasets.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101744"},"PeriodicalIF":3.1,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142572765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Taking relations as known conditions: A tagging based method for relational triple extraction
Pub Date : 2024-10-24 DOI: 10.1016/j.csl.2024.101734
Guanqing Kong , Qi Lei
Relational triple extraction refers to extracting entities and relations from natural texts, which is a crucial task in the construction of knowledge graphs. Recently, tagging-based methods have received increasing attention because of their simple and effective structural form. Among them, two-step extraction methods are prone to category imbalance. To address this issue, we propose a novel two-step extraction method, which first extracts subjects, generates a fixed-size embedding for each relation, and then regards these relations as known conditions to extract the objects directly with the identified subjects. To eliminate the influence of irrelevant relations when predicting objects, we use a relation-specific attention mechanism and a gate unit to select appropriate relations. In addition, most current models do not account for two-way interaction between tasks, so we design a feature-interactive network to achieve bidirectional interaction between the subject and object extraction tasks and enhance their connection. Experimental results on the NYT, WebNLG, NYT⋆, and WebNLG⋆ datasets show that our model is competitive among joint extraction models.
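A hedged sketch of the relation-selection step: fixed-size relation embeddings are scored against the sentence representation, and a sigmoid gate suppresses irrelevant relations before object tagging (the module names and shapes are assumptions):

```python
import torch
import torch.nn as nn

class RelationGate(nn.Module):
    """Gates fixed-size relation embeddings by their relevance to the sentence,
    so object tagging conditions only on plausible relations."""

    def __init__(self, hidden: int, num_relations: int):
        super().__init__()
        self.rel_emb = nn.Embedding(num_relations, hidden)  # one embedding per relation
        self.score = nn.Linear(2 * hidden, 1)

    def forward(self, sent_repr: torch.Tensor) -> torch.Tensor:
        # sent_repr: (batch, hidden) pooled sentence encoding
        rels = self.rel_emb.weight.unsqueeze(0).expand(sent_repr.size(0), -1, -1)
        pair = torch.cat([rels, sent_repr.unsqueeze(1).expand_as(rels)], dim=-1)
        g = torch.sigmoid(self.score(pair))        # (batch, num_relations, 1)
        return g * rels                            # near-zero rows are effectively ignored

gated = RelationGate(hidden=256, num_relations=24)(torch.randn(4, 256))
print(gated.shape)  # torch.Size([4, 24, 256])
```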
{"title":"Taking relations as known conditions: A tagging based method for relational triple extraction","authors":"Guanqing Kong , Qi Lei","doi":"10.1016/j.csl.2024.101734","DOIUrl":"10.1016/j.csl.2024.101734","url":null,"abstract":"<div><div>Relational triple extraction refers to extracting entities and relations from natural texts, which is a crucial task in the construction of knowledge graph. Recently, tagging based methods have received increasing attention because of their simple and effective structural form. Among them, the two-step extraction method is easy to cause the problem of category imbalance. To address this issue, we propose a novel two-step extraction method, which first extracts subjects, generates a fixed-size embedding for each relation, and then regards these relations as known conditions to extract the objects directly with the identified subjects. In order to eliminate the influence of irrelevant relations when predicting objects, we use a relation-special attention mechanism and a gate unit to select appropriate relations. In addition, most current models do not account for two-way interaction between tasks, so we design a feature interactive network to achieve bidirectional interaction between subject and object extraction tasks and enhance their connection. Experimental results on NYT, WebNLG, NYT<span><math><msup><mrow></mrow><mrow><mo>⋆</mo></mrow></msup></math></span> and WebNLG<span><math><msup><mrow></mrow><mrow><mo>⋆</mo></mrow></msup></math></span> datasets show that our model is competitive among joint extraction models.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101734"},"PeriodicalIF":3.1,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142572764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
What’s so complex about conversational speech? A comparison of HMM-based and transformer-based ASR architectures
Pub Date : 2024-10-22 DOI: 10.1016/j.csl.2024.101738
Julian Linke , Bernhard C. Geiger , Gernot Kubin , Barbara Schuppler
High-performing speech recognition is important for more fluent human–machine interaction (e.g., dialogue systems). Modern ASR architectures achieve human-level recognition performance on read speech but still perform sub-par on conversational speech, which arguably is or, at least, will be instrumental for human–machine interaction. Understanding the factors behind this shortcoming of modern ASR systems may suggest directions for improving them. In this work, we compare the performance of HMM-based and transformer-based ASR architectures on a corpus of Austrian German conversational speech. Specifically, we investigate how strongly utterance length, prosody, pronunciation, and utterance complexity as measured by perplexity affect the different architectures. Among other findings, we observe that single-word utterances – which are characteristic of conversational speech and constitute roughly 30% of the corpus – are recognized more accurately if their F0 contour is flat; for longer utterances, the effects of the F0 contour tend to be weaker. We further find that zero-shot systems require longer utterance lengths and are less robust to pronunciation variation, which indicates that pronunciation lexicons and fine-tuning on the respective corpus are essential ingredients for the successful recognition of conversational speech.
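A small sketch of the style of analysis reported here: slicing word error rate by an utterance-level factor such as length (the toy transcripts below are invented; jiwer is one common WER implementation, not necessarily the authors' tooling):

```python
from statistics import mean
import jiwer

# (reference, hypothesis) pairs; toy examples standing in for corpus data
pairs = [("ja", "na"),
         ("das ist wirklich schwierig zu sagen", "das ist wirklich schwer zu sagen")]

buckets = {"single-word": [], "multi-word": []}
for ref, hyp in pairs:
    key = "single-word" if len(ref.split()) == 1 else "multi-word"
    buckets[key].append(jiwer.wer(ref, hyp))

for key, errs in buckets.items():
    print(key, round(mean(errs), 3) if errs else "n/a")
```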
Tickling translations: Small but mighty open-sourced transformers bring English PUN-ny entities to life in French!
Pub Date : 2024-10-22 DOI: 10.1016/j.csl.2024.101739
Farhan Dhanani, Muhammad Rafi, Muhammad Atif Tahir
Recent advancements in transformer-based language models have demonstrated substantial progress in producing good translations. Despite these achievements, challenges persist in translating playful requests, especially when users intentionally introduce humor. Deciphering the hidden pun in such playful requests is one of the major difficulties for modern language models, which causes user dissatisfaction. This paper targets a specific niche of humor translation: translating English named entities containing puns into French using small-scale open-sourced transformer models. The transformer architecture serves as a foundation for popular language models like ChatGPT and allows learning long-range contextual relationships within sequences. The main novelty of the paper is the proposed extractive question-answering (Q/A)-style technique, based on transformers, for finding relevant translations of the provided English nouns in openly available parallel corpora. To evaluate the effectiveness of our method, we utilize a dataset provided by the JOKER CLEF automatic pun and humor translation 2022 team. The dataset contains single-word nouns from popular novels, anime, movies, and games, each containing a pun. The discussed methodology and experimental framework are adaptable and can be extended to any language pair for which an open, available parallel corpus exists. This flexibility underscores the broader applicability of our findings and suggests the potential for enhancing humor translation across various language combinations.
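A toy sketch of the extractive Q/A framing: a parallel-corpus line pair serves as the context, and the question asks for the French counterpart (the default English QA checkpoint below is a stand-in for the small open-sourced models the paper tunes):

```python
from transformers import pipeline

qa = pipeline("question-answering")  # default SQuAD-tuned model as a stand-in

context = "EN: The Count of Monte Cristo || FR: Le Comte de Monte-Cristo"
result = qa(question="What is the French translation of 'The Count of Monte Cristo'?",
            context=context)
print(result["answer"])  # ideally: Le Comte de Monte-Cristo
```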
{"title":"Tickling translations: Small but mighty open-sourced transformers bring English PUN-ny entities to life in French!","authors":"Farhan Dhanani, Muhammad Rafi, Muhammad Atif Tahir","doi":"10.1016/j.csl.2024.101739","DOIUrl":"10.1016/j.csl.2024.101739","url":null,"abstract":"<div><div>Recent advancements in transformer-based language models have demonstrated substantial progress in producing good translations. Despite these achievements, challenges persist in translating playful requests, especially when users intentionally introduce humor. Deciphering the hidden pun among such playful requests is one of the major difficulties for modern language models, which causes user dissatisfaction. This paper targets a specific niche of humor translation, which is the translation of English-named entities containing puns into French using small-scale open-sourced transformer models. The transformer architecture serves as a foundation for popular language models like chatGPT. It allows learning long-range contextual relationships within sequences. The main novelty of the paper is the proposed extractive question/answering (Q/A) styled technique based on the transformers to find relevant translations for the provided English nouns using the openly available parallel corpora. To evaluate the effectiveness of our method, we utilize a dataset provided by the JOKER CLEF automatic pun and humor translation 2022 team. The dataset contains single-word nouns from popular novels, anime, movies, and games, each containing a pun. The discussed methodology and experimental framework are adaptable and can be extended to any language pair for which an open, available parallel corpus exists. This flexibility underscores the broader applicability of our findings and suggests the potential for enhancing humor translation across various language combinations.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101739"},"PeriodicalIF":3.1,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142657091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Combining replay and LoRA for continual learning in natural language understanding
Pub Date : 2024-10-19 DOI: 10.1016/j.csl.2024.101737
Zeinab Borhanifard, Heshaam Faili, Yadollah Yaghoobzadeh
Large language models have significantly improved dialogue systems through enhanced capabilities in understanding queries and generating responses. Despite these enhancements, task-oriented dialogue systems – which power many intelligent assistants – face challenges when adapting to new domains and applications. This challenge arises from a phenomenon known as catastrophic forgetting, where models forget previously acquired knowledge when learning new tasks. This paper addresses this issue through continual learning techniques that preserve previously learned knowledge while seamlessly integrating new tasks and domains. We propose Experience Replay Informative-Low Rank Adaptation (ERI-LoRA), a hybrid continual learning method for natural language understanding in dialogue systems that effectively combines replay-based methods with parameter-efficient techniques. Our experiments on intent detection and slot-filling tasks demonstrate that ERI-LoRA significantly outperforms competitive baselines in continual learning. Our catastrophic forgetting experiments further show that ERI-LoRA maintains robust memory stability in the model, demonstrating its effectiveness in mitigating these effects.
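A hedged sketch of the two ingredients combined here, using the peft library for LoRA and a plain list as the replay buffer (the base checkpoint, target modules, and sampling rule are assumptions; the paper's replay selection is informativeness-based, for which the random sampling below is only a stand-in):

```python
import random
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification

# LoRA keeps per-task updates parameter-efficient.
base = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=10)
model = get_peft_model(base, LoraConfig(r=8, lora_alpha=16, target_modules=["query", "value"]))

replay_buffer = []  # (text, label) examples retained from earlier tasks

def build_training_set(new_task_data, replay_fraction=0.2):
    """Mix stored examples from previous tasks into the new task's data
    so the model rehearses old knowledge while learning the new task."""
    k = int(replay_fraction * len(new_task_data))
    replayed = random.sample(replay_buffer, min(k, len(replay_buffer)))
    return list(new_task_data) + replayed
```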
{"title":"Combining replay and LoRA for continual learning in natural language understanding","authors":"Zeinab Borhanifard, Heshaam Faili, Yadollah Yaghoobzadeh","doi":"10.1016/j.csl.2024.101737","DOIUrl":"10.1016/j.csl.2024.101737","url":null,"abstract":"<div><div>Large language models have significantly improved dialogue systems through enhanced capabilities in understanding queries and generating responses. Despite these enhancements, task-oriented dialogue systems- – which power many intelligent assistants – face challenges when adapting to new domains and applications. This challenge arises from a phenomenon known as catastrophic forgetting, where models forget previously acquired knowledge when learning new tasks. This paper addresses this issue through continual learning techniques to preserve previously learned knowledge while seamlessly integrating new tasks and domains. We propose <strong>E</strong>xperience <strong>R</strong>eplay <strong>I</strong>nformative-<strong>Lo</strong>w <strong>R</strong>ank <strong>A</strong>daptation or ERI-LoRA, a hybrid continual learning method for natural language understanding in dialogue systems that effectively combines the replay-based methods with parameter-efficient techniques. Our experiments on intent detection and slot-filling tasks demonstrate that ERI-LoRA significantly outperforms competitive baselines in continual learning. The results of our catastrophic forgetting experiments demonstrate that ERI-LoRA maintains robust memory stability in the model, demonstrating its effectiveness in mitigating these effects.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101737"},"PeriodicalIF":3.1,"publicationDate":"2024-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142553128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimizing pipeline task-oriented dialogue systems using post-processing networks
Pub Date : 2024-10-19 DOI: 10.1016/j.csl.2024.101742
Atsumoto Ohashi, Ryuichiro Higashinaka
Many studies have proposed methods for optimizing the dialogue performance of an entire pipeline task-oriented dialogue system by jointly training modules in the system using reinforcement learning. However, these methods are limited in that they can only be applied to modules implemented using trainable neural-based methods. To solve this problem, we propose a method for optimizing the dialogue performance of a pipeline system that consists of modules implemented with arbitrary methods for dialogue. With our method, neural-based components called post-processing networks (PPNs) are installed inside such a system to post-process the output of each module. All PPNs are updated to improve the overall dialogue performance of the system by using reinforcement learning, not necessitating that each module be differentiable. Through dialogue simulations and human evaluations on two well-studied task-oriented dialogue datasets, CamRest676 and MultiWOZ, we show that our method can improve the dialogue performance of pipeline systems consisting of various modules. In addition, a comprehensive analysis of the results of the MultiWOZ experiments reveals the patterns of post-processing by PPNs that contribute to the overall dialogue performance of the system.
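To illustrate why the wrapped modules need not be differentiable, here is a minimal PPN trained with REINFORCE: the post-processor samples a modified output and is updated from a scalar dialogue-level reward alone (the binary-vector interface and network size are simplifying assumptions):

```python
import torch
import torch.nn as nn

class PostProcessingNetwork(nn.Module):
    """Post-processes a module's binary output vector; trained with
    REINFORCE, so gradients never flow through the module itself."""

    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, module_output: torch.Tensor):
        probs = torch.sigmoid(self.net(module_output))
        dist = torch.distributions.Bernoulli(probs)
        action = dist.sample()                      # post-processed binary output
        return action, dist.log_prob(action).sum(-1)

ppn = PostProcessingNetwork(dim=32)
opt = torch.optim.Adam(ppn.parameters(), lr=1e-4)

out, log_prob = ppn(torch.randint(0, 2, (1, 32)).float())
reward = 1.0  # e.g., task success observed at the end of the dialogue
opt.zero_grad()
(-reward * log_prob.mean()).backward()              # REINFORCE update
opt.step()
```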
{"title":"Optimizing pipeline task-oriented dialogue systems using post-processing networks","authors":"Atsumoto Ohashi, Ryuichiro Higashinaka","doi":"10.1016/j.csl.2024.101742","DOIUrl":"10.1016/j.csl.2024.101742","url":null,"abstract":"<div><div>Many studies have proposed methods for optimizing the dialogue performance of an entire pipeline task-oriented dialogue system by jointly training modules in the system using reinforcement learning. However, these methods are limited in that they can only be applied to modules implemented using trainable neural-based methods. To solve this problem, we propose a method for optimizing the dialogue performance of a pipeline system that consists of modules implemented with arbitrary methods for dialogue. With our method, neural-based components called post-processing networks (PPNs) are installed inside such a system to post-process the output of each module. All PPNs are updated to improve the overall dialogue performance of the system by using reinforcement learning, not necessitating that each module be differentiable. Through dialogue simulations and human evaluations on two well-studied task-oriented dialogue datasets, CamRest676 and MultiWOZ, we show that our method can improve the dialogue performance of pipeline systems consisting of various modules. In addition, a comprehensive analysis of the results of the MultiWOZ experiments reveals the patterns of post-processing by PPNs that contribute to the overall dialogue performance of the system.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101742"},"PeriodicalIF":3.1,"publicationDate":"2024-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142572768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}