
Computer Speech and Language: Latest Publications

TR-Net: Token Relation Inspired Table Filling Network for Joint Entity and Relation Extraction
IF 3.1 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-09 | DOI: 10.1016/j.csl.2024.101749
Yongle Kong , Zhihao Yang , Zeyuan Ding , Wenfei Liu , Shiqi Zhang , Jianan Xu , Hongfei Lin
Recently, table filling models have achieved promising performance in jointly extracting relation triplets from complex sentences, leveraging their inherent structural advantage of delineating entities and relations as table cells. Nonetheless, these models predominantly concentrate on the cells corresponding to entity pairs within the predicted tables, neglecting the interrelations among other token pairs. This oversight can potentially lead to the exclusion of essential token information. To address these challenges, we introduce the Token Relation-Inspired Network (TR-Net), a novel framework for the joint extraction of entities and relations. It encompasses a token relation generator that adaptively constructs a token relation table, concentrating on the prominent token cells. It also uses a structure-enhanced encoder that integrates the structural and sequential data of sentences via a highway gate mechanism. Our experimental analysis demonstrates that TR-Net delivers considerable enhancements and achieves state-of-the-art performance on four public datasets.
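As a toy illustration of the table-filling paradigm described above, the sketch below decodes relation triplets from a token-pair table; the tagging scheme and the `decode_triplets` helper are illustrative assumptions, not TR-Net's actual cell labeling.

```python
def decode_triplets(tokens, table):
    """Decode relation triplets from a token-pair table.

    table[i][j] holds a relation label linking the token at position i
    (subject head) to the token at position j (object head); None marks
    a non-target cell.
    """
    triplets = []
    for i, row in enumerate(table):
        for j, relation in enumerate(row):
            if relation is not None:
                triplets.append((tokens[i], relation, tokens[j]))
    return triplets

tokens = ["Paris", "lies", "in", "France"]
table = [[None] * len(tokens) for _ in tokens]
table[0][3] = "located_in"  # cell (Paris, France)
print(decode_triplets(tokens, table))  # [('Paris', 'located_in', 'France')]
```

The point TR-Net makes is that the non-target (`None`) cells also carry signal during training, even though only the labeled cells surface in the decoded output.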
Computer Speech and Language, Volume 90, Article 101749.
Citations: 0
CLIPMulti: Explore the performance of multimodal enhanced CLIP for zero-shot text classification
IF 3.1 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-07 | DOI: 10.1016/j.csl.2024.101748
Peng Wang , Dagang Li , Xuesi Hu , Yongmei Wang , Youhua Zhang
Zero-shot text classification does not require large amounts of labeled data and is designed to handle text classification tasks that lack annotated training data. Existing zero-shot text classification uses either a text–text matching paradigm or a text–image matching paradigm, both of which show good performance on different benchmark datasets. However, the existing classification paradigms only consider a single modality for text matching, and little attention has been paid to how multimodality can help text classification. To incorporate multimodality into zero-shot text classification, we propose a multimodal enhanced CLIP framework (CLIPMulti), which employs a text–image&text matching paradigm to enhance the effectiveness of zero-shot text classification. Three different image and text combinations are tested for their effects on zero-shot text classification, and a matching method (Match-CLIPMulti) is further proposed to automatically find the corresponding text based on the classified images. We conducted experiments on seven publicly available zero-shot text classification datasets and achieved competitive performance. In addition, we analyzed the effect of different parameters on the Match-CLIPMulti experiments. We hope this work will bring more thoughts and explorations on multimodal fusion in language tasks.
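A minimal sketch of the text–image&text matching idea: fuse a text embedding with an auxiliary image embedding, then match the fused query against label embeddings. The toy two-dimensional vectors and the simple weighted average are assumptions for illustration; CLIPMulti's actual combination strategies and CLIP encoders differ.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(text_emb, image_emb, label_embs, alpha=0.5):
    # Weighted fusion of the two modalities, then nearest label by cosine.
    query = alpha * text_emb + (1.0 - alpha) * image_emb
    return max(range(len(label_embs)), key=lambda i: cosine(query, label_embs[i]))

labels = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # hypothetical label embeddings
text_emb = np.array([0.9, 0.1])
image_emb = np.array([0.8, 0.2])
print(classify(text_emb, image_emb, labels))  # 0
```

Varying `alpha` corresponds loosely to the different image/text combinations the paper tests.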
Computer Speech and Language, Volume 90, Article 101748.
Citations: 0
UniKDD: A Unified Generative model for Knowledge-driven Dialogue
IF 3.1 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-30 | DOI: 10.1016/j.csl.2024.101740
Qian Wang , Yan Chen , Yang Wang , Xu Wang
Knowledge-driven dialogue (KDD) introduces an external knowledge base to generate informative and fluent responses. However, previous works employ different models to conduct the sub-tasks of KDD, ignoring the connection between sub-tasks and making training and inference difficult. To address these issues, we propose UniKDD, a unified generative model for KDD, which casts all sub-tasks as a single generation task, strengthening the connection between tasks and facilitating training and inference. Specifically, UniKDD simplifies the complex KDD task into three main sub-tasks, i.e., entity prediction, attribute prediction, and dialogue generation. These tasks are transformed into a text generation task and trained in an end-to-end manner. In the inference phase, UniKDD first predicts a set of entities for the current dialogue turn according to the dialogue history. Then, for each predicted entity, UniKDD predicts the corresponding attributes from the dialogue history. Finally, UniKDD generates a high-quality and informative response using the dialogue history and the predicted knowledge triplets. The experimental results show that our proposed UniKDD performs the KDD task well and outperforms the baseline on the evaluation of knowledge selection and response generation. The code is available at https://github.com/qianandfei/UniKDD.git.
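The cast-everything-as-generation design can be sketched as a single text-to-text interface whose input string announces the sub-task. The prompt templates below are hypothetical, not UniKDD's actual serialization format.

```python
def build_input(task, history, condition=None):
    """Serialize each KDD sub-task as a text generation input.

    Inference chains the three sub-tasks: predicted entities condition
    attribute prediction, and predicted triplets condition the response.
    """
    if task == "entity":
        return f"[predict entities] history: {history}"
    if task == "attribute":
        return f"[predict attributes] entity: {condition} history: {history}"
    if task == "response":
        return f"[generate response] knowledge: {condition} history: {history}"
    raise ValueError(f"unknown sub-task: {task}")

print(build_input("entity", "Who directed Inception?"))
```

One shared generative model consumes all three input shapes, which is what lets the sub-tasks reinforce each other during end-to-end training.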
Computer Speech and Language, Volume 90, Article 101740.
Citations: 0
Exploring the ability of LLMs to classify written proficiency levels
IF 3.1 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-29 | DOI: 10.1016/j.csl.2024.101745
Susanne DeVore
This paper tests the ability of LLMs to classify language proficiency ratings of texts written by learners of English and Mandarin, taking a benchmarking research design approach. First, the impact of five variables (LLM model, prompt version, prompt language, grading scale, and temperature) on rating accuracy is tested using a basic instruction-only prompt. Second, the consistency of results is tested. Third, the top-performing consistent conditions emerging from the first and second tests are used to test the impact on rating accuracy of adding examples and/or proficiency guidelines and of using zero-, one-, and few-shot chain-of-thought prompting techniques. While performance does not meet the levels necessary for real-world use cases, the results can inform ongoing development of LLMs and prompting techniques to improve accuracy. This paper highlights recent research on prompt engineering outside the field of linguistics and selects prompt variables and techniques that are theoretically relevant to proficiency rating. Finally, it discusses key takeaways from these tests that can inform future development, and why approaches that have been effective in other contexts were not as effective for proficiency rating.
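The benchmarking design (sweep prompt conditions, measure rating accuracy per condition) amounts to a grid search. In the sketch below the `rate` stub stands in for a real LLM call, and the scale/temperature values are illustrative assumptions.

```python
from itertools import product

def accuracy(preds, gold):
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)

def benchmark(rate_fn, texts, gold, scales, temperatures):
    """Score one (grading scale, temperature) condition at a time."""
    results = {}
    for scale, temp in product(scales, temperatures):
        preds = [rate_fn(text, scale, temp) for text in texts]
        results[(scale, temp)] = accuracy(preds, gold)
    return results

# Stub rater: pretend longer texts indicate higher proficiency.
rate = lambda text, scale, temp: min(scale, len(text.split()))
scores = benchmark(rate, ["a b", "a b c d"], [2, 4], scales=[4], temperatures=[0.0])
print(scores)  # {(4, 0.0): 1.0}
```

The paper's additional conditions (examples, guidelines, chain-of-thought variants) would extend the grid with further axes in the same way.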
Computer Speech and Language, Volume 90, Article 101745.
Citations: 0
Entity and relationship extraction based on span contribution evaluation and focusing framework
IF 3.1 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-29 | DOI: 10.1016/j.csl.2024.101744
Qibin Li , Nianmin Yao , Nai Zhou , Jian Zhao
Entity and relationship extraction involves identifying named entities and extracting relationships between them. Existing research focuses on enhancing span representations, yet overlooks the impact of non-target spans (i.e., spans that are not entities, or span pairs with no relationship) on model training. In this work, we propose a span contribution evaluation and focusing framework named CEFF, which assigns each non-target span in a sentence a contribution score through pre-training; this score reflects the span's contribution to model performance improvement. To a certain extent, this method considers the impact of different spans on model training, making the training more targeted. Additionally, leveraging the contribution scores of non-target spans, we introduce a simplified variant of the model, termed CEFFs, which achieves performance comparable to models trained with all spans while utilizing fewer spans. This approach reduces training costs and improves training efficiency. Through extensive validation, we demonstrate that our contribution scores accurately reflect span contributions and achieve state-of-the-art results on five benchmark datasets.
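Using pre-computed contribution scores to shrink the set of training spans can be sketched as a top-k filter. The spans, scores, and budget below are illustrative assumptions; CEFF learns the scores via pre-training rather than taking them as given.

```python
def select_spans(spans, contribution, budget):
    """Keep the `budget` non-target spans with the highest contribution scores."""
    ranked = sorted(spans, key=lambda s: contribution[s], reverse=True)
    return ranked[:budget]

spans = [(0, 1), (1, 2), (0, 3), (2, 3)]           # (start, end) token offsets
contribution = {(0, 1): 0.9, (1, 2): 0.1, (0, 3): 0.7, (2, 3): 0.3}
print(select_spans(spans, contribution, budget=2))  # [(0, 1), (0, 3)]
```

Training on the retained spans plus all target spans is what yields the cheaper CEFFs variant with comparable accuracy.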
Computer Speech and Language, Volume 90, Article 101744.
Citations: 0
Taking relations as known conditions: A tagging based method for relational triple extraction
IF 3.1 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-24 | DOI: 10.1016/j.csl.2024.101734
Guanqing Kong , Qi Lei
Relational triple extraction refers to extracting entities and relations from natural texts, a crucial task in the construction of knowledge graphs. Recently, tagging-based methods have received increasing attention because of their simple and effective structural form. Among them, two-step extraction methods are prone to category imbalance. To address this issue, we propose a novel two-step extraction method that first extracts subjects, generates a fixed-size embedding for each relation, and then regards these relations as known conditions to extract the objects directly for the identified subjects. To eliminate the influence of irrelevant relations when predicting objects, we use a relation-specific attention mechanism and a gate unit to select appropriate relations. In addition, most current models do not account for two-way interaction between tasks, so we design a feature-interactive network to achieve bidirectional interaction between the subject and object extraction tasks and strengthen their connection. Experimental results on the NYT, WebNLG, NYT⋆, and WebNLG⋆ datasets show that our model is competitive among joint extraction models.
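The "relation as known condition" idea can be sketched as a loop that, for each identified subject and each relation, tags object tokens directly. The tiny lookup-based `tagger` below stands in for the learned, relation-conditioned object tagger; all names and data are illustrative assumptions.

```python
def extract_triples(tokens, subjects, relations, tag_objects):
    """For every (subject, relation) pair, treat the relation as a known
    condition and tag object tokens directly."""
    triples = []
    for subject in subjects:
        for relation in relations:
            for obj in tag_objects(tokens, subject, relation):
                triples.append((subject, relation, obj))
    return triples

# Toy tagger: a tiny lookup standing in for the learned object tagger.
KB = {("Paris", "capital_of"): ["France"]}
tagger = lambda tokens, s, r: [t for t in KB.get((s, r), []) if t in tokens]
tokens = ["Paris", "is", "the", "capital", "of", "France"]
print(extract_triples(tokens, ["Paris"], ["capital_of"], tagger))
# [('Paris', 'capital_of', 'France')]
```

The paper's relation-specific attention and gate unit would prune the inner loop so that only plausible relations are ever used as conditions.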
Computer Speech and Language, Volume 90, Article 101734.
Citations: 0
What’s so complex about conversational speech? A comparison of HMM-based and transformer-based ASR architectures
IF 3.1 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-22 | DOI: 10.1016/j.csl.2024.101738
Julian Linke , Bernhard C. Geiger , Gernot Kubin , Barbara Schuppler
High-performing speech recognition is important for more fluent human–machine interaction (e.g., dialogue systems). Modern ASR architectures achieve human-level recognition performance on read speech but still perform sub-par on conversational speech, which arguably is, or at least will be, instrumental for human–machine interaction. Understanding the factors behind this shortcoming of modern ASR systems may suggest directions for improving them. In this work, we compare the performance of HMM-based vs. transformer-based ASR architectures on a corpus of Austrian German conversational speech. Specifically, we investigate how strongly utterance length, prosody, pronunciation, and utterance complexity as measured by perplexity affect different ASR architectures. Among other findings, we observe that single-word utterances – which are characteristic of conversational speech and constitute roughly 30% of the corpus – are recognized more accurately if their F0 contour is flat; for longer utterances, the effects of the F0 contour tend to be weaker. We further find that zero-shot systems require longer utterance lengths and are less robust to pronunciation variation, which indicates that pronunciation lexicons and fine-tuning on the respective corpus are essential ingredients for the successful recognition of conversational speech.
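Analyzing recognition performance by factors such as utterance length presupposes a per-utterance error metric. A standard word error rate via Levenshtein distance (a generic implementation, not the paper's evaluation code) can be sketched as:

```python
def wer(reference, hypothesis):
    """Word error rate: edit distance over word sequences, normalized
    by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, sub)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat", "the cat sit"))  # one substitution out of three words
```

Bucketing utterances by word count before averaging `wer` reproduces the kind of length-conditioned comparison the study performs.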
Computer Speech and Language, Volume 90, Article 101738.
Citations: 0
Tickling translations: Small but mighty open-sourced transformers bring English PUN-ny entities to life in French!
IF 3.1 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-22 | DOI: 10.1016/j.csl.2024.101739
Farhan Dhanani, Muhammad Rafi, Muhammad Atif Tahir
Recent advancements in transformer-based language models have demonstrated substantial progress in producing good translations. Despite these achievements, challenges persist in translating playful requests, especially when users intentionally introduce humor. Deciphering the hidden pun in such playful requests is one of the major difficulties for modern language models, which causes user dissatisfaction. This paper targets a specific niche of humor translation: the translation of English named entities containing puns into French using small-scale open-sourced transformer models. The transformer architecture serves as a foundation for popular language models like ChatGPT. It allows learning long-range contextual relationships within sequences. The main novelty of the paper is the proposed extractive question/answering (Q/A) styled technique based on transformers to find relevant translations for the provided English nouns using openly available parallel corpora. To evaluate the effectiveness of our method, we utilize a dataset provided by the JOKER CLEF automatic pun and humor translation 2022 team. The dataset contains single-word nouns from popular novels, anime, movies, and games, each containing a pun. The discussed methodology and experimental framework are adaptable and can be extended to any language pair for which an open, available parallel corpus exists. This flexibility underscores the broader applicability of our findings and suggests the potential for enhancing humor translation across various language combinations.
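The retrieval step of the extractive Q/A framing, gathering parallel-corpus contexts whose English side mentions the entity so a QA model can then extract the French span, can be sketched as follows. The corpus and the substring matching rule are illustrative assumptions.

```python
def candidate_contexts(entity, parallel_corpus):
    """Return French sentences aligned with English sentences that mention
    the entity; these become contexts for an extractive QA model."""
    entity = entity.lower()
    return [fr for en, fr in parallel_corpus if entity in en.lower()]

corpus = [
    ("Severus Snape brewed a potion.", "Severus Rogue préparait une potion."),
    ("The weather was fine.", "Il faisait beau."),
]
print(candidate_contexts("Severus Snape", corpus))
# ['Severus Rogue préparait une potion.']
```

An extractive QA model would then be asked, in effect, "what is Severus Snape called here?" over each retrieved French context.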
{"title":"Tickling translations: Small but mighty open-sourced transformers bring English PUN-ny entities to life in French!","authors":"Farhan Dhanani,&nbsp;Muhammad Rafi,&nbsp;Muhammad Atif Tahir","doi":"10.1016/j.csl.2024.101739","DOIUrl":"10.1016/j.csl.2024.101739","url":null,"abstract":"<div><div>Recent advancements in transformer-based language models have demonstrated substantial progress in producing good translations. Despite these achievements, challenges persist in translating playful requests, especially when users intentionally introduce humor. Deciphering the hidden pun among such playful requests is one of the major difficulties for modern language models, which causes user dissatisfaction. This paper targets a specific niche of humor translation, which is the translation of English-named entities containing puns into French using small-scale open-sourced transformer models. The transformer architecture serves as a foundation for popular language models like chatGPT. It allows learning long-range contextual relationships within sequences. The main novelty of the paper is the proposed extractive question/answering (Q/A) styled technique based on the transformers to find relevant translations for the provided English nouns using the openly available parallel corpora. To evaluate the effectiveness of our method, we utilize a dataset provided by the JOKER CLEF automatic pun and humor translation 2022 team. The dataset contains single-word nouns from popular novels, anime, movies, and games, each containing a pun. The discussed methodology and experimental framework are adaptable and can be extended to any language pair for which an open, available parallel corpus exists. 
This flexibility underscores the broader applicability of our findings and suggests the potential for enhancing humor translation across various language combinations.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101739"},"PeriodicalIF":3.1,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142657091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
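As a concrete illustration of the extractive lookup the abstract describes, the sketch below retrieves parallel sentence pairs containing a pun-bearing English noun and returns the French sides as translation candidates. Everything here is a hedged stand-in: the two-pair corpus, the names `retrieve_pairs` and `extract_candidates`, and the naive substring match all replace the paper's transformer-based Q/A extraction over a real parallel corpus.

```python
# A toy parallel corpus; the paper uses openly available English-French
# corpora instead (this two-pair list is invented for illustration).
TOY_PARALLEL_CORPUS = [
    ("The bat flew out of the cave.", "La chauve-souris est sortie de la grotte."),
    ("He swung the bat at the ball.", "Il a frappé la balle avec la batte."),
]

def retrieve_pairs(noun, corpus):
    """Naive retrieval: keep every pair whose English side mentions the noun.
    A real system would use lemmatized or fuzzy matching."""
    return [(en, fr) for en, fr in corpus if noun.lower() in en.lower()]

def extract_candidates(noun, corpus):
    """Stand-in for the extractive Q/A step: in the paper, a transformer
    answers "how is <noun> translated?" against the retrieved French side;
    here we simply return the French sentences as candidate contexts."""
    return [fr for _, fr in retrieve_pairs(noun, corpus)]

# For a pun word like "bat", both senses surface as separate candidates,
# which is exactly what makes pun-aware translation hard.
candidates = extract_candidates("bat", TOY_PARALLEL_CORPUS)
print(candidates)
```

Both French renderings come back ("chauve-souris" for the animal, "batte" for the club), so a downstream step still has to decide which sense, or which piece of wordplay, the translation should preserve.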
Citations: 0
Combining replay and LoRA for continual learning in natural language understanding
IF 3.1 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-10-19 DOI: 10.1016/j.csl.2024.101737
Zeinab Borhanifard, Heshaam Faili, Yadollah Yaghoobzadeh
Large language models have significantly improved dialogue systems through enhanced capabilities in understanding queries and generating responses. Despite these enhancements, task-oriented dialogue systems – which power many intelligent assistants – face challenges when adapting to new domains and applications. This challenge arises from a phenomenon known as catastrophic forgetting, where models forget previously acquired knowledge when learning new tasks. This paper addresses this issue through continual learning techniques that preserve previously learned knowledge while seamlessly integrating new tasks and domains. We propose Experience Replay Informative-Low Rank Adaptation, or ERI-LoRA, a hybrid continual learning method for natural language understanding in dialogue systems that effectively combines replay-based methods with parameter-efficient techniques. Our experiments on intent detection and slot-filling tasks demonstrate that ERI-LoRA significantly outperforms competitive baselines in continual learning. The results of our catastrophic forgetting experiments show that ERI-LoRA maintains robust memory stability in the model, confirming its effectiveness in mitigating these effects.
{"title":"Combining replay and LoRA for continual learning in natural language understanding","authors":"Zeinab Borhanifard,&nbsp;Heshaam Faili,&nbsp;Yadollah Yaghoobzadeh","doi":"10.1016/j.csl.2024.101737","DOIUrl":"10.1016/j.csl.2024.101737","url":null,"abstract":"<div><div>Large language models have significantly improved dialogue systems through enhanced capabilities in understanding queries and generating responses. Despite these enhancements, task-oriented dialogue systems- – which power many intelligent assistants – face challenges when adapting to new domains and applications. This challenge arises from a phenomenon known as catastrophic forgetting, where models forget previously acquired knowledge when learning new tasks. This paper addresses this issue through continual learning techniques to preserve previously learned knowledge while seamlessly integrating new tasks and domains. We propose <strong>E</strong>xperience <strong>R</strong>eplay <strong>I</strong>nformative-<strong>Lo</strong>w <strong>R</strong>ank <strong>A</strong>daptation or ERI-LoRA, a hybrid continual learning method for natural language understanding in dialogue systems that effectively combines the replay-based methods with parameter-efficient techniques. Our experiments on intent detection and slot-filling tasks demonstrate that ERI-LoRA significantly outperforms competitive baselines in continual learning. 
The results of our catastrophic forgetting experiments demonstrate that ERI-LoRA maintains robust memory stability in the model, demonstrating its effectiveness in mitigating these effects.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101737"},"PeriodicalIF":3.1,"publicationDate":"2024-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142553128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
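The abstract names two ingredients, experience replay and low-rank adaptation (LoRA), and the sketch below illustrates each in isolation. The class names, the tiny shapes, and the `replay_ratio` parameter are assumptions of mine; the informative-example selection that the "I" in ERI-LoRA refers to, and the integration into a transformer, are not modeled. Only the plumbing the combination relies on is shown.

```python
import random

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update B @ A (rank r).
    Only A and B would be trained; W stays frozen, which is what makes
    LoRA parameter-efficient. A toy sketch, not the authors' code."""
    def __init__(self, d_in, d_out, r=2, alpha=4):
        rng = random.Random(0)
        self.W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
        self.A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]  # zero-init: no drift at start
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

class ReplayBuffer:
    """Holds a bounded sample of past-task examples; mixed_batch blends
    them into each new-task batch to counter catastrophic forgetting."""
    def __init__(self, capacity=100):
        self.items, self.capacity = [], capacity

    def add(self, example):
        # Drops new items once full; real buffers often reservoir-sample.
        if len(self.items) < self.capacity:
            self.items.append(example)

    def mixed_batch(self, new_examples, replay_ratio=0.5):
        k = int(len(new_examples) * replay_ratio)
        replayed = random.sample(self.items, min(k, len(self.items)))
        return list(new_examples) + replayed

buf = ReplayBuffer(capacity=50)
for old_example in ["book a table", "find a hotel"]:   # task-1 utterances
    buf.add(old_example)
batch = buf.mixed_batch(["play some jazz", "set an alarm"])  # task-2 batch
print(len(batch))  # 2 new + 1 replayed -> 3
```

The zero initialization of `B` is the standard LoRA choice: at the start of a new task the adapted layer computes exactly the frozen `W @ x`, so adaptation begins from the previous model rather than from noise.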
Citations: 0
Optimizing pipeline task-oriented dialogue systems using post-processing networks
IF 3.1 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-10-19 DOI: 10.1016/j.csl.2024.101742
Atsumoto Ohashi, Ryuichiro Higashinaka
Many studies have proposed methods for optimizing the dialogue performance of an entire pipeline task-oriented dialogue system by jointly training its modules with reinforcement learning. However, these methods are limited in that they can only be applied to modules implemented with trainable neural-based methods. To solve this problem, we propose a method for optimizing the dialogue performance of a pipeline system whose modules may be implemented with arbitrary methods. With our method, neural-based components called post-processing networks (PPNs) are installed inside such a system to post-process the output of each module. All PPNs are updated with reinforcement learning to improve the overall dialogue performance of the system, without requiring each module to be differentiable. Through dialogue simulations and human evaluations on two well-studied task-oriented dialogue datasets, CamRest676 and MultiWOZ, we show that our method can improve the dialogue performance of pipeline systems consisting of various modules.
{"title":"Optimizing pipeline task-oriented dialogue systems using post-processing networks","authors":"Atsumoto Ohashi,&nbsp;Ryuichiro Higashinaka","doi":"10.1016/j.csl.2024.101742","DOIUrl":"10.1016/j.csl.2024.101742","url":null,"abstract":"<div><div>Many studies have proposed methods for optimizing the dialogue performance of an entire pipeline task-oriented dialogue system by jointly training modules in the system using reinforcement learning. However, these methods are limited in that they can only be applied to modules implemented using trainable neural-based methods. To solve this problem, we propose a method for optimizing the dialogue performance of a pipeline system that consists of modules implemented with arbitrary methods for dialogue. With our method, neural-based components called post-processing networks (PPNs) are installed inside such a system to post-process the output of each module. All PPNs are updated to improve the overall dialogue performance of the system by using reinforcement learning, not necessitating that each module be differentiable. Through dialogue simulations and human evaluations on two well-studied task-oriented dialogue datasets, CamRest676 and MultiWOZ, we show that our method can improve the dialogue performance of pipeline systems consisting of various modules. 
In addition, a comprehensive analysis of the results of the MultiWOZ experiments reveals the patterns of post-processing by PPNs that contribute to the overall dialogue performance of the system.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101742"},"PeriodicalIF":3.1,"publicationDate":"2024-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142572768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
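The interface a PPN exposes can be illustrated with a minimal sketch. The binary slot-vector representation of a module's output, the per-slot keep/force-on/force-off actions, and the class name are assumptions made here for illustration; the paper's PPNs are neural networks whose policies are learned with reinforcement learning, which this table-lookup stand-in does not attempt to reproduce.

```python
class PostProcessingNetwork:
    """Toy stand-in for a PPN wrapped around one pipeline module.

    The module's output is flattened to a binary vector (one bit per
    dialogue-act slot, a representation assumed for this sketch); the
    PPN decides per slot whether to keep, assert, or suppress the bit."""

    KEEP, FORCE_ON, FORCE_OFF = 0, 1, 2

    def __init__(self, n_slots):
        # Start as the identity: pass every module decision through.
        self.policy = [self.KEEP] * n_slots

    def post_process(self, module_output):
        out = list(module_output)
        for i, action in enumerate(self.policy):
            if action == self.FORCE_ON:
                out[i] = 1
            elif action == self.FORCE_OFF:
                out[i] = 0
        return out

ppn = PostProcessingNetwork(4)
ppn.policy[2] = PostProcessingNetwork.FORCE_ON  # pretend RL learned this
print(ppn.post_process([1, 0, 0, 0]))  # -> [1, 0, 1, 0]
```

Because the PPN only reads and rewrites a module's output vector, the wrapped module itself can be a rule-based system, a lookup table, or a neural network; the reward signal flows into the PPN, not through the module, which is what removes the differentiability requirement.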
Citations: 0