
Proceedings of COLING. International Conference on Computational Linguistics: Latest Publications

uChecker: Masked Pretrained Language Models as Unsupervised Chinese Spelling Checkers
Pub Date: 2022-09-15 DOI: 10.48550/arXiv.2209.07068
Piji Li
The task of Chinese Spelling Check (CSC) aims to detect and correct spelling errors in text. Manually annotating a high-quality dataset is expensive and time-consuming, so training datasets are usually very small (e.g., SIGHAN15 contains only 2,339 training samples), and supervised models therefore suffer from data sparsity and over-fitting, especially in the era of big language models. In this paper, we investigate the unsupervised paradigm for the CSC problem and propose a framework named uChecker to conduct unsupervised spelling error detection and correction. Masked pretrained language models such as BERT are introduced as the backbone model, considering their powerful language diagnosis capability. Benefiting from their various and flexible masking operations, we propose a confusionset-guided masking strategy to fine-train the masked language model and further improve the performance of unsupervised detection and correction. Experimental results on standard datasets demonstrate the effectiveness of the proposed uChecker in terms of character-level and sentence-level Accuracy, Precision, Recall, and F1-Measure on the spelling error detection and correction tasks, respectively.
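To make the detection mechanism concrete, below is a minimal sketch of masked-LM-based unsupervised checking, assuming an off-the-shelf bert-base-chinese checkpoint, a toy two-character confusion set, and an arbitrary probability threshold. It illustrates the idea only and is not the authors' uChecker implementation.

```python
# Minimal sketch of unsupervised spelling detection with a masked LM.
# NOT the authors' uChecker code: the model name, the toy confusion set,
# and the threshold are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForMaskedLM.from_pretrained("bert-base-chinese")
model.eval()

# Toy confusion set: visually/phonologically similar characters.
CONFUSION = {"再": ["在"], "在": ["再"]}

def check(sentence, threshold=0.9):
    """Mask each character in turn; if the LM strongly prefers a
    confusion-set alternative over the original, propose a correction."""
    chars = list(sentence)
    corrections = []
    for i, ch in enumerate(chars):
        masked = chars[:i] + [tokenizer.mask_token] + chars[i + 1:]
        inputs = tokenizer("".join(masked), return_tensors="pt")
        mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
        with torch.no_grad():
            probs = model(**inputs).logits[0, mask_pos].softmax(-1)
        orig_id = tokenizer.convert_tokens_to_ids(ch)
        for cand in CONFUSION.get(ch, []):
            cand_id = tokenizer.convert_tokens_to_ids(cand)
            if probs[cand_id] > threshold and probs[cand_id] > probs[orig_id]:
                corrections.append((i, ch, cand))
    return corrections

print(check("我现再在家里"))  # ideally suggests replacing 再 with 在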
{"title":"uChecker: Masked Pretrained Language Models as Unsupervised Chinese Spelling Checkers","authors":"Piji Li","doi":"10.48550/arXiv.2209.07068","DOIUrl":"https://doi.org/10.48550/arXiv.2209.07068","url":null,"abstract":"The task of Chinese Spelling Check (CSC) is aiming to detect and correct spelling errors that can be found in the text. While manually annotating a high-quality dataset is expensive and time-consuming, thus the scale of the training dataset is usually very small (e.g., SIGHAN15 only contains 2339 samples for training), therefore supervised-learning based models usually suffer the data sparsity limitation and over-fitting issue, especially in the era of big language models. In this paper, we are dedicated to investigating the unsupervised paradigm to address the CSC problem and we propose a framework named uChecker to conduct unsupervised spelling error detection and correction. Masked pretrained language models such as BERT are introduced as the backbone model considering their powerful language diagnosis capability. Benefiting from the various and flexible MASKing operations, we propose a Confusionset-guided masking strategy to fine-train the masked language model to further improve the performance of unsupervised detection and correction. Experimental results on standard datasets demonstrate the effectiveness of our proposed model uChecker in terms of character-level and sentence-level Accuracy, Precision, Recall, and F1-Measure on tasks of spelling error detection and correction respectively.","PeriodicalId":91381,"journal":{"name":"Proceedings of COLING. International Conference on Computational Linguistics","volume":"62 1","pages":"2812-2822"},"PeriodicalIF":0.0,"publicationDate":"2022-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72803388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Scene Graph Modification as Incremental Structure Expanding
Pub Date: 2022-09-15 DOI: 10.48550/arXiv.2209.09093
Xuming Hu, Zhijiang Guo, Yuwei Fu, Lijie Wen, Philip S. Yu
A scene graph is a semantic representation that expresses the objects, attributes, and relationships between objects in a scene. Scene graphs play an important role in many cross-modality tasks, as they are able to capture the interactions between images and texts. In this paper, we focus on scene graph modification (SGM), where the system is required to learn how to update an existing scene graph based on a natural language query. Unlike previous approaches that rebuild the entire scene graph, we frame SGM as a graph expansion task by introducing incremental structure expanding (ISE). ISE constructs the target graph by incrementally expanding the source graph without changing the unmodified structure. Based on ISE, we further propose a model that iterates between node prediction and edge prediction, progressively inferring more accurate and harmonious expansion decisions. In addition, we construct a challenging dataset that contains more complicated queries and larger scene graphs than existing datasets. Experiments on four benchmarks demonstrate the effectiveness of our approach, which surpasses the previous state-of-the-art model by large margins.
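The expansion loop can be illustrated with a toy sketch in which stub functions stand in for the paper's learned node and edge predictors; the graph format and the example query are assumptions for illustration.

```python
# Toy illustration of incremental structure expanding (ISE): the target
# graph is built by adding nodes and edges to the source graph, leaving
# the unmodified structure untouched. The predictors are stubs, not the
# paper's learned models.
from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # (head, relation, tail)

def expand(graph, query, predict_node, predict_edges, max_steps=10):
    """Alternate node prediction and edge prediction until the node
    predictor emits a stop symbol."""
    for _ in range(max_steps):
        node = predict_node(graph, query)
        if node is None:          # stop symbol: nothing left to add
            break
        graph.nodes.add(node)
        for edge in predict_edges(graph, query, node):
            graph.edges.add(edge)
    return graph

# Stub predictors for the query "add a red hat on the man".
def predict_node(graph, query):
    for cand in ["hat", "red"]:
        if cand not in graph.nodes:
            return cand
    return None

def predict_edges(graph, query, node):
    wanted = {("man", "wearing", "hat"), ("hat", "attribute", "red")}
    return [(h, r, t) for (h, r, t) in wanted
            if node in (h, t) and {h, t} <= graph.nodes]

g = SceneGraph(nodes={"man"}, edges=set())
print(expand(g, "add a red hat on the man", predict_node, predict_edges))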
{"title":"Scene Graph Modification as Incremental Structure Expanding","authors":"Xuming Hu, Zhijiang Guo, Yuwei Fu, Lijie Wen, Philip S. Yu","doi":"10.48550/arXiv.2209.09093","DOIUrl":"https://doi.org/10.48550/arXiv.2209.09093","url":null,"abstract":"A scene graph is a semantic representation that expresses the objects, attributes, and relationships between objects in a scene. Scene graphs play an important role in many cross modality tasks, as they are able to capture the interactions between images and texts. In this paper, we focus on scene graph modification (SGM), where the system is required to learn how to update an existing scene graph based on a natural language query. Unlike previous approaches that rebuilt the entire scene graph, we frame SGM as a graph expansion task by introducing the incremental structure expanding (ISE). ISE constructs the target graph by incrementally expanding the source graph without changing the unmodified structure. Based on ISE, we further propose a model that iterates between nodes prediction and edges prediction, inferring more accurate and harmonious expansion decisions progressively. In addition, we construct a challenging dataset that contains more complicated queries and larger scene graphs than existing datasets. Experiments on four benchmarks demonstrate the effectiveness of our approach, which surpasses the previous state-of-the-art model by large margins.","PeriodicalId":91381,"journal":{"name":"Proceedings of COLING. International Conference on Computational Linguistics","volume":"119 1","pages":"5707-5720"},"PeriodicalIF":0.0,"publicationDate":"2022-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82166433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Few Clean Instances Help Denoising Distant Supervision
Pub Date: 2022-09-14 DOI: 10.48550/arXiv.2209.06596
Yufang Liu, Ziyin Huang, Yijun Wang, Changzhi Sun, Man Lan, Yuanbin Wu, Xiaofeng Mou, Ding Wang
Existing distantly supervised relation extractors usually rely on noisy data for both model training and evaluation, which may lead to garbage-in, garbage-out systems. To alleviate this problem, we study whether a small clean dataset can help improve the quality of distantly supervised models. We show that, besides enabling a more convincing evaluation of models, a small clean dataset also helps us build more robust denoising models. Specifically, we propose a new criterion for clean instance selection based on influence functions. It collects sample-level evidence for recognizing good instances (which is more informative than loss-level evidence). We also propose a teacher-student mechanism for controlling the purity of intermediate results when bootstrapping the clean set. The whole approach is model-agnostic and demonstrates strong performance on denoising both real (NYT) and synthetic noisy datasets.
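A simplified sketch of the selection criterion follows. Full influence functions involve an inverse-Hessian-vector product; the gradient dot product used here is a common first-order approximation, assumed for brevity rather than taken from the paper.

```python
# First-order sketch of clean-instance selection: score each noisy sample
# by how well its gradient aligns with the gradient of the loss on the
# small clean set (higher = more helpful). An assumption-laden stand-in
# for the paper's influence-function criterion.
import torch

def flat_grad(loss, model):
    params = [p for p in model.parameters() if p.requires_grad]
    return torch.cat([g.reshape(-1) for g in torch.autograd.grad(loss, params)])

def influence_scores(model, loss_fn, noisy, clean):
    cx, cy = clean
    clean_g = flat_grad(loss_fn(model(cx), cy), model)
    nx, ny = noisy
    return [torch.dot(flat_grad(loss_fn(model(nx[i:i + 1]), ny[i:i + 1]),
                                model), clean_g).item()
            for i in range(len(nx))]

# Toy usage with a linear "relation extractor".
model = torch.nn.Linear(4, 2)
loss_fn = torch.nn.CrossEntropyLoss()
noisy = (torch.randn(8, 4), torch.randint(0, 2, (8,)))
clean = (torch.randn(4, 4), torch.randint(0, 2, (4,)))
scores = influence_scores(model, loss_fn, noisy, clean)
keep = sorted(range(8), key=lambda i: -scores[i])[:4]  # keep top half
print(keep)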
{"title":"Few Clean Instances Help Denoising Distant Supervision","authors":"Yufang Liu, Ziyin Huang, Yijun Wang, Changzhi Sun, Man Lan, Yuanbin Wu, Xiaofeng Mou, Ding Wang","doi":"10.48550/arXiv.2209.06596","DOIUrl":"https://doi.org/10.48550/arXiv.2209.06596","url":null,"abstract":"Existing distantly supervised relation extractors usually rely on noisy data for both model training and evaluation, which may lead to garbage-in-garbage-out systems. To alleviate the problem, we study whether a small clean dataset could help improve the quality of distantly supervised models. We show that besides getting a more convincing evaluation of models, a small clean dataset also helps us to build more robust denoising models. Specifically, we propose a new criterion for clean instance selection based on influence functions. It collects sample-level evidence for recognizing good instances (which is more informative than loss-level evidence). We also propose a teacher-student mechanism for controlling purity of intermediate results when bootstrapping the clean set. The whole approach is model-agnostic and demonstrates strong performances on both denoising real (NYT) and synthetic noisy datasets.","PeriodicalId":91381,"journal":{"name":"Proceedings of COLING. International Conference on Computational Linguistics","volume":"19 1","pages":"2528-2539"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81703987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
How to Find Strong Summary Coherence Measures? A Toolbox and a Comparative Study for Summary Coherence Measure Evaluation
Pub Date: 2022-09-14 DOI: 10.48550/arXiv.2209.06517
Julius Steen, K. Markert
Automatically evaluating the coherence of summaries is of great significance, both for cost-efficient summarizer evaluation and as a tool for improving coherence by selecting high-scoring candidate summaries. While many different approaches have been suggested to model summary coherence, they are often evaluated using disparate datasets and metrics. This makes it difficult to understand their relative performance and to identify ways forward towards better summary coherence modelling. In this work, we conduct a large-scale investigation of various methods for summary coherence modelling on an even playing field. Additionally, we introduce two novel analysis measures, intra-system correlation and bias matrices, that help identify biases in coherence measures and provide robustness against system-level confounders. While none of the currently available automatic coherence measures is able to assign reliable coherence scores to system summaries across all evaluation metrics, large-scale language models fine-tuned on self-supervised tasks show promising results, as long as fine-tuning takes into account that they need to generalize across different summary lengths.
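As one plausible formalization of intra-system correlation (the record layout and the averaging over systems are assumptions; the paper's exact definition may differ), the measure can be computed within each system's outputs and then averaged, so that system-level quality differences cannot inflate the score:

```python
# Sketch of "intra-system correlation": correlate an automatic coherence
# score with human ratings *within* each system's summaries, which guards
# against system-level confounders.
from collections import defaultdict
from scipy.stats import pearsonr

def intra_system_correlation(records):
    """records: iterable of (system_id, metric_score, human_score)."""
    by_system = defaultdict(list)
    for sys_id, metric, human in records:
        by_system[sys_id].append((metric, human))
    corrs = []
    for pairs in by_system.values():
        if len(pairs) > 1:                 # need at least two points
            metric, human = zip(*pairs)
            corrs.append(pearsonr(metric, human)[0])
    return sum(corrs) / len(corrs)         # average over systems

records = [("A", 0.9, 4), ("A", 0.7, 3), ("A", 0.4, 2),
           ("B", 0.6, 5), ("B", 0.5, 3), ("B", 0.2, 1)]
print(intra_system_correlation(records))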
{"title":"How to Find Strong Summary Coherence Measures? A Toolbox and a Comparative Study for Summary Coherence Measure Evaluation","authors":"Julius Steen, K. Markert","doi":"10.48550/arXiv.2209.06517","DOIUrl":"https://doi.org/10.48550/arXiv.2209.06517","url":null,"abstract":"Automatically evaluating the coherence of summaries is of great significance both to enable cost-efficient summarizer evaluation and as a tool for improving coherence by selecting high-scoring candidate summaries. While many different approaches have been suggested to model summary coherence, they are often evaluated using disparate datasets and metrics. This makes it difficult to understand their relative performance and identify ways forward towards better summary coherence modelling. In this work, we conduct a large-scale investigation of various methods for summary coherence modelling on an even playing field. Additionally, we introduce two novel analysis measures, _intra-system correlation_ and _bias matrices_, that help identify biases in coherence measures and provide robustness against system-level confounders. While none of the currently available automatic coherence measures are able to assign reliable coherence scores to system summaries across all evaluation metrics, large-scale language models fine-tuned on self-supervised tasks show promising results, as long as fine-tuning takes into account that they need to generalize across different summary lengths.","PeriodicalId":91381,"journal":{"name":"Proceedings of COLING. International Conference on Computational Linguistics","volume":"1 1","pages":"6035-6049"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88886784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
The Fragility of Multi-Treebank Parsing Evaluation
Pub Date: 2022-09-14 DOI: 10.48550/arXiv.2209.06699
I. Alonso-Alonso, David Vilares, Carlos Gómez-Rodríguez
Treebank selection for parsing evaluation, and the spurious effects that might arise from a biased choice, have not been explored in detail. This paper studies how evaluating on a single subset of treebanks can lead to weak conclusions. First, we take a few contrasting parsers and run them on subsets of treebanks proposed in previous work, whose use was justified (or not) on criteria such as typology or data scarcity. Second, we run a large-scale version of this experiment, creating a vast number of random treebank subsets and comparing many parsers with available scores on them. The results show substantial variability across subsets, and that although establishing guidelines for good treebank selection is hard, some inadequate strategies can easily be avoided.
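The random-subset experiment can be sketched as follows, with made-up placeholder scores instead of real parser results:

```python
# Sketch of the subset-sensitivity check: enumerate treebank subsets and
# count how often the parser ranking disagrees with the full-set ranking.
# SCORES holds illustrative numbers only, not results from the paper.
from itertools import combinations

SCORES = {  # parser -> {treebank: LAS}
    "parserA": {"en": 90.1, "de": 85.3, "fi": 80.2, "zh": 78.9},
    "parserB": {"en": 89.5, "de": 86.0, "fi": 81.1, "zh": 77.5},
}

def ranking(subset):
    """Rank parsers by mean score over the given treebank subset."""
    means = {p: sum(s[t] for t in subset) / len(subset)
             for p, s in SCORES.items()}
    return tuple(sorted(means, key=means.get, reverse=True))

treebanks = list(next(iter(SCORES.values())))
full = ranking(tuple(treebanks))
rankings = [ranking(subset)
            for k in range(1, len(treebanks) + 1)
            for subset in combinations(treebanks, k)]
flips = sum(r != full for r in rankings)
print(f"{flips}/{len(rankings)} subsets disagree with the full-set ranking")
```

Large disagreement between subset rankings and the full-set ranking is the fragility the abstract describes; in practice one would sample subsets rather than enumerate them.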
{"title":"The Fragility of Multi-Treebank Parsing Evaluation","authors":"I. Alonso-Alonso, David Vilares, Carlos Gómez-Rodríguez","doi":"10.48550/arXiv.2209.06699","DOIUrl":"https://doi.org/10.48550/arXiv.2209.06699","url":null,"abstract":"Treebank selection for parsing evaluation and the spurious effects that might arise from a biased choice have not been explored in detail. This paper studies how evaluating on a single subset of treebanks can lead to weak conclusions. First, we take a few contrasting parsers, and run them on subsets of treebanks proposed in previous work, whose use was justified (or not) on criteria such as typology or data scarcity. Second, we run a large-scale version of this experiment, create vast amounts of random subsets of treebanks, and compare on them many parsers whose scores are available. The results show substantial variability across subsets and that although establishing guidelines for good treebank selection is hard, some inadequate strategies can be easily avoided.","PeriodicalId":91381,"journal":{"name":"Proceedings of COLING. International Conference on Computational Linguistics","volume":"58 1","pages":"5345-5359"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73090663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Distribution Calibration for Out-of-Domain Detection with Bayesian Approximation
Pub Date: 2022-09-14 DOI: 10.48550/arXiv.2209.06612
Yanan Wu, Zhiyuan Zeng, Keqing He, Yutao Mou, Pei Wang, Weiran Xu
Out-of-Domain (OOD) detection is a key component of a task-oriented dialog system, which aims to identify whether a query falls outside the predefined supported intent set. Previous softmax-based detection algorithms have been shown to be overconfident for OOD samples. In this paper, we analyze how this overconfidence arises from distribution uncertainty due to the mismatch between the training and test distributions, which prevents the model from making confident predictions and can thus produce abnormal softmax scores. We propose a Bayesian OOD detection framework that calibrates distribution uncertainty using Monte-Carlo Dropout. Our method is flexible, easily pluggable into existing softmax-based baselines, and gains a 33.33% OOD F1 improvement over MSP while increasing inference time by only 0.41%. Further analyses show the effectiveness of Bayesian learning for OOD detection.
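A minimal sketch of the Monte-Carlo Dropout scoring step, assuming a toy intent classifier and an untuned confidence threshold; the paper's full framework calibrates distribution uncertainty on top of such sampled predictions.

```python
# Monte-Carlo Dropout OOD scoring: keep dropout active at test time,
# average the softmax over several stochastic passes, and treat a low
# maximum probability as "out of domain". Classifier and threshold are
# illustrative assumptions.
import torch

class IntentClassifier(torch.nn.Module):
    def __init__(self, dim=16, n_intents=5, p=0.3):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, 32), torch.nn.ReLU(),
            torch.nn.Dropout(p), torch.nn.Linear(32, n_intents))

    def forward(self, x):
        return self.net(x)

def mc_dropout_confidence(model, x, n_samples=20):
    """Average softmax over stochastic forward passes; return max prob."""
    model.train()  # keep dropout layers stochastic at inference
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(-1) for _ in range(n_samples)])
    return probs.mean(0).max(-1).values

model = IntentClassifier()
queries = torch.randn(3, 16)
conf = mc_dropout_confidence(model, queries)
is_ood = conf < 0.5  # the threshold would be tuned on in-domain data
print(conf, is_ood)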
{"title":"Distribution Calibration for Out-of-Domain Detection with Bayesian Approximation","authors":"Yanan Wu, Zhiyuan Zeng, Keqing He, Yutao Mou, Pei Wang, Weiran Xu","doi":"10.48550/arXiv.2209.06612","DOIUrl":"https://doi.org/10.48550/arXiv.2209.06612","url":null,"abstract":"Out-of-Domain (OOD) detection is a key component in a task-oriented dialog system, which aims to identify whether a query falls outside the predefined supported intent set. Previous softmax-based detection algorithms are proved to be overconfident for OOD samples. In this paper, we analyze overconfident OOD comes from distribution uncertainty due to the mismatch between the training and test distributions, which makes the model can’t confidently make predictions thus probably causes abnormal softmax scores. We propose a Bayesian OOD detection framework to calibrate distribution uncertainty using Monte-Carlo Dropout. Our method is flexible and easily pluggable to existing softmax-based baselines and gains 33.33% OOD F1 improvements with increasing only 0.41% inference time compared to MSP. Further analyses show the effectiveness of Bayesian learning for OOD detection.","PeriodicalId":91381,"journal":{"name":"Proceedings of COLING. International Conference on Computational Linguistics","volume":"36 1","pages":"608-615"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82768872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Prompt-based Conservation Learning for Multi-hop Question Answering
Pub Date: 2022-09-14 DOI: 10.48550/arXiv.2209.06923
Zhenyun Deng, Yonghua Zhu, Yang Chen, Qianqian Qi, M. Witbrock, P. Riddle
Multi-hop question answering (QA) requires reasoning over multiple documents to answer a complex question and provide interpretable supporting evidence. However, providing supporting evidence is not enough to demonstrate that a model has performed the desired reasoning to reach the correct answer. Most existing multi-hop QA methods fail to answer a large fraction of sub-questions, even if their parent questions are answered correctly. In this paper, we propose the Prompt-based Conservation Learning (PCL) framework for multi-hop QA, which acquires new knowledge from multi-hop QA tasks while conserving old knowledge learned on single-hop QA tasks, mitigating forgetting. Specifically, we first train a model on existing single-hop QA tasks, and then freeze this model and expand it by allocating additional sub-networks for the multi-hop QA task. Moreover, to condition pre-trained language models to stimulate the kind of reasoning required for specific multi-hop questions, we learn soft prompts for the novel sub-networks to perform type-specific reasoning. Experimental results on the HotpotQA benchmark show that PCL is competitive for multi-hop QA and retains good performance on the corresponding single-hop sub-questions, demonstrating the efficacy of PCL in mitigating knowledge loss by forgetting.
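A sketch of the conservation idea, freezing the single-hop backbone and training only newly allocated components; the module shapes, the mean-pooled head, and the soft-prompt length are illustrative assumptions, not the paper's architecture.

```python
# Conservation-style expansion: freeze the parameters learned on the
# single-hop task; train only a soft prompt and a new multi-hop head.
import torch

class PromptedQA(torch.nn.Module):
    def __init__(self, backbone, hidden=64, prompt_len=8):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False          # conserve single-hop knowledge
        self.prompt = torch.nn.Parameter(torch.randn(prompt_len, hidden))
        self.multihop_head = torch.nn.Linear(hidden, 2)  # new sub-network

    def forward(self, token_embeds):
        # Prepend type-specific soft prompts to the input embeddings.
        prompt = self.prompt.expand(token_embeds.size(0), -1, -1)
        h = self.backbone(torch.cat([prompt, token_embeds], dim=1))
        return self.multihop_head(h.mean(dim=1))

backbone = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())
model = PromptedQA(backbone)
out = model(torch.randn(2, 10, 64))          # batch of 2, 10 tokens each
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # only the prompt and the new head are updated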
{"title":"Prompt-based Conservation Learning for Multi-hop Question Answering","authors":"Zhenyun Deng, Yonghua Zhu, Yang Chen, Qianqian Qi, M. Witbrock, P. Riddle","doi":"10.48550/arXiv.2209.06923","DOIUrl":"https://doi.org/10.48550/arXiv.2209.06923","url":null,"abstract":"Multi-hop question answering (QA) requires reasoning over multiple documents to answer a complex question and provide interpretable supporting evidence. However, providing supporting evidence is not enough to demonstrate that a model has performed the desired reasoning to reach the correct answer. Most existing multi-hop QA methods fail to answer a large fraction of sub-questions, even if their parent questions are answered correctly. In this paper, we propose the Prompt-based Conservation Learning (PCL) framework for multi-hop QA, which acquires new knowledge from multi-hop QA tasks while conserving old knowledge learned on single-hop QA tasks, mitigating forgetting. Specifically, we first train a model on existing single-hop QA tasks, and then freeze this model and expand it by allocating additional sub-networks for the multi-hop QA task. Moreover, to condition pre-trained language models to stimulate the kind of reasoning required for specific multi-hop questions, we learn soft prompts for the novel sub-networks to perform type-specific reasoning. Experimental results on the HotpotQA benchmark show that PCL is competitive for multi-hop QA and retains good performance on the corresponding single-hop sub-questions, demonstrating the efficacy of PCL in mitigating knowledge loss by forgetting.","PeriodicalId":91381,"journal":{"name":"Proceedings of COLING. International Conference on Computational Linguistics","volume":"43 1","pages":"1791-1800"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88391859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
SPACE-2: Tree-Structured Semi-Supervised Contrastive Pre-training for Task-Oriented Dialog Understanding
Pub Date: 2022-09-14 DOI: 10.48550/arXiv.2209.06638
Wanwei He, Yinpei Dai, Binyuan Hui, Min Yang, Zhen Cao, Jianbo Dong, Fei Huang, Luo Si, Yongbin Li
Pre-training methods with contrastive learning objectives have shown remarkable success in dialog understanding tasks. However, current contrastive learning solely considers self-augmented dialog samples as positive samples and treats all other dialog samples as negative ones, which enforces dissimilar representations even for dialogs that are semantically related. In this paper, we propose SPACE-2, a tree-structured pre-trained conversation model, which learns dialog representations from limited labeled dialogs and large-scale unlabeled dialog corpora via semi-supervised contrastive pre-training. Concretely, we first define a general semantic tree structure (STS) to unify the inconsistent annotation schemas across different dialog datasets, so that the rich structural information stored in all labeled data can be exploited. Then we propose a novel multi-view score function that increases the relevance of all possible dialogs sharing similar STSs and pushes away only completely different dialogs during supervised contrastive pre-training. To fully exploit unlabeled dialogs, a basic self-supervised contrastive loss is also added to refine the learned representations. Experiments show that our method achieves new state-of-the-art results on the DialoGLUE benchmark, which consists of seven datasets and four popular dialog understanding tasks.
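The departure from one-hot contrastive targets can be sketched as a soft InfoNCE-style loss in which the target distribution is weighted by a dialog-similarity score; the similarity matrix below is hand-made and stands in for the paper's STS-based multi-view scores.

```python
# Soft contrastive loss: semantically related dialogs act as weighted
# positives instead of being pushed away as negatives. The sim matrix is
# an illustrative assumption, not the paper's STS matching.
import torch
import torch.nn.functional as F

def soft_contrastive_loss(embeds, sim_matrix, temperature=0.1):
    """embeds: (N, d) dialog representations; sim_matrix: (N, N) scores
    in [0, 1] measuring semantic-tree overlap between dialogs."""
    logits = embeds @ embeds.t() / temperature
    logits.fill_diagonal_(float("-inf"))        # exclude self-pairs
    targets = sim_matrix.clone()
    targets.fill_diagonal_(0)
    targets = targets / targets.sum(dim=1, keepdim=True)
    return -(targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

embeds = F.normalize(torch.randn(4, 8), dim=1)
sim = torch.tensor([[1.0, 0.8, 0.1, 0.0],
                    [0.8, 1.0, 0.2, 0.1],
                    [0.1, 0.2, 1.0, 0.7],
                    [0.0, 0.1, 0.7, 1.0]])
print(soft_contrastive_loss(embeds, sim))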
{"title":"SPACE-2: Tree-Structured Semi-Supervised Contrastive Pre-training for Task-Oriented Dialog Understanding","authors":"Wanwei He, Yinpei Dai, Binyuan Hui, Min Yang, Zhen Cao, Jianbo Dong, Fei Huang, Luo Si, Yongbin Li","doi":"10.48550/arXiv.2209.06638","DOIUrl":"https://doi.org/10.48550/arXiv.2209.06638","url":null,"abstract":"Pre-training methods with contrastive learning objectives have shown remarkable success in dialog understanding tasks. However, current contrastive learning solely considers the self-augmented dialog samples as positive samples and treats all other dialog samples as negative ones, which enforces dissimilar representations even for dialogs that are semantically related. In this paper, we propose SPACE-2, a tree-structured pre-trained conversation model, which learns dialog representations from limited labeled dialogs and large-scale unlabeled dialog corpora via semi-supervised contrastive pre-training. Concretely, we first define a general semantic tree structure (STS) to unify the inconsistent annotation schema across different dialog datasets, so that the rich structural information stored in all labeled data can be exploited. Then we propose a novel multi-view score function to increase the relevance of all possible dialogs that share similar STSs and only push away other completely different dialogs during supervised contrastive pre-training. To fully exploit unlabeled dialogs, a basic self-supervised contrastive loss is also added to refine the learned representations. Experiments show that our method can achieve new state-of-the-art results on the DialoGLUE benchmark consisting of seven datasets and four popular dialog understanding tasks.","PeriodicalId":91381,"journal":{"name":"Proceedings of COLING. International Conference on Computational Linguistics","volume":"116 1","pages":"553-569"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77277478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
CoHS-CQG: Context and History Selection for Conversational Question Generation
Pub Date: 2022-09-14 DOI: 10.48550/arXiv.2209.06652
Xuan Long Do, Bowei Zou, Liangming Pan, Nancy F. Chen, Shafiq R. Joty, A. Aw
Conversational question generation (CQG) is a vital task for machines assisting humans through conversations, for example in interactive reading comprehension. Compared to traditional single-turn question generation (SQG), CQG is more challenging in that the generated question must not only be meaningful but also align with the provided conversation. Previous studies mainly focus on how to model the flow and alignment of the conversation, but do not thoroughly study which parts of the context and history the model actually needs. We believe that shortening the context and history is crucial, as it helps the model optimise more for conversational alignment. To this end, we propose CoHS-CQG, a two-stage CQG framework that adopts a novel CoHS module to shorten the context and history of the input. In particular, it selects the top-p sentences and history turns by computing their relevance scores. Our model achieves state-of-the-art performance on CoQA in both the answer-aware and answer-unaware settings.
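A toy sketch of the selection step, assuming a crude lexical-overlap scorer in place of the learned relevance model and reading "top-p" as a nucleus-style cutoff over relevance mass (both assumptions for illustration):

```python
# CoHS-style shortening: score context sentences (and, analogously,
# history turns) for relevance to the target answer, then keep a minimal
# top-scoring subset.
def relevance(text, answer):
    """Crude lexical-overlap score; a stand-in for a learned scorer."""
    t, a = set(text.lower().split()), set(answer.lower().split())
    return len(t & a) / (len(a) or 1)

def select_top_p(items, answer, p=0.7):
    """Keep the highest-scoring items until their share of the total
    relevance mass reaches p (a nucleus-style cutoff)."""
    scored = sorted(items, key=lambda s: relevance(s, answer), reverse=True)
    total = sum(relevance(s, answer) for s in items) or 1.0
    kept, mass = [], 0.0
    for s in scored:
        kept.append(s)
        mass += relevance(s, answer) / total
        if mass >= p:
            break
    return kept

context = ["Tom adopted a small grey cat.",
           "The weather was rainy all week.",
           "The cat likes to sleep on the sofa."]
print(select_top_p(context, answer="the grey cat"))  # drops the weather line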
{"title":"CoHS-CQG: Context and History Selection for Conversational Question Generation","authors":"Xuan Long Do, Bowei Zou, Liangming Pan, Nancy F. Chen, Shafiq R. Joty, A. Aw","doi":"10.48550/arXiv.2209.06652","DOIUrl":"https://doi.org/10.48550/arXiv.2209.06652","url":null,"abstract":"Conversational question generation (CQG) serves as a vital task for machines to assist humans, such as interactive reading comprehension, through conversations. Compared to traditional single-turn question generation (SQG), CQG is more challenging in the sense that the generated question is required not only to be meaningful, but also to align with the provided conversation. Previous studies mainly focus on how to model the flow and alignment of the conversation, but do not thoroughly study which parts of the context and history are necessary for the model. We believe that shortening the context and history is crucial as it can help the model to optimise more on the conversational alignment property. To this end, we propose CoHS-CQG, a two-stage CQG framework, which adopts a novel CoHS module to shorten the context and history of the input. In particular, it selects the top-p sentences and history turns by calculating the relevance scores of them. Our model achieves state-of-the-art performances on CoQA in both the answer-aware and answer-unaware settings.","PeriodicalId":91381,"journal":{"name":"Proceedings of COLING. International Conference on Computational Linguistics","volume":"55 1","pages":"580-591"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83341910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Classical Sequence Match Is a Competitive Few-Shot One-Class Learner
Pub Date: 2022-09-14 DOI: 10.48550/arXiv.2209.06394
Mengting Hu, H. Gao, Yinhao Bai, Mingming Liu
Nowadays, transformer-based models have gradually become the default choice for artificial intelligence pioneers, and they show superiority even in few-shot scenarios. In this paper, we revisit classical methods and propose a new few-shot alternative. Specifically, we investigate the few-shot one-class problem, which takes a known sample as a reference to detect whether an unknown instance belongs to the same class. This problem can be studied from the perspective of sequence matching. We show that, with meta-learning, the classical sequence match method Compare-Aggregate significantly outperforms transformer-based ones, while requiring much less training cost. Furthermore, we perform an empirical comparison between the two kinds of sequence match approaches under simple fine-tuning and meta-learning. Meta-learning causes the transformer models' features to have high-correlation dimensions; the reason is closely related to the number of layers and heads in transformer models. Experimental codes and data are available at https://github.com/hmt2014/FewOne.
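A minimal Compare-Aggregate matcher for the one-class setting: attend from the unknown instance to the reference, compare the aligned representations element-wise, and aggregate into a single match score. Dimensions and pooling choices are illustrative assumptions, not the authors' exact configuration.

```python
# Compare-Aggregate sketch: soft-align the query to the reference sample,
# compare, then aggregate into one "same class?" logit.
import torch
import torch.nn.functional as F

class CompareAggregate(torch.nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.compare = torch.nn.Linear(2 * dim, dim)
        self.score = torch.nn.Linear(dim, 1)

    def forward(self, query, reference):
        # query: (B, Lq, d), reference: (B, Lr, d)
        attn = F.softmax(query @ reference.transpose(1, 2), dim=-1)
        aligned = attn @ reference               # soft-aligned reference
        compared = torch.relu(self.compare(torch.cat([query, aligned], -1)))
        return self.score(compared.mean(dim=1)).squeeze(-1)  # aggregate

model = CompareAggregate()
query, ref = torch.randn(2, 7, 32), torch.randn(2, 5, 32)
match_logit = model(query, ref)   # > 0 would mean "same class" once trained
print(match_logit.shape)          # torch.Size([2])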
{"title":"Classical Sequence Match Is a Competitive Few-Shot One-Class Learner","authors":"Mengting Hu, H. Gao, Yinhao Bai, Mingming Liu","doi":"10.48550/arXiv.2209.06394","DOIUrl":"https://doi.org/10.48550/arXiv.2209.06394","url":null,"abstract":"Nowadays, transformer-based models gradually become the default choice for artificial intelligence pioneers. The models also show superiority even in the few-shot scenarios. In this paper, we revisit the classical methods and propose a new few-shot alternative. Specifically, we investigate the few-shot one-class problem, which actually takes a known sample as a reference to detect whether an unknown instance belongs to the same class. This problem can be studied from the perspective of sequence match. It is shown that with meta-learning, the classical sequence match method, i.e. Compare-Aggregate, significantly outperforms transformer ones. The classical approach requires much less training cost. Furthermore, we perform an empirical comparison between two kinds of sequence match approaches under simple fine-tuning and meta-learning. Meta-learning causes the transformer models’ features to have high-correlation dimensions. The reason is closely related to the number of layers and heads of transformer models. Experimental codes and data are available at https://github.com/hmt2014/FewOne.","PeriodicalId":91381,"journal":{"name":"Proceedings of COLING. International Conference on Computational Linguistics","volume":"864 1","pages":"4728-4740"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85479391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0