Proceedings of COLING. International Conference on Computational Linguistics: Latest Articles

Modeling Hierarchical Reasoning Chains by Linking Discourse Units and Key Phrases for Reading Comprehension

Proceedings of COLING. International Conference on Computational Linguistics

Pub Date : 2023-06-21 DOI: 10.48550/arXiv.2306.12069

P. Hitzler, Shafiq R. Joty

Machine reading comprehension (MRC) poses new challenges to logical reasoning, which aims to understand the implicit logical relations entailed in the given contexts and perform inference over them. Due to the complexity of logic, logical connections exist at different granularity levels. However, most existing logical reasoning methods focus on either entity-aware or discourse-based information alone and ignore the hierarchical relations between them, which may even affect one another. This paper proposes a holistic graph network (HGN) that handles context at both the discourse level and the word level as the basis for logical reasoning, providing more fine-grained relation extraction. Specifically, node-level and type-level relations, which can be interpreted as bridges in the reasoning process, are modeled by a hierarchical interaction mechanism to improve the interpretation of MRC systems. Experimental results on logical reasoning QA datasets (ReClor and LogiQA) and natural language inference datasets (SNLI and ANLI) show the effectiveness and generalization of our method, and in-depth analysis verifies its capability to understand complex logical relations.

Citations: 2

Event Causality Extraction with Event Argument Correlations

Proceedings of COLING. International Conference on Computational Linguistics

Pub Date : 2023-01-27 DOI: 10.48550/arXiv.2301.11621

Shiyao Cui, Jiawei Sheng, Xin Cong, Quangang Li, Tingwen Liu, Jinqiao Shi

Event Causality Identification (ECI), which aims to detect whether a causality relation exists between two given textual events, is an important task for event causality understanding. However, the ECI task ignores crucial event structure and cause-effect component information, which limits its usefulness for downstream applications. In this paper, we introduce a novel task, namely Event Causality Extraction (ECE), which aims to extract cause-effect event pairs together with their structured event information from plain text. The ECE task is more challenging since each event can contain multiple event arguments, so fine-grained correlations between events must be considered to decide the cause-effect event pair. Hence, we propose a method with a dual grid tagging scheme to capture the intra- and inter-event argument correlations for ECE. Further, we devise an event type-enhanced model architecture to realize the dual grid tagging scheme. Experiments demonstrate the effectiveness of our method, and extensive analyses point out several future directions for ECE.

Citations: 1

BERT-Flow-VAE: A Weakly-supervised Model for Multi-Label Text Classification

Proceedings of COLING. International Conference on Computational Linguistics

Pub Date : 2022-10-27 DOI: 10.48550/arXiv.2210.15225

Ziwen Liu, J. Grau-Bové, Scott Orr

Multi-label Text Classification (MLTC) is the task of categorizing documents into one or more topics. Considering the large volumes of data and varying domains of such tasks, fully supervised learning requires fully manually annotated datasets, which are costly and time-consuming to produce. In this paper, we propose BERT-Flow-VAE (BFV), a Weakly-Supervised Multi-Label Text Classification (WSMLTC) model that reduces the need for full supervision. This new model (1) produces BERT sentence embeddings and calibrates them using a flow model, (2) generates an initial topic-document matrix by averaging the results of a seeded sparse topic model and a textual entailment model, which require only the surface names of topics and 4-6 seed words per topic, and (3) adopts a VAE framework to reconstruct the embeddings under the guidance of the topic-document matrix. Finally, (4) it uses the means produced by the encoder model in the VAE architecture as predictions for MLTC. Experimental results on 6 multi-label datasets show that BFV can substantially outperform other baseline WSMLTC models in key metrics and achieve approximately 84% of the performance of a fully-supervised model.
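
Step (2) of the abstract can be illustrated with a minimal sketch. It assumes both weak signals are already available as document-by-topic score matrices in [0, 1]; the function names, the toy scores, and the 0.5 threshold are hypothetical and are not taken from the paper.

```python
def average_matrices(a, b):
    """Element-wise mean of two equally shaped document-by-topic score matrices."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    return [[(x + y) / 2.0 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def to_labels(scores, threshold=0.5):
    """Turn per-topic scores into a multi-label 0/1 assignment.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return [[1 if s >= threshold else 0 for s in row] for row in scores]


# Toy scores: rows are documents, columns are topics (hypothetical values).
topic_model_scores = [[0.9, 0.1], [0.2, 0.8]]
entailment_scores = [[0.7, 0.3], [0.4, 0.4]]

initial_matrix = average_matrices(topic_model_scores, entailment_scores)
initial_labels = to_labels(initial_matrix)
```

In the described model this averaged matrix only guides the VAE; the final predictions come from the encoder means, which this sketch does not attempt to reproduce.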

Citations: 0

Multilingual Word Sense Disambiguation with Unified Sense Representation

Proceedings of COLING. International Conference on Computational Linguistics

Pub Date : 2022-10-14 DOI: 10.48550/arXiv.2210.07447

Ying Su, Hongming Zhang, Yangqiu Song, Tong Zhang

As a key natural language processing (NLP) task, word sense disambiguation (WSD) evaluates how well NLP models can understand the fine-grained semantics of words under specific contexts. Benefiting from large-scale annotation, current WSD systems have achieved impressive performance in English by combining supervised learning with lexical knowledge. However, such success is hard to replicate in other languages, where only very limited annotations are available. In this paper, based on the fact that the multilingual lexicon BabelNet describes the same set of concepts across languages, we propose to build knowledge-based and supervised Multilingual Word Sense Disambiguation (MWSD) systems. We build unified sense representations for multiple languages and address the annotation scarcity problem for MWSD by transferring annotations from resource-rich languages. With the unified sense representations, annotations from multiple languages can be jointly trained to benefit the MWSD tasks. Evaluations on the SemEval-13 and SemEval-15 datasets demonstrate the effectiveness of our methodology.

Citations: 3

TestAug: A Framework for Augmenting Capability-based NLP Tests

Proceedings of COLING. International Conference on Computational Linguistics

Pub Date : 2022-10-14 DOI: 10.48550/arXiv.2210.08097

Guanqun Yang, Mirazul Haque, Qiaochu Song, Wei Yang, Xueqing Liu

The recently proposed capability-based NLP testing allows model developers to test the functional capabilities of NLP models, revealing functional failures for models with good held-out evaluation scores. However, existing work on capability-based testing requires the developer to compose each individual test template from scratch. Such an approach thus requires extensive manual effort and is less scalable. In this paper, we investigate a different approach that requires the developer to annotate only a few test templates, while leveraging the GPT-3 engine to generate the majority of test cases. While our approach saves manual effort by design, it guarantees the correctness of the generated suites with a validity checker. Moreover, our experimental results show that the test suites generated by GPT-3 are more diverse than the manually created ones; they can also be used to detect more errors than their manually created counterparts. Our test suites can be downloaded at https://anonymous-researcher-nlp.github.io/testaug/.

Citations: 0

Categorizing Semantic Representations for Neural Machine Translation

Proceedings of COLING. International Conference on Computational Linguistics

Pub Date : 2022-10-13 DOI: 10.48550/arXiv.2210.06709

Yongjing Yin, Yafu Li, Fandong Meng, Jie Zhou, Yue Zhang

Modern neural machine translation (NMT) models have achieved competitive performance in standard benchmarks. However, they have recently been shown to be limited in compositional generalization, failing to effectively learn the translation of atoms (e.g., words) and their semantic composition (e.g., modification) from seen compounds (e.g., phrases), and thus suffering from significantly weakened translation performance on unseen compounds during inference. We address this issue by introducing categorization to the source contextualized representations. The main idea is to enhance generalization by reducing sparsity and overfitting, which is achieved by finding prototypes of token representations over the training set and integrating their embeddings into the source encoding. Experiments on a dedicated MT dataset (i.e., CoGnition) show that our method reduces compositional generalization error rates by 24%. In addition, our conceptually simple method gives consistently better results than the Transformer baseline on a range of general MT datasets.

Citations: 4

CHAE: Fine-Grained Controllable Story Generation with Characters, Actions and Emotions

Proceedings of COLING. International Conference on Computational Linguistics

Pub Date : 2022-10-11 DOI: 10.48550/arXiv.2210.05221

Xinpeng Wang, Han Jiang, Zhihua Wei, Shanlin Zhou

Story generation has emerged as an interesting yet challenging NLP task in recent years. Some existing studies aim at generating fluent and coherent stories from keywords and outlines, while others attempt to control the global features of the story, such as emotion, style and topic. However, these works focus on coarse-grained control of the story and neglect control of its details, which is also crucial for the task. To fill the gap, this paper proposes a model for fine-grained control of the story, which allows the generation of customized stories with arbitrarily assigned characters, corresponding actions and emotions. Extensive experimental results on both automatic and human evaluations show the superiority of our method. It has strong controllability in generating stories according to fine-grained personalized guidance, demonstrating the effectiveness of our methodology. Our code is available at https://github.com/victorup/CHAE.

Citations: 3

SelfMix: Robust Learning against Textual Label Noise with Self-Mixup Training

Proceedings of COLING. International Conference on Computational Linguistics

Pub Date : 2022-10-10 DOI: 10.48550/arXiv.2210.04525

Dan Qiao, Chenchen Dai, Yuyang Ding, Juntao Li, Qiang Chen, Wenliang Chen, M. Zhang

The conventional success of textual classification relies on annotated data, and the new paradigm of pre-trained language models (PLMs) still requires some labeled data for downstream tasks. However, in real-world applications, label noise inevitably exists in training data, damaging the effectiveness, robustness, and generalization of the models constructed on such data. Recently, remarkable achievements have been made in mitigating this dilemma for visual data, while only a few works explore textual data. To fill this gap, we present SelfMix, a simple yet effective method to handle label noise in text classification tasks. SelfMix uses the Gaussian Mixture Model to separate samples and leverages semi-supervised learning. Unlike previous works requiring multiple models, our method utilizes the dropout mechanism on a single model to reduce the confirmation bias in self-training and introduces a textual-level mixup training strategy. Experimental results on three text classification benchmarks with different types of text show that our proposed method outperforms strong baselines designed for both textual and visual data under different noise ratios and noise types. Our anonymous code is available at https://github.com/noise-learning/SelfMix.
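
The sample-separation step can be sketched as follows. The abstract does not say which quantity the Gaussian Mixture Model is fit on; this sketch assumes, as is common in noisy-label learning, that it is the per-sample training loss, with the low-loss component treated as "clean". All function names and the toy losses are hypothetical, and the EM routine is a plain-Python illustration rather than the authors' implementation.

```python
import math


def _pdf(x, mean, std):
    """Density of a 1-D Gaussian."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, iters=50):
    """Fit a two-component 1-D GMM with EM; returns (means, stds, weights)."""
    lo, hi = min(values), max(values)
    means = [lo, hi]                      # initialise at the extremes
    spread = max((hi - lo) / 2.0, 1e-3)
    stds = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            dens = [weights[k] * _pdf(x, means[k], stds[k]) for k in range(2)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            stds[k] = max(math.sqrt(var), 1e-3)   # floor avoids collapse
            weights[k] = nk / len(values)
    return means, stds, weights


def clean_probabilities(losses):
    """Posterior probability that each sample belongs to the low-loss component."""
    means, stds, weights = fit_two_gaussians(losses)
    clean = 0 if means[0] <= means[1] else 1
    probs = []
    for x in losses:
        dens = [weights[k] * _pdf(x, means[k], stds[k]) for k in range(2)]
        total = sum(dens) or 1e-300
        probs.append(dens[clean] / total)
    return probs
```

Samples whose probability exceeds a chosen cutoff would keep their labels, while the rest would be treated as unlabeled for the semi-supervised stage; the cutoff itself is a design choice the abstract does not specify.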

Citations: 2

Improving Continual Relation Extraction through Prototypical Contrastive Learning

Proceedings of COLING. International Conference on Computational Linguistics

Pub Date : 2022-10-10 DOI: 10.48550/arXiv.2210.04513

Chengwei Hu, Deqing Yang, Hao Jin, Zhen Chen, Yanghua Xiao

Continual relation extraction (CRE) aims to extract relations as new data arrives continuously and iteratively, where the major challenge is the catastrophic forgetting of old tasks. To alleviate this critical problem and improve CRE performance, we propose a novel Continual Relation Extraction framework with Contrastive Learning, namely CRECL, which is built with a classification network and a prototypical contrastive network to achieve class-incremental learning for CRE. Specifically, in the contrastive network a given instance is contrasted with the prototype of each candidate relation stored in the memory module. Such a contrastive learning scheme makes the data distributions of all tasks more distinguishable, further alleviating catastrophic forgetting. Our experimental results not only demonstrate CRECL’s advantage over state-of-the-art baselines on two public datasets, but also verify the effectiveness of CRECL’s contrastive learning in improving performance.
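
The idea of contrasting an instance with stored relation prototypes can be sketched as below. The sketch assumes that a prototype is the mean of the instance embeddings kept in memory for a relation, and that the contrast is a temperature-scaled softmax over cosine similarities; both choices, the function names, and the toy vectors are illustrative assumptions, not details confirmed by the abstract.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_prototypes(memory):
    """One prototype per relation: the mean of its memorised embeddings."""
    return {relation: mean_vector(vecs) for relation, vecs in memory.items()}


def contrast_with_prototypes(instance, prototypes, temperature=0.1):
    """Softmax over temperature-scaled cosine similarities to each prototype."""
    sims = {rel: cosine(instance, proto) / temperature
            for rel, proto in prototypes.items()}
    top = max(sims.values())                       # subtract max for stability
    exps = {rel: math.exp(s - top) for rel, s in sims.items()}
    total = sum(exps.values())
    return {rel: e / total for rel, e in exps.items()}


# Toy memory with two relations and 2-D embeddings (hypothetical values).
memory = {
    "founded_by": [[1.0, 0.1], [0.9, 0.0]],
    "born_in": [[0.0, 1.0], [0.1, 0.9]],
}
scores = contrast_with_prototypes([0.8, 0.2], build_prototypes(memory))
predicted = max(scores, key=scores.get)
```

A training loss would take the negative log of the score assigned to the gold relation, pulling instances toward their own prototype and away from the others.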

Citations: 9

Understanding and Improving Zero-shot Multi-hop Reasoning in Generative Question Answering

Proceedings of COLING. International Conference on Computational Linguistics

Pub Date : 2022-10-09 DOI: 10.48550/arXiv.2210.04234

Zhengbao Jiang, J. Araki, Haibo Ding, Graham Neubig

Generative question answering (QA) models generate answers to questions either solely based on the parameters of the model (the closed-book setting) or additionally retrieving relevant evidence (the open-book setting). Generative QA models can answer some relatively complex questions, but the mechanism through which they do so is still poorly understood. We perform several studies aimed at better understanding the multi-hop reasoning capabilities of generative QA models. First, we decompose multi-hop questions into multiple corresponding single-hop questions, and find marked inconsistency in QA models’ answers on these pairs of ostensibly identical question chains. Second, we find that models lack zero-shot multi-hop reasoning ability: when trained only on single-hop questions, models generalize poorly to multi-hop questions. Finally, we demonstrate that it is possible to improve models’ zero-shot multi-hop reasoning capacity through two methods that approximate real multi-hop natural language (NL) questions by training on either concatenation of single-hop questions or logical forms (SPARQL). In sum, these results demonstrate that multi-hop reasoning does not emerge naturally in generative QA models, but can be encouraged by advances in training or modeling techniques. Code is available at https://github.com/jzbjyb/multihop.
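
The decomposition and concatenation ideas can be made concrete with a small sketch. It assumes a QA model exposed as a plain function from question string to answer string, and a hypothetical `{prev}` placeholder that marks where the previous hop's answer is substituted; neither convention comes from the paper or its repository.

```python
def concatenate_hops(single_hop_questions):
    """Join single-hop questions into one pseudo multi-hop training input."""
    return " ".join(q.strip() for q in single_hop_questions)


def answer_chain(answer_fn, hop_templates):
    """Answer hops in order, feeding each answer into the next template."""
    previous = ""
    for template in hop_templates:
        question = template.replace("{prev}", previous)
        previous = answer_fn(question)
    return previous


def is_consistent(answer_fn, multi_hop_question, hop_templates):
    """True if the direct multi-hop answer matches the chained single-hop answer."""
    return answer_fn(multi_hop_question) == answer_chain(answer_fn, hop_templates)


# Toy stand-in for a QA model: a lookup table whose multi-hop answer
# deliberately disagrees with its own single-hop answers.
toy_answers = {
    "Who directed Inception?": "Christopher Nolan",
    "Where was Christopher Nolan born?": "London",
    "Where was the director of Inception born?": "Los Angeles",
}


def toy_model(question):
    return toy_answers.get(question, "")


hops = ["Who directed Inception?", "Where was {prev} born?"]
consistent = is_consistent(
    toy_model, "Where was the director of Inception born?", hops
)
```

Here `consistent` is false because the toy model contradicts itself across the two formulations, which is the kind of inconsistency the first study measures.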

Citations: 4
