Analyzing Semantic Faithfulness of Language Models via Input Intervention on Question Answering

Computational Linguistics, published 2023-11-15. DOI: 10.1162/coli_a_00493. Journal impact factor 5.3 (CAS Zone 2, Computer Science).

Akshay Chaturvedi, Soumadeep Saha, Nicholas Asher, Swarnadeep Bhar, Utpal Garain


Citations: 0

Abstract

Transformer-based language models have been shown to be highly effective for several NLP tasks. In this paper, we consider three transformer models, BERT, RoBERTa, and XLNet, in both small and large versions, and investigate how faithful their representations are with respect to the semantic content of texts. We formalize a notion of semantic faithfulness, in which the semantic content of a text should causally figure in a model's inferences in question answering. We then test this notion by observing a model's behavior on answering questions about a story after performing two novel semantic interventions: deletion intervention and negation intervention. While transformer models achieve high performance on standard question answering tasks, we show that once we perform these interventions, they fail to be semantically faithful in a significant number of cases (∼50% of cases for deletion intervention, and a ∼20% drop in accuracy for negation intervention). We then propose an intervention-based training regime that can mitigate the undesirable effects of deletion intervention by a significant margin (from ∼50% to ∼6%). We analyze the inner workings of the models to better understand the effectiveness of intervention-based training for deletion intervention. However, we show that this training does not attenuate other aspects of semantic unfaithfulness, such as the models' inability to deal with negation intervention or to capture the predicate-argument structure of texts. We also test InstructGPT, via prompting, for its ability to handle the two interventions and to capture predicate-argument structure. While InstructGPT models do achieve very high performance on the predicate-argument structure task, they fail to respond adequately to our deletion and negation interventions.
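The abstract describes the deletion intervention only at a high level. The following is a minimal sketch under two assumptions: that the intervention removes the story text supporting the answer, and that a case counts as unfaithful when the model's answer stays the same afterwards. `delete_span`, `unfaithful_rate`, `prior_model`, and `lookup_model` are hypothetical illustrations, not the authors' code, and the two toy models merely stand in for a real QA model such as BERT.

```python
from typing import Callable, List, Tuple

# Hypothetical example format: (story, question, span that supports the answer).
Example = Tuple[str, str, str]
QAModel = Callable[[str, str], str]


def delete_span(story: str, span: str) -> str:
    """Assumed deletion intervention: drop the supporting span from the story."""
    return story.replace(span, "").strip()


def unfaithful_rate(model: QAModel, examples: List[Example]) -> float:
    """Fraction of examples whose answer is unchanged after the deletion.

    A semantically faithful model should not keep giving the same answer
    once the text that supports it has been removed.
    """
    if not examples:
        return 0.0
    unchanged = 0
    for story, question, span in examples:
        before = model(story, question)
        after = model(delete_span(story, span), question)
        if before == after:
            unchanged += 1
    return unchanged / len(examples)


# Two toy "models" (hypothetical stand-ins, not BERT/RoBERTa/XLNet).
def prior_model(story: str, question: str) -> str:
    return "blue"  # ignores the story entirely


def lookup_model(story: str, question: str) -> str:
    # Answers with the word after the first "is" in the story, else "unknown".
    words = story.replace(".", "").split()
    return words[words.index("is") + 1] if "is" in words[:-1] else "unknown"


examples = [("The car is blue. Tom drives it.", "What color is the car?", "The car is blue.")]
print(unfaithful_rate(prior_model, examples))   # 1.0
print(unfaithful_rate(lookup_model, examples))  # 0.0
```

The model that ignores the story keeps answering "blue" after the supporting sentence is deleted, so it scores as fully unfaithful; the model that reads the story changes its answer. Exact string comparison is a simplification here; a real evaluation would normalize answers before comparing them.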




Source journal: Computational Linguistics (Computer Science: Artificial Intelligence)

Self-citation rate: 0.00%

Journal description: Computational Linguistics is the longest-running publication devoted exclusively to the computational and mathematical properties of language and the design and analysis of natural language processing systems. This highly regarded quarterly offers university and industry linguists, computational linguists, artificial intelligence and machine learning investigators, cognitive scientists, speech specialists, and philosophers the latest information about the computational aspects of all facets of research on language.