Enhancing adversarial robustness in Natural Language Inference using explanations

Alexandros Koulakos, Maria Lymperaiou, Giorgos Filandrianos, Giorgos Stamou
Published in: arXiv - CS - Computation and Language, 2024-09-11. DOI: arxiv-2409.07423 (https://doi.org/arxiv-2409.07423)
Citations: 0

Abstract

The surge of state-of-the-art Transformer-based models has undoubtedly pushed the limits of NLP model performance, with such models excelling in a variety of tasks. We cast the spotlight on the underexplored task of Natural Language Inference (NLI), since models trained on popular, well-suited datasets are susceptible to adversarial attacks, in which subtle input interventions mislead the model. In this work, we validate the use of natural language explanations as a model-agnostic defence strategy through extensive experimentation: only by fine-tuning a classifier on the explanation rather than on the premise-hypothesis inputs, we achieve greater robustness under various adversarial attacks than explanation-free baselines. Moreover, since there is no standard strategy for testing the semantic validity of the generated explanations, we investigate the correlation of widely used language generation metrics with human perception, so that these metrics can serve as a proxy towards robust NLI models. Our approach is resource-efficient and reproducible without significant computational requirements.
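The abstract does not say which correlation measure is used to compare language generation metrics with human perception. As an illustration only, the sketch below assumes Spearman's rank correlation: each generated explanation gets an automatic metric score and a human rating, and the two score lists are compared by rank. The helpers `average_ranks`, `pearson`, and `spearman` are hypothetical names, not code from the paper, and any scores passed to them here are placeholders rather than reported results.

```python
from __future__ import annotations

from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j map to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(metric_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on tie-averaged ranks.

    metric_scores: hypothetical automatic scores, one per generated explanation.
    human_scores: hypothetical human ratings for the same explanations.
    """
    if len(metric_scores) != len(human_scores) or len(metric_scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))


if __name__ == "__main__":
    # Placeholder scores for four explanations, not data from the paper.
    print(spearman([0.9, 0.4, 0.4, 0.1], [5, 3, 2, 1]))
```

Averaging the ranks of tied values matters here because human ratings are often collected on a small ordinal scale, so ties are common. In practice, `scipy.stats.spearmanr` computes the same tie-corrected statistic.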