Enhancing adversarial robustness in Natural Language Inference using explanations
Alexandros Koulakos, Maria Lymperaiou, Giorgos Filandrianos, Giorgos Stamou
arXiv:2409.07423 (arXiv - CS - Computation and Language), published 2024-09-11
Abstract
The surge of state-of-the-art Transformer-based models has undoubtedly pushed the limits of NLP model performance, excelling in a variety of tasks. We cast the spotlight on the underexplored task of Natural Language Inference (NLI), since models trained on popular, well-suited datasets are susceptible to adversarial attacks, allowing subtle input interventions to mislead the model. In this work, we validate the use of natural language explanations as a model-agnostic defence strategy through extensive experimentation: by fine-tuning a classifier on the explanation rather than on the premise-hypothesis inputs, we achieve robustness under various adversarial attacks compared to explanation-free baselines. Moreover, since there is no standard strategy for testing the semantic validity of the generated explanations, we investigate the correlation of widely used language generation metrics with human perception, so that these metrics can serve as a proxy for the robustness of NLI models. Our approach is resource-efficient and reproducible without significant computational limitations.
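
The core idea of classifying from the explanation rather than from the premise-hypothesis pair can be sketched as follows. This is a minimal, illustrative sketch, not the paper's exact setup: it assumes an e-SNLI-style dataset with gold explanations (the "esnli" dataset on the HuggingFace Hub) and "bert-base-uncased" as the backbone, whereas in the paper's pipeline the explanation fed to the classifier would typically come from a generation model conditioned on the (possibly attacked) premise-hypothesis input.

```python
# Minimal sketch (assumed setup, not the paper's exact configuration):
# fine-tune a sequence classifier on explanations only, instead of the usual
# premise-hypothesis pair. Gold e-SNLI explanations stand in for generated ones.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

dataset = load_dataset("esnli")  # premise, hypothesis, label, explanation_1, ...

def encode(batch):
    # Explanation-only input: premise and hypothesis are deliberately dropped.
    return tokenizer(
        batch["explanation_1"], truncation=True, padding="max_length", max_length=128
    )

encoded = dataset.map(encode, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="explanation-only-nli",
        per_device_train_batch_size=32,
        num_train_epochs=3,
    ),
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
)
trainer.train()
```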
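For the second part of the abstract, measuring how well an automatic generation metric tracks human perception of explanation validity amounts to a rank correlation between metric scores and human ratings. The sketch below uses sentence-level BLEU and a handful of made-up explanation/rating pairs purely for illustration; the actual metrics, data, and annotation protocol studied in the paper may differ.

```python
# Illustrative sketch: correlate an automatic generation metric with human
# judgments of explanation validity. The toy explanations and ratings below
# are hypothetical placeholders, not the paper's data.
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu
from scipy.stats import spearmanr

generated = [
    "the man is asleep and therefore not playing guitar",
    "a dog running is not the same as a cat sleeping",
    "people outside does not imply they are at a concert",
    "the woman is cooking so she is in the kitchen",
]
references = [
    "a man who is sleeping cannot be playing guitar",
    "a dog is not a cat",
    "being outside does not mean being at a concert",
    "cooking usually happens in a kitchen",
]
human_scores = [5, 3, 4, 2]  # e.g., 1-5 validity ratings from annotators

smooth = SmoothingFunction().method1
metric_scores = [
    sentence_bleu([ref.split()], hyp.split(), smoothing_function=smooth)
    for ref, hyp in zip(references, generated)
]

rho, p_value = spearmanr(metric_scores, human_scores)
print(f"Spearman correlation with human ratings: rho={rho:.3f}, p={p_value:.3f}")
```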