Rubem G. Nanclarez, N. T. Roman, F. J. V. D. Silva
Anais do XIX Encontro Nacional de Inteligência Artificial e Computacional (ENIAC 2022). Published 2022-11-28. DOI: 10.5753/eniac.2022.227593 (https://doi.org/10.5753/eniac.2022.227593)
Generalizing over data sets: a preliminary study with BERT for Natural Language Inference
Natural language inference is the task of automatically identifying whether a given text (the premise) implies another (the hypothesis). Among its many possible applications, it is especially relevant in the legal field, where understanding textual entailment between legal sentences has been the focus of recent research efforts. In this work, we evaluated the use of BERT for natural language inference by running experiments on two corpora and comparing the results: a larger corpus with texts from multiple domains and a smaller corpus of legal sentences. We also conducted a cross-experiment, training on the larger corpus and testing on the legal corpus. We obtained a mean accuracy of 88.91% on the multi-domain corpus, a value comparable to related work. However, the same technique achieved lower scores on the legal corpus and in the cross-experiment.
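The evaluation protocol described above (in-domain testing versus a cross-experiment that trains on one corpus and tests on another) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word-overlap classifier is a hypothetical stand-in for a fine-tuned BERT model, the sentences are invented toy data rather than samples from either corpus, and the helper names (`train_overlap_baseline`, `evaluate`, `mean_accuracy`) are not from the paper. The abstract does not say whether the reported mean is taken over runs or folds, so `mean_accuracy` simply averages over whatever list of (train, test) splits it is given.

```python
from statistics import mean
from typing import Callable, List, Tuple

# One NLI example: (premise, hypothesis, label); label is True for entailment.
Example = Tuple[str, str, bool]
Classifier = Callable[[str, str], bool]


def overlap(premise: str, hypothesis: str) -> float:
    """Fraction of hypothesis words that also occur in the premise."""
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return len(p & h) / len(h) if h else 0.0


def train_overlap_baseline(train: List[Example]) -> Classifier:
    """Hypothetical stand-in for fine-tuning BERT (an assumption, not the
    paper's method): choose the overlap threshold that gives the highest
    accuracy on the training examples."""
    candidates = [i / 10 for i in range(11)]
    best = max(
        candidates,
        key=lambda t: sum((overlap(p, h) >= t) == y for p, h, y in train),
    )
    return lambda premise, hypothesis: overlap(premise, hypothesis) >= best


def accuracy(clf: Classifier, test: List[Example]) -> float:
    """Share of test examples whose predicted label matches the gold label."""
    return sum(clf(p, h) == y for p, h, y in test) / len(test)


def evaluate(train: List[Example], test: List[Example]) -> float:
    """Train on one set and report accuracy on another (possibly other-domain) set."""
    return accuracy(train_overlap_baseline(train), test)


def mean_accuracy(splits: List[Tuple[List[Example], List[Example]]]) -> float:
    """Average accuracy over several (train, test) splits."""
    return mean(evaluate(train, test) for train, test in splits)


# Invented toy data, for illustration only.
general_train: List[Example] = [
    ("a dog runs in the park", "a dog runs", True),
    ("a dog runs in the park", "a cat sleeps", False),
    ("two kids play football outside", "kids play outside", True),
    ("two kids play football outside", "adults read indoors", False),
]
general_test: List[Example] = [
    ("a woman reads a book", "a woman reads", True),
    ("a woman reads a book", "a man swims", False),
]
legal_test: List[Example] = [
    ("the tenant shall pay rent monthly", "the tenant must pay every month", True),
    ("the tenant shall pay rent monthly", "the tenant shall not pay rent", False),
]

in_domain = evaluate(general_train, general_test)  # same domain for train and test
cross = evaluate(general_train, legal_test)        # cross-experiment: other-domain test
print(f"in-domain accuracy: {in_domain:.2f}")
print(f"cross-domain accuracy: {cross:.2f}")
```

Keeping the training routine behind a single function makes it possible to swap the toy baseline for an actual fine-tuned model without touching the evaluation code. The numbers printed by this toy example say nothing about the paper's results.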