vnNLI - VLSP 2021: Vietnamese and English-Vietnamese Textual Entailment Based on Pre-trained Multilingual Language Models
Ngan Nguyen Luu Thuy, Đặng Văn Thìn, Hoàng Xuân Vũ, Nguyễn Văn Tài, Khoa Thi-Kim Phan
VNU Journal of Science: Computer Science and Communication Engineering, published 2022-12-16
DOI: 10.25073/2588-1086/vnucsce.329 (https://doi.org/10.25073/2588-1086/vnucsce.329)
Abstract
Natural Language Inference (NLI) is a high-level semantic task in Natural Language Processing (NLP), and it becomes more challenging in the cross-lingual setting. In recent years, pre-trained multilingual language models (e.g., mBERT, XLM-R, InfoXLM) have contributed greatly to addressing these challenges. Motivated by these achievements, this paper describes our approach, based on fine-tuning pre-trained multilingual language models (XLM-R, InfoXLM), to the shared task "Vietnamese and English-Vietnamese Textual Entailment" at the 8th International Workshop on Vietnamese Language and Speech Processing (VLSP 2021, https://vlsp.org.vn/vlsp2021). We also investigate additional techniques to improve performance: cross-validation, pseudo-labeling (PL), learning rate adjustment, and POS tagging. Our experimental results show that the InfoXLM-based approach achieved competitive results, ranking 2nd in the VLSP 2021 task evaluation with an F1-score of 0.89 on the private test set.
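
The abstract names pseudo-labeling without describing how it was applied. As a rough illustration only, the selection step of a common pseudo-labeling variant can be sketched as follows: a fine-tuned model scores unlabeled premise-hypothesis pairs, and only confident predictions are added to the training data. The label names, the 0.9 threshold, and the function name `select_pseudo_labels` are assumptions made for this sketch, not details taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical label set; the abstract does not list the task's classes.
LABELS = ("entailment", "contradiction", "neutral")


def select_pseudo_labels(
    pairs: Sequence[Tuple[str, str]],
    probs: Sequence[Sequence[float]],
    threshold: float = 0.9,  # assumed cutoff, not reported in the abstract
) -> List[Tuple[str, str, str]]:
    """Keep unlabeled (premise, hypothesis) pairs whose highest predicted
    class probability reaches the threshold, and attach that class label."""
    if len(pairs) != len(probs):
        raise ValueError("pairs and probs must have the same length")
    selected = []
    for (premise, hypothesis), dist in zip(pairs, probs):
        # Index of the most probable class for this pair.
        best = max(range(len(dist)), key=dist.__getitem__)
        if dist[best] >= threshold:
            selected.append((premise, hypothesis, LABELS[best]))
    return selected
```

In this sketch, `probs` would come from a model such as a fine-tuned XLM-R or InfoXLM classifier, and the returned triples would be appended to the labeled training set before another round of fine-tuning. Pairs below the threshold are simply left out.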