ViMRC - VLSP 2021: Using XLM-RoBERTa and Filter Output for Vietnamese Machine Reading Comprehension

Văn Nhân Đặng, Minh Le Nguyen
Journal: VNU Journal of Science: Computer Science and Communication Engineering, volume 18 6
Published: 2022-12-16 (Journal Article)
DOI: 10.25073/2588-1086/vnucsce.336 (https://doi.org/10.25073/2588-1086/vnucsce.336)

Abstract

Machine Reading Comprehension (MRC) has recently made significant progress. This paper reports on our participation in the Vietnamese Machine Reading Comprehension task at the 8th International Workshop on Vietnamese Language and Speech Processing (VLSP 2021), for which we built an MRC system specifically for Vietnamese. Based on SQuAD2.0, the organizing committee developed the Vietnamese Question Answering Dataset UIT-ViQuAD2.0, a reading comprehension dataset consisting of questions posed by crowd-workers on a set of Vietnamese Wikipedia articles. UIT-ViQuAD2.0 evolved from version 1.0; the difference is that version 2.0 contains both answerable and unanswerable questions. The challenge of this problem is to distinguish between answerable and unanswerable questions: the answer to each question is either a span of text from the corresponding reading passage, or the question is unanswerable. Our system employs simple yet effective methods. It uses a pre-trained language model, XLM-RoBERTa (XLM-R), and filters the results of multiple output files to produce the final result. We created about 5-7 output files and selected the answer that is repeated most often across them as the final prediction. After filtering, our system's F1 score increased from 75.172% to 76.386%, and it achieved 65.329% on the EM measure on the Private Test set.
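
The output-filtering step described in the abstract amounts to a majority vote over several prediction files. The following is only a minimal sketch of that idea, not the authors' implementation. It assumes each output file has already been loaded into a dict mapping a question id to an answer string, with an empty string marking an unanswerable question (the SQuAD2.0 prediction convention). The function name `vote_answers`, the sample data, and the tie-breaking rule (prefer the earliest run among equally frequent answers) are all hypothetical choices made for illustration.

```python
from collections import Counter


def vote_answers(runs):
    """Pick, for each question id, the answer predicted by the most runs.

    runs: list of dicts mapping question id -> answer string
          ("" stands for "unanswerable"; this format is an assumption).
    """
    final = {}
    # Assumption: every run covers the same ids, so the first run defines them.
    for qid in runs[0]:
        answers = [run[qid] for run in runs if qid in run]
        counts = Counter(answers)
        best_count = max(counts.values())
        # Hypothetical tie-break: among the most frequent answers,
        # keep the one that appears in the earliest run.
        final[qid] = next(a for a in answers if counts[a] == best_count)
    return final


# Three illustrative runs (made-up data, not from the paper).
runs = [
    {"q1": "Hà Nội", "q2": ""},
    {"q1": "Hà Nội", "q2": "năm 1945"},
    {"q1": "thủ đô Hà Nội", "q2": ""},
]
print(vote_answers(runs))
```

In this sketch, exact string matching decides whether two answers count as the same; whether the original system normalized answers before counting is not stated in the abstract.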