A comparative evaluation for question answering over Greek texts by using machine translation and BERT

IF 1.7 | CAS Tier 3 (Computer Science) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Language Resources and Evaluation | Pub Date: 2024-06-19 | DOI: 10.1007/s10579-024-09745-9
Michalis Mountantonakis, Loukas Mertzanis, Michalis Bastakis, Yannis Tzitzikas
{"title":"A comparative evaluation for question answering over Greek texts by using machine translation and BERT","authors":"Michalis Mountantonakis, Loukas Mertzanis, Michalis Bastakis, Yannis Tzitzikas","doi":"10.1007/s10579-024-09745-9","DOIUrl":null,"url":null,"abstract":"<p>Although there are numerous and effective BERT models for question answering (QA) over plain texts in English, it is not the same for other languages, such as Greek. Since it can be time-consuming to train a new BERT model for a given language, we present a generic methodology for multilingual QA by combining at runtime existing machine translation (MT) models and BERT QA models pretrained in English, and we perform a comparative evaluation for Greek language. Particularly, we propose a pipeline that (a) exploits widely used MT libraries for translating a question and a context from a source language to the English language, (b) extracts the answer from the translated English context through popular BERT models (pretrained in English corpus), (c) translates the answer back to the source language, and (d) evaluates the answer through semantic similarity metrics based on sentence embeddings, such as Bi-Encoder and BERTScore. For evaluating our system, we use 21 models, whereas we have created a test set with 20 texts and 200 questions and we have manually labelled 4200 answers. These resources can be reused for several tasks including QA and sentence similarity. Moreover, we use the existing multilingual test set XQuAD, with 240 texts and 1190 questions in Greek language. We focus on both the effectiveness and efficiency, through manually and machine labelled results. The results of the evaluation show that the proposed approach can be an efficient and effective alternative option to multilingual BERT. In particular, although the multilingual BERT QA model provides the highest scores for both human and automatic evaluation, all the models combining MT and BERT QA models are faster and some of them achieve quite similar scores.</p>","PeriodicalId":49927,"journal":{"name":"Language Resources and Evaluation","volume":"1782 1","pages":""},"PeriodicalIF":1.7000,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Language Resources and Evaluation","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10579-024-09745-9","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0

Abstract

Although there are numerous and effective BERT models for question answering (QA) over plain texts in English, the same does not hold for other languages, such as Greek. Since it can be time-consuming to train a new BERT model for a given language, we present a generic methodology for multilingual QA that combines, at runtime, existing machine translation (MT) models with BERT QA models pretrained in English, and we perform a comparative evaluation for the Greek language. In particular, we propose a pipeline that (a) exploits widely used MT libraries for translating a question and a context from a source language into English, (b) extracts the answer from the translated English context through popular BERT models (pretrained on English corpora), (c) translates the answer back to the source language, and (d) evaluates the answer through semantic similarity metrics based on sentence embeddings, such as Bi-Encoder and BERTScore. To evaluate our system, we use 21 models; we have created a test set with 20 texts and 200 questions, and we have manually labelled 4200 answers. These resources can be reused for several tasks, including QA and sentence similarity. Moreover, we use the existing multilingual test set XQuAD, with 240 texts and 1190 questions in Greek. We focus on both effectiveness and efficiency, through manually and machine-labelled results. The results of the evaluation show that the proposed approach can be an efficient and effective alternative to multilingual BERT. In particular, although the multilingual BERT QA model provides the highest scores in both human and automatic evaluation, all the models combining MT and BERT QA are faster, and some of them achieve quite similar scores.
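The four pipeline steps map naturally onto off-the-shelf components. Below is a minimal sketch of steps (a)-(c), assuming the Hugging Face transformers library; the model identifiers are illustrative stand-ins (the paper evaluates 21 combinations of MT and QA models, not these specific ones).

```python
# Minimal sketch of pipeline steps (a)-(c), assuming Hugging Face transformers.
# The model names are illustrative assumptions, not the paper's exact choices.
from transformers import pipeline

el_to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-el-en")      # (a) source -> English
en_to_el = pipeline("translation", model="Helsinki-NLP/opus-mt-en-el")      # (c) English -> source
qa = pipeline("question-answering", model="deepset/bert-base-cased-squad2") # (b) English BERT QA

def answer_greek_question(question_el: str, context_el: str) -> str:
    # (a) Translate the Greek question and context into English.
    question_en = el_to_en(question_el)[0]["translation_text"]
    context_en = el_to_en(context_el)[0]["translation_text"]
    # (b) Extract the answer span from the translated English context.
    answer_en = qa(question=question_en, context=context_en)["answer"]
    # (c) Translate the extracted answer back to Greek.
    return en_to_el(answer_en)[0]["translation_text"]
```

For step (d), a Bi-Encoder score can be computed by embedding the returned answer and the gold answer and taking their cosine similarity; a sketch assuming the sentence-transformers library and a multilingual embedding model:

```python
# (d) Sketch of a Bi-Encoder similarity metric, assuming sentence-transformers.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def bi_encoder_score(predicted: str, gold: str) -> float:
    # Cosine similarity between the sentence embeddings of the two answers.
    return util.cos_sim(embedder.encode(predicted), embedder.encode(gold)).item()
```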


Source journal
Language Resources and Evaluation (Engineering & Technology - Computer Science: Interdisciplinary Applications)
CiteScore: 6.50
Self-citation rate: 3.70%
Articles published per year: 55
Review time: >12 weeks
About the journal: Language Resources and Evaluation is the first publication devoted to the acquisition, creation, annotation, and use of language resources, together with methods for evaluation of resources, technologies, and applications. Language resources include language data and descriptions in machine readable form used to assist and augment language processing applications, such as written or spoken corpora and lexica, multimodal resources, grammars, terminology or domain specific databases and dictionaries, ontologies, multimedia databases, etc., as well as basic software tools for their acquisition, preparation, annotation, management, customization, and use. Evaluation of language resources concerns assessing the state-of-the-art for a given technology, comparing different approaches to a given problem, assessing the availability of resources and technologies for a given application, benchmarking, and assessing system usability and user satisfaction.
Latest articles in this journal
Sentiment analysis dataset in Moroccan dialect: bridging the gap between Arabic and Latin scripted dialect
Studying word meaning evolution through incremental semantic shift detection
PARSEME-AR: Arabic reference corpus for multiword expressions using PARSEME annotation guidelines
Normalized dataset for Sanskrit word segmentation and morphological parsing
Conversion of the Spanish WordNet databases into a Prolog-readable format