{"title":"A comparative evaluation for question answering over Greek texts by using machine translation and BERT","authors":"Michalis Mountantonakis, Loukas Mertzanis, Michalis Bastakis, Yannis Tzitzikas","doi":"10.1007/s10579-024-09745-9","DOIUrl":null,"url":null,"abstract":"<p>Although there are numerous and effective BERT models for question answering (QA) over plain texts in English, it is not the same for other languages, such as Greek. Since it can be time-consuming to train a new BERT model for a given language, we present a generic methodology for multilingual QA by combining at runtime existing machine translation (MT) models and BERT QA models pretrained in English, and we perform a comparative evaluation for Greek language. Particularly, we propose a pipeline that (a) exploits widely used MT libraries for translating a question and a context from a source language to the English language, (b) extracts the answer from the translated English context through popular BERT models (pretrained in English corpus), (c) translates the answer back to the source language, and (d) evaluates the answer through semantic similarity metrics based on sentence embeddings, such as Bi-Encoder and BERTScore. For evaluating our system, we use 21 models, whereas we have created a test set with 20 texts and 200 questions and we have manually labelled 4200 answers. These resources can be reused for several tasks including QA and sentence similarity. Moreover, we use the existing multilingual test set XQuAD, with 240 texts and 1190 questions in Greek language. We focus on both the effectiveness and efficiency, through manually and machine labelled results. The results of the evaluation show that the proposed approach can be an efficient and effective alternative option to multilingual BERT. In particular, although the multilingual BERT QA model provides the highest scores for both human and automatic evaluation, all the models combining MT and BERT QA models are faster and some of them achieve quite similar scores.</p>","PeriodicalId":49927,"journal":{"name":"Language Resources and Evaluation","volume":"1782 1","pages":""},"PeriodicalIF":1.7000,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Language Resources and Evaluation","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10579-024-09745-9","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Abstract
Although there are numerous effective BERT models for question answering (QA) over plain English texts, the same does not hold for other languages, such as Greek. Since training a new BERT model for a given language can be time-consuming, we present a generic methodology for multilingual QA that combines, at runtime, existing machine translation (MT) models with BERT QA models pretrained in English, and we perform a comparative evaluation for the Greek language. In particular, we propose a pipeline that (a) exploits widely used MT libraries to translate a question and a context from a source language into English, (b) extracts the answer from the translated English context through popular BERT models (pretrained on English corpora), (c) translates the answer back to the source language, and (d) evaluates the answer through semantic similarity metrics based on sentence embeddings, such as Bi-Encoder and BERTScore. For evaluating our system, we use 21 models; we have created a test set with 20 texts and 200 questions, and we have manually labelled 4200 answers. These resources can be reused for several tasks, including QA and sentence similarity. Moreover, we use the existing multilingual test set XQuAD, with 240 texts and 1190 questions in Greek. We focus on both effectiveness and efficiency, through manually and machine-labelled results. The evaluation shows that the proposed approach can be an efficient and effective alternative to multilingual BERT. In particular, although the multilingual BERT QA model achieves the highest scores in both human and automatic evaluation, all the models combining MT with BERT QA are faster, and some of them achieve quite similar scores.
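To make the pipeline concrete, below is a minimal Python sketch of steps (a) through (d) built on the Hugging Face transformers and sentence-transformers libraries. It is an illustration under assumptions, not the authors' implementation: the specific MT, QA, and sentence-embedding checkpoints named here are placeholders, and any Greek-to-English / English-to-Greek MT models and any English extractive QA model would fit the same pattern.

```python
# Minimal sketch of the MT + BERT QA pipeline (steps a-c) and the
# Bi-Encoder evaluation (step d). Model names are assumptions, not
# necessarily those used in the paper; swap in comparable checkpoints.
from transformers import pipeline
from sentence_transformers import SentenceTransformer, util

# (a, c) assumed Greek<->English MT checkpoints from the OPUS-MT family
el_to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-el-en")
en_to_el = pipeline("translation", model="Helsinki-NLP/opus-mt-en-el")

# (b) an English extractive QA model pretrained on SQuAD
qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

# (d) a multilingual sentence encoder for Bi-Encoder similarity
encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")


def answer_greek_question(question_el: str, context_el: str) -> str:
    """Translate to English, extract the answer, translate it back."""
    question_en = el_to_en(question_el)[0]["translation_text"]
    context_en = el_to_en(context_el)[0]["translation_text"]
    answer_en = qa(question=question_en, context=context_en)["answer"]
    return en_to_el(answer_en)[0]["translation_text"]


def bi_encoder_score(predicted_el: str, gold_el: str) -> float:
    """Cosine similarity between sentence embeddings of two answers."""
    emb = encoder.encode([predicted_el, gold_el], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()
```

Under this scheme, only the MT calls are language-specific; the English QA model itself never changes, which is what lets the approach sidestep training a new BERT model for each language.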
Journal description
Language Resources and Evaluation is the first publication devoted to the acquisition, creation, annotation, and use of language resources, together with methods for evaluation of resources, technologies, and applications.
Language resources include language data and descriptions in machine-readable form used to assist and augment language processing applications, such as written or spoken corpora and lexica, multimodal resources, grammars, terminology or domain-specific databases and dictionaries, ontologies, multimedia databases, etc., as well as basic software tools for their acquisition, preparation, annotation, management, customization, and use.
Evaluation of language resources concerns assessing the state-of-the-art for a given technology, comparing different approaches to a given problem, assessing the availability of resources and technologies for a given application, benchmarking, and assessing system usability and user satisfaction.