Zero-shot cross-lingual open domain question answering

Proceedings of the Workshop on Multilingual Information Access (MIA). Pub Date: 1900-01-01. DOI: 10.18653/v1/2022.mia-1.9

Sumit Agarwal, Suraj Tripathi, T. Mitamura, C. Rosé


Citations: 1

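Abstract

People who speak different languages search for information cross-lingually: they tend to ask questions in their own language and expect the answer in that same language, even when the supporting evidence is written in another language. In this paper, we present our approach to this task of cross-lingual open-domain question answering. Our proposed method employs a passage reranker, the fusion-in-decoder technique for generation, and a Wikidata entity-based post-processing system to address the model's inability to generate entities in all languages. Our end-to-end pipeline improves on the baseline CORA model by 3 points in F1 and 4.6 points in EM on the XOR-TyDi dataset. We also evaluate the effectiveness of our proposed techniques in the zero-shot setting using the MKQA dataset and show an improvement of 5 points in F1 for high-resource zero-shot languages and 3 points for low-resource zero-shot languages. Our team's submission (CMUmQA) to the MIA shared task ranked 1st in the constrained setup on the dev set and 2nd on the test set.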
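The abstract reports its gains in F1 and EM. As a rough illustration of what these two answer-level metrics measure, here is a minimal sketch, assuming SQuAD-style token-overlap F1 and exact match with simple lowercasing and whitespace tokenization. The function names `normalize`, `exact_match`, and `token_f1` are hypothetical, and the official XOR-TyDi and MKQA evaluation scripts may normalize and tokenize differently (for example, with language-specific tokenizers).

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase and split on whitespace (an assumed, simplified normalization)."""
    return text.lower().split()


def exact_match(prediction: str, gold: str) -> float:
    """Return 1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall against one gold answer."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both the prediction and the gold answer.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a prediction that contains the gold answer plus extra words receives partial F1 credit but no EM credit, which is one reason the two metrics can move by different amounts.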


