Scoring Multi-hop Question Decomposition Using Masked Language Models

IF 1.8 · CAS Quartile 4 (Computer Science) · JCR Q3 (Computer Science, Artificial Intelligence) · ACM Transactions on Asian and Low-Resource Language Information Processing · Pub Date: 2024-05-15 · DOI: 10.1145/3665140
Abdellah Hamouda Sidhoum, M'hamed Mataoui, Faouzi Sebbak, Adil Imad Eddine Hosni, Kamel Smaili
{"title":"使用屏蔽语言模型对多跳问题分解进行评分","authors":"Abdellah Hamouda Sidhoum, M'hamed Mataoui, Faouzi Sebbak, Adil Imad Eddine Hosni, Kamel Smaili","doi":"10.1145/3665140","DOIUrl":null,"url":null,"abstract":"Question answering (QA) is a sub-field of Natural Language Processing (NLP) that focuses on developing systems capable of answering natural language queries. Within this domain, multi-hop question answering represents an advanced QA task that requires gathering and reasoning over multiple pieces of information from diverse sources or passages. To handle the complexity of multi-hop questions, question decomposition has been proven to be a valuable approach. This technique involves breaking down complex questions into simpler sub-questions, reducing the complexity of the problem. However, it’s worth noting that existing question decomposition methods often rely on training data, which may not always be readily available for low-resource languages or specialized domains. To address this issue, we propose a novel approach that utilizes pre-trained masked language models to score decomposition candidates in a zero-shot manner. The method involves generating decomposition candidates, scoring them using a pseudo-log likelihood estimation, and ranking them based on their scores. To evaluate the efficacy of the decomposition process, we conducted experiments on two datasets annotated on decomposition in two different languages, Arabic and English. Subsequently, we integrated our approach into a complete QA system and conducted a reading comprehension performance evaluation on the HotpotQA dataset. The obtained results emphasize that while the system exhibited a small drop in performance, it still maintained a significant advance compared to the baseline model. The proposed approach highlights the efficiency of the language model scoring technique in complex reasoning tasks such as multi-hop question decomposition.","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":1.8000,"publicationDate":"2024-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Scoring Multi-hop Question Decomposition Using Masked Language Models\",\"authors\":\"Abdellah Hamouda Sidhoum, M'hamed Mataoui, Faouzi Sebbak, Adil Imad Eddine Hosni, Kamel Smaili\",\"doi\":\"10.1145/3665140\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Question answering (QA) is a sub-field of Natural Language Processing (NLP) that focuses on developing systems capable of answering natural language queries. Within this domain, multi-hop question answering represents an advanced QA task that requires gathering and reasoning over multiple pieces of information from diverse sources or passages. To handle the complexity of multi-hop questions, question decomposition has been proven to be a valuable approach. This technique involves breaking down complex questions into simpler sub-questions, reducing the complexity of the problem. However, it’s worth noting that existing question decomposition methods often rely on training data, which may not always be readily available for low-resource languages or specialized domains. To address this issue, we propose a novel approach that utilizes pre-trained masked language models to score decomposition candidates in a zero-shot manner. 
The method involves generating decomposition candidates, scoring them using a pseudo-log likelihood estimation, and ranking them based on their scores. To evaluate the efficacy of the decomposition process, we conducted experiments on two datasets annotated on decomposition in two different languages, Arabic and English. Subsequently, we integrated our approach into a complete QA system and conducted a reading comprehension performance evaluation on the HotpotQA dataset. The obtained results emphasize that while the system exhibited a small drop in performance, it still maintained a significant advance compared to the baseline model. The proposed approach highlights the efficiency of the language model scoring technique in complex reasoning tasks such as multi-hop question decomposition.\",\"PeriodicalId\":54312,\"journal\":{\"name\":\"ACM Transactions on Asian and Low-Resource Language Information Processing\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2024-05-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Asian and Low-Resource Language Information Processing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3665140\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Asian and Low-Resource Language Information Processing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3665140","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Question answering (QA) is a sub-field of Natural Language Processing (NLP) that focuses on developing systems capable of answering natural language queries. Within this domain, multi-hop question answering represents an advanced QA task that requires gathering and reasoning over multiple pieces of information from diverse sources or passages. To handle the complexity of multi-hop questions, question decomposition has proven to be a valuable approach: it breaks a complex question down into simpler sub-questions, reducing the complexity of the problem. However, existing question decomposition methods often rely on training data, which may not be readily available for low-resource languages or specialized domains. To address this issue, we propose a novel approach that uses pre-trained masked language models to score decomposition candidates in a zero-shot manner. The method involves generating decomposition candidates, scoring them with a pseudo-log-likelihood estimate, and ranking them by score. To evaluate the efficacy of the decomposition process, we conducted experiments on two datasets annotated for decomposition in two different languages, Arabic and English. We then integrated our approach into a complete QA system and evaluated reading-comprehension performance on the HotpotQA dataset. The results show that, while the system exhibited a small drop in performance, it still maintained a significant advantage over the baseline model. The proposed approach highlights the effectiveness of the language-model scoring technique in complex reasoning tasks such as multi-hop question decomposition.
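To illustrate the scoring step described in the abstract, the sketch below ranks candidate decompositions by pseudo-log-likelihood (PLL): each token is masked in turn and the masked language model's log-probability of the original token is accumulated. This is a minimal sketch, not the authors' implementation; the model checkpoint, the example candidates, and the simple one-token-at-a-time loop with optional length normalization are assumptions made for illustration.

```python
# Minimal PLL scoring sketch. Assumptions (not from the paper): the checkpoint
# name, the example candidates, and the token-by-token masking loop.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL_NAME = "bert-base-multilingual-cased"  # any masked LM checkpoint could be used
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)
model.eval()

def pseudo_log_likelihood(text: str, length_normalize: bool = True) -> float:
    """Mask each token in turn and sum the log-probability of the original token."""
    input_ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    total, n_scored = 0.0, 0
    with torch.no_grad():
        for i in range(1, input_ids.size(0) - 1):  # skip [CLS] and [SEP]
            masked = input_ids.clone()
            masked[i] = tokenizer.mask_token_id
            logits = model(masked.unsqueeze(0)).logits[0, i]
            total += torch.log_softmax(logits, dim=-1)[input_ids[i]].item()
            n_scored += 1
    # Length normalization keeps short and long candidates comparable.
    return total / n_scored if length_normalize and n_scored else total

# Hypothetical decomposition candidates for a multi-hop question.
candidates = [
    "Who directed the film Inception? When was that director born?",
    "Who directed the film Inception? When was born that director?",
]
best = max(candidates, key=pseudo_log_likelihood)
print(best)  # the more fluent decomposition should receive the higher PLL
```

Normalizing the PLL by the number of scored tokens, as in the sketch, keeps candidates of different lengths comparable; otherwise longer decompositions systematically accumulate lower scores.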
Source journal metrics: CiteScore 3.60 · Self-citation rate 15.00% · Articles published 241
Journal description: The ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) publishes high quality original archival papers and technical notes in the areas of computation and processing of information in Asian languages, low-resource languages of Africa, Australasia, Oceania and the Americas, as well as related disciplines. The subject areas covered by TALLIP include, but are not limited to:
- Computational Linguistics: including computational phonology, computational morphology, computational syntax (e.g. parsing), computational semantics, computational pragmatics, etc.
- Linguistic Resources: including computational lexicography, terminology, electronic dictionaries, cross-lingual dictionaries, electronic thesauri, etc.
- Hardware and software algorithms and tools for Asian or low-resource language processing, e.g., handwritten character recognition.
- Information Understanding: including text understanding, speech understanding, character recognition, discourse processing, dialogue systems, etc.
- Machine Translation involving Asian or low-resource languages.
- Information Retrieval: including natural language processing (NLP) for concept-based indexing, natural language query interfaces, semantic relevance judgments, etc.
- Information Extraction and Filtering: including automatic abstraction, user profiling, etc.
- Speech processing: including text-to-speech synthesis and automatic speech recognition.
- Multimedia Asian Information Processing: including speech, image, video, image/text translation, etc.
- Cross-lingual information processing involving Asian or low-resource languages.
Papers that deal in theory, systems design, evaluation and applications in the aforesaid subjects are appropriate for TALLIP. Emphasis will be placed on the originality and the practical significance of the reported research.