BT-CKBQA: An efficient approach for Chinese knowledge base question answering

IF 2.7 | Zone 3 (Computer Science) | JCR Q3 (Computer Science, Artificial Intelligence) | Data & Knowledge Engineering | Pub Date: 2023-09-01 | DOI: 10.1016/j.datak.2023.102204

Erhe Yang, Fei Hao, Jiaxing Shang, Xiaoliang Chen, Doo-Soon Park

Data & Knowledge Engineering, vol. 147, Article 102204 (2023-09-01). Full text: https://www.sciencedirect.com/science/article/pii/S0169023X23000642

Citations: 0

Abstract




Knowledge Base Question Answering (KBQA), an increasingly essential application, provides accurate responses to user queries, ensuring that users obtain relevant information and make decisions promptly. Deep learning-based approaches have achieved satisfactory QA results by leveraging neural network models. However, these approaches require numerous parameters, which increases the workload of parameter tuning. To address this problem, we propose BT-CKBQA, a practical and highly efficient approach to Chinese KBQA (CKBQA) that incorporates BM25 and template-based predicate mapping. In addition, a concept-lattice-based approach is proposed for summarizing the knowledge base, which can largely improve the execution efficiency of QA with little loss of performance. Concretely, BT-CKBQA uses the BM25 algorithm and a custom dictionary to detect the subject of a question. A template-based approach then generates candidate predicates. Finally, a ranking approach that jointly considers character similarity and semantic similarity is used for predicate mapping. Extensive experiments on the NLPCC-ICCPOL 2016 and 2018 KBQA datasets demonstrate the superiority of the proposed approach over the compared baselines. In particular, the average F1-score of BT-CKBQA for mention detection is up to 98.25%, which outperforms the best method currently available in the literature. For question answering, the proposed approach outperforms most baselines, with an F1-score of 82.68%. Compared to state-of-the-art baselines, the execution efficiency and the QA performance per unit time improve by up to 56.39% and 44.06%, respectively. Experiments on question diversification indicate that the proposed approach performs better on diversified questions than on domain-specific questions. A case study on a constructed COVID-19 knowledge base illustrates the effectiveness and practicability of BT-CKBQA.
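To make the subject-detection step more concrete, here is a minimal sketch of Okapi BM25 scoring applied to candidate entity names. It is not the authors' implementation: the character-level tokenization, the parameter values `k1=1.5` and `b=0.75`, the function names `bm25_scores` and `best_subject`, and the example candidates are all assumptions made for illustration. The abstract only states that BM25 and a custom dictionary are used.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against a tokenized query with Okapi BM25.

    k1 and b are common default values, not values taken from the paper.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of docs that contain each token.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def best_subject(question, candidates):
    """Return the candidate entity name with the highest BM25 score.

    Characters are used as tokens (an assumption); a real system would
    likely segment words with the help of its custom dictionary.
    """
    scores = bm25_scores(list(question), [list(c) for c in candidates])
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index]


# Hypothetical question ("How tall is Yao Ming?") and candidate entities.
question = "姚明的身高是多少"
candidates = ["姚明", "李娜", "身高计"]
print(best_subject(question, candidates))
```

In this toy example the candidate "姚明" wins because both of its characters occur in the question and its short length is penalized less by the length normalization than the longer candidate "身高计".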


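Source journal

Data & Knowledge Engineering (Engineering Technology / Computer Science: Artificial Intelligence)

CiteScore: 5.00

Self-citation rate: 0.00%

Articles published:

Review time: 6 months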

About the journal: Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between these two related fields of interest. DKE reaches a worldwide audience of researchers, designers, managers and users. The major aim of the journal is to identify, investigate and analyze the underlying principles in the design and effective use of these systems.