Logic-infused knowledge graph QA: Enhancing large language models for specialized domains through Prolog integration

IF 2.7 3区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Data & Knowledge Engineering Pub Date : 2025-01-24 DOI:10.1016/j.datak.2025.102406

Aneesa Bashir, Rong Peng, Yongchang Ding

{"title":"Logic-infused knowledge graph QA: Enhancing large language models for specialized domains through Prolog integration","authors":"Aneesa Bashir, Rong Peng, Yongchang Ding","doi":"10.1016/j.datak.2025.102406","DOIUrl":null,"url":null,"abstract":"<div><div>Efficiently answering questions over complex, domain-specific knowledge graphs remain a substantial challenge, as large language models (LLMs) often lack the logical reasoning abilities and particular knowledge required for such tasks. This paper presents a novel framework integrating LLMs with logical programming languages like Prolog for Logic-Infused Knowledge Graph Question Answering (KGQA) in specialized domains. The proposed methodology uses a transformer-based encoder–decoder architecture. An encoder reads the question, and a named entity recognition (NER) module connects entities to the knowledge graph. The extracted entities are fed into a grammar-guided decoder, producing a logical form (Prolog query) that captures the semantic constraints and relationships. The Prolog query is executed over the knowledge graph to perform symbolic reasoning and retrieve relevant answer entities. Comprehensive experiments on the MetaQA benchmark dataset demonstrate the superior performance of this logic-infused method in accurately identifying correct answer entities from the knowledge graph. Even when trained on a limited subset of annotated data, it outperforms state-of-the-art baselines, achieving 89.60 % and F1-scores of up to 89.61 %, showcasing its effectiveness in enhancing large language models with symbolic reasoning capabilities for specialized question-answering tasks. The seamless integration of LLMs and logical programming enables the proposed framework to reason effectively over complex, domain-specific knowledge graphs, overcoming a key limitation of existing KGQA systems. In specialized domains, the interpretability provided by representing questions such as Prologue queries is a valuable asset.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"157 ","pages":"Article 102406"},"PeriodicalIF":2.7000,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data & Knowledge Engineering","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0169023X25000011","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Efficiently answering questions over complex, domain-specific knowledge graphs remain a substantial challenge, as large language models (LLMs) often lack the logical reasoning abilities and particular knowledge required for such tasks. This paper presents a novel framework integrating LLMs with logical programming languages like Prolog for Logic-Infused Knowledge Graph Question Answering (KGQA) in specialized domains. The proposed methodology uses a transformer-based encoder–decoder architecture. An encoder reads the question, and a named entity recognition (NER) module connects entities to the knowledge graph. The extracted entities are fed into a grammar-guided decoder, producing a logical form (Prolog query) that captures the semantic constraints and relationships. The Prolog query is executed over the knowledge graph to perform symbolic reasoning and retrieve relevant answer entities. Comprehensive experiments on the MetaQA benchmark dataset demonstrate the superior performance of this logic-infused method in accurately identifying correct answer entities from the knowledge graph. Even when trained on a limited subset of annotated data, it outperforms state-of-the-art baselines, achieving 89.60 % and F1-scores of up to 89.61 %, showcasing its effectiveness in enhancing large language models with symbolic reasoning capabilities for specialized question-answering tasks. The seamless integration of LLMs and logical programming enables the proposed framework to reason effectively over complex, domain-specific knowledge graphs, overcoming a key limitation of existing KGQA systems. In specialized domains, the interpretability provided by representing questions such as Prologue queries is a valuable asset.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

求助全文

约1分钟内获得全文去求助

来源期刊

Data & Knowledge Engineering 工程技术-计算机：人工智能

CiteScore

5.00

自引率

0.00%

发文量

审稿时长

6 months

期刊介绍： Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between these two related fields of interest. DKE reaches a world-wide audience of researchers, designers, managers and users. The major aim of the journal is to identify, investigate and analyze the underlying principles in the design and effective use of these systems.