Evaluating the effectiveness of prompt engineering for knowledge graph question answering.

Frontiers in Artificial Intelligence (Impact Factor 4.7, JCR Q2: Computer Science, Artificial Intelligence)
Publication date: 2025-01-13; eCollection date: 2024-01-01. DOI: 10.3389/frai.2024.1454258
Catherine Kosten, Farhad Nooralahzadeh, Kurt Stockinger
{"title":"Evaluating the effectiveness of prompt engineering for knowledge graph question answering.","authors":"Catherine Kosten, Farhad Nooralahzadeh, Kurt Stockinger","doi":"10.3389/frai.2024.1454258","DOIUrl":null,"url":null,"abstract":"<p><p>Many different methods for prompting large language models have been developed since the emergence of OpenAI's ChatGPT in November 2022. In this work, we evaluate six different few-shot prompting methods. The first set of experiments evaluates three frameworks that focus on the quantity or type of shots in a prompt: a baseline method with a simple prompt and a small number of shots, random few-shot prompting with 10, 20, and 30 shots, and similarity-based few-shot prompting. The second set of experiments target optimizing the prompt or enhancing shots through Large Language Model (LLM)-generated explanations, using three prompting frameworks: Explain then Translate, Question Decomposition Meaning Representation, and Optimization by Prompting. We evaluate these six prompting methods on the newly created Spider4SPARQL benchmark, as it is the most complex SPARQL-based Knowledge Graph Question Answering (KGQA) benchmark to date. Across the various prompting frameworks used, the commercial model is unable to achieve a score over 51%, indicating that KGQA, especially for complex queries, with multiple hops, set operations and filters remains a challenging task for LLMs. Our experiments find that the most successful prompting framework for KGQA is a simple prompt combined with an ontology and five random shots.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"7 ","pages":"1454258"},"PeriodicalIF":4.7000,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11770024/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/frai.2024.1454258","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Many different methods for prompting large language models (LLMs) have been developed since the release of OpenAI's ChatGPT in November 2022. In this work, we evaluate six few-shot prompting methods. The first set of experiments evaluates three frameworks that vary the quantity or type of shots in a prompt: a baseline method with a simple prompt and a small number of shots, random few-shot prompting with 10, 20, and 30 shots, and similarity-based few-shot prompting. The second set of experiments targets optimizing the prompt or enriching the shots with LLM-generated explanations, using three prompting frameworks: Explain then Translate, Question Decomposition Meaning Representation, and Optimization by Prompting. We evaluate these six prompting methods on the newly created Spider4SPARQL benchmark, the most complex SPARQL-based Knowledge Graph Question Answering (KGQA) benchmark to date. Across all prompting frameworks, the commercial model is unable to score above 51%, indicating that KGQA, especially for complex queries with multiple hops, set operations, and filters, remains a challenging task for LLMs. Our experiments find that the most successful prompting framework for KGQA is a simple prompt combined with an ontology and five random shots.
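The abstract names the shot-selection strategies but gives no implementation. The minimal Python sketch below illustrates the two families of methods it compares: random few-shot selection (the winning configuration uses five random shots plus the ontology) and similarity-based selection. Everything here is an illustrative assumption rather than the authors' code: the SHOT_POOL pairs, the placeholder ontology string, and the bag-of-words cosine similarity are stand-ins, and the paper's similarity-based variant would likely use a learned embedding model rather than token overlap.

```python
import random
from collections import Counter
from math import sqrt

# Placeholder shot pool: (natural-language question, gold SPARQL) pairs.
# In the paper's setting these would come from the benchmark's training split.
SHOT_POOL = [
    ("How many singers do we have?",
     "SELECT (COUNT(?s) AS ?n) WHERE { ?s a :Singer . }"),
    ("List the names of all concerts held in 2014.",
     "SELECT ?name WHERE { ?c a :Concert ; :year 2014 ; :name ?name . }"),
]

def random_shots(pool, k=5, seed=0):
    """Random few-shot selection; the best-performing setup reported uses k=5."""
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))

def _bow(text):
    """Bag-of-words vector as a Counter (a crude stand-in for an embedding)."""
    return Counter(text.lower().split())

def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def similar_shots(pool, question, k=5):
    """Similarity-based selection: the k pool questions closest to the input."""
    q = _bow(question)
    return sorted(pool, key=lambda ex: _cosine(_bow(ex[0]), q), reverse=True)[:k]

def build_prompt(ontology, shots, question):
    """Simple prompt layout: ontology first, then the shots, then the question."""
    parts = ["Translate the question into a SPARQL query over this ontology:",
             ontology, ""]
    for nlq, sparql in shots:
        parts += [f"Question: {nlq}", f"SPARQL: {sparql}", ""]
    parts += [f"Question: {question}", "SPARQL:"]
    return "\n".join(parts)

# Example: the configuration the abstract reports as most successful
# (simple prompt + ontology + five random shots).
ontology = ":Singer a owl:Class . :Concert a owl:Class .  # placeholder ontology"
print(build_prompt(ontology, random_shots(SHOT_POOL, k=5),
                   "How many concerts are there?"))
```

Swapping random_shots for similar_shots gives the similarity-based variant; the explanation-based frameworks the paper tests (Explain then Translate, Question Decomposition Meaning Representation, Optimization by Prompting) would additionally insert LLM-generated explanations between each shot's question and query.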


Source journal: Frontiers in Artificial Intelligence
CiteScore: 6.10
Self-citation rate: 2.50%
Annual publications: 272
Review time: 13 weeks
Latest articles in this journal:
Editorial: Enhancing kidney transplant outcomes through machine learning innovations.
Development and validation of a multi-agent AI pipeline for automated credibility assessment of tobacco misinformation: a proof-of-concept study.
Self-evolving cognitive substrates through metabolic data processing and recursive self-representation with autonomous memory prioritization mechanisms.
Retrieving interpretability to support vector machine regression models in dynamic system identification.
Machine learning-guided clinical pharmacist interventions improve treatment outcomes in tuberculosis patients: a precision medicine approach.