Evaluating the effectiveness of prompt engineering for knowledge graph question answering
Catherine Kosten, Farhad Nooralahzadeh, Kurt Stockinger
Frontiers in Artificial Intelligence, vol. 7, article 1454258 (published 2025-01-13)
DOI: 10.3389/frai.2024.1454258
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11770024/pdf/
Abstract
Many different methods for prompting large language models have been developed since the emergence of OpenAI's ChatGPT in November 2022. In this work, we evaluate six few-shot prompting methods. The first set of experiments evaluates three frameworks that focus on the quantity or type of shots in a prompt: a baseline method with a simple prompt and a small number of shots, random few-shot prompting with 10, 20, and 30 shots, and similarity-based few-shot prompting. The second set of experiments targets optimizing the prompt or enhancing the shots through Large Language Model (LLM)-generated explanations, using three prompting frameworks: Explain then Translate, Question Decomposition Meaning Representation, and Optimization by Prompting. We evaluate these six prompting methods on the newly created Spider4SPARQL benchmark, as it is the most complex SPARQL-based Knowledge Graph Question Answering (KGQA) benchmark to date. Across the various prompting frameworks used, the commercial model is unable to achieve a score above 51%, indicating that KGQA, especially for complex queries with multiple hops, set operations, and filters, remains a challenging task for LLMs. Our experiments find that the most successful prompting framework for KGQA is a simple prompt combined with an ontology and five random shots.
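To make the winning setup concrete, the sketch below shows one plausible way to assemble a "simple prompt + ontology + five random shots" for text-to-SPARQL. It is an illustration only: the instruction wording, the example pool, and the ontology snippet are hypothetical placeholders, not the authors' exact prompt template, and sending the prompt to an LLM is left out.

```python
import random

# Hypothetical pool of (natural-language question, gold SPARQL query) pairs,
# e.g. drawn from a benchmark's training split; contents are illustrative only.
SHOT_POOL = [
    ("How many singers are there?",
     "SELECT (COUNT(?s) AS ?cnt) WHERE { ?s a :Singer . }"),
    ("List the names of all concerts held in 2014.",
     "SELECT ?name WHERE { ?c a :Concert ; :year 2014 ; :name ?name . }"),
    ("Which stadiums have a capacity above 10000?",
     "SELECT ?st WHERE { ?st a :Stadium ; :capacity ?c . FILTER(?c > 10000) }"),
]

def build_prompt(question: str, ontology: str, k: int = 5, seed: int = 0) -> str:
    """Assemble a simple few-shot prompt: task instruction, ontology,
    k randomly sampled shots, then the target question."""
    rng = random.Random(seed)  # fixed seed so the shot selection is reproducible
    shots = rng.sample(SHOT_POOL, k=min(k, len(SHOT_POOL)))
    lines = [
        "Translate the natural-language question into a SPARQL query",
        "over the knowledge graph described by the ontology below.",
        "",
        "Ontology:",
        ontology.strip(),
        "",
    ]
    for q, sparql in shots:
        lines += [f"Question: {q}", f"SPARQL: {sparql}", ""]
    lines += [f"Question: {question}", "SPARQL:"]
    return "\n".join(lines)

if __name__ == "__main__":
    # Illustrative ontology fragment; a real prompt would include the full schema.
    ontology = ":Singer a owl:Class . :Concert a owl:Class . :year a owl:DatatypeProperty ."
    print(build_prompt("Which singers performed at more than one concert?", ontology, k=3))
```

A similarity-based variant would replace the random sampling with a nearest-neighbour lookup (e.g. by embedding the questions and picking the k most similar training examples), keeping the rest of the prompt layout unchanged.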