Assessing SPARQL capabilities of Large Language Models

Lars-Peter Meyer, Johannes Frey, Felix Brei, Natanael Arndt
arXiv - CS - Information Retrieval, 2024-09-09. DOI: arxiv-2409.05925 (https://doi.org/arxiv-2409.05925)

Abstract

The integration of Large Language Models (LLMs) with Knowledge Graphs (KGs) offers significant synergistic potential for knowledge-driven applications. One possible integration is the interpretation and generation of formal languages, such as those used in the Semantic Web, with SPARQL being a core technology for accessing KGs. In this paper, we focus on measuring the out-of-the-box capabilities of LLMs to work with SPARQL, and more specifically with SPARQL SELECT queries, applying a quantitative approach. We implemented various benchmarking tasks in the LLM-KG-Bench framework for automated execution and evaluation with several LLMs. The tasks assess capabilities along the dimensions of syntax, semantic read, semantic create, and the role of knowledge graph prompt inclusion. With these new benchmarking tasks, we evaluated a selection of GPT, Gemini, and Claude models. Our findings indicate that working with SPARQL SELECT queries is still challenging for LLMs and heavily depends on the specific LLM as well as the complexity of the task. While fixing basic syntax errors seems to pose no problems for the best of the current LLMs evaluated, creating semantically correct SPARQL SELECT queries is difficult in several cases.
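
To make the idea of quantitatively checking a "semantically correct" SELECT query more concrete, the sketch below scores a generated query by comparing the rows it returns against the rows of a reference query. This is only an illustration under stated assumptions: the function name `result_set_f1`, the F1 measure, and the example rows are hypothetical and are not taken from the abstract or confirmed as the scoring used in LLM-KG-Bench.

```python
def result_set_f1(expected: set, actual: set) -> float:
    """F1 overlap between two sets of result rows (hypothetical helper)."""
    # Two empty result sets agree perfectly.
    if not expected and not actual:
        return 1.0
    true_positives = len(expected & actual)
    # No shared rows: score 0 and avoid division by zero below.
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(actual)
    recall = true_positives / len(expected)
    return 2 * precision * recall / (precision + recall)


# Made-up rows, as if returned by a reference query and an LLM-generated query.
reference_rows = {("ex:alice",), ("ex:bob",), ("ex:carol",)}
generated_rows = {("ex:alice",), ("ex:bob",), ("ex:dave",)}

score = result_set_f1(reference_rows, generated_rows)
print(f"F1 = {score:.3f}")
```

Comparing result sets rather than query strings means that two differently written queries receive the same score whenever they return the same rows, which is one plausible way to separate semantic correctness from syntactic form.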