SF-GPT: A training-free method to enhance capabilities for knowledge graph construction in LLMs

IF 5.5 · CAS Tier 2, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Neurocomputing · Pub Date: 2024-10-21 · DOI: 10.1016/j.neucom.2024.128726
Lizhuang Sun, Peng Zhang, Fang Gao, Yuan An, Zhixing Li, Yuanwei Zhao
{"title":"SF-GPT: A training-free method to enhance capabilities for knowledge graph construction in LLMs","authors":"Lizhuang Sun,&nbsp;Peng Zhang,&nbsp;Fang Gao,&nbsp;Yuan An,&nbsp;Zhixing Li,&nbsp;Yuanwei Zhao","doi":"10.1016/j.neucom.2024.128726","DOIUrl":null,"url":null,"abstract":"<div><div>Knowledge graphs (KGs) are constructed by extracting knowledge triples from text and fusing knowledge, enhancing information retrieval efficiency. Current methods for knowledge triple extraction include ”Pretrain and Fine-tuning” and Large Language Models (LLMs). The former shifts effort from manual extraction to dataset annotation and suffers from performance degradation with different test and training set distributions. LLMs-based methods face errors and incompleteness in extraction. We introduce SF-GPT, a training-free method to address these issues. Firstly, we propose the Entity Extraction Filter (EEF) module to filter triple generation results, addressing evaluation and cleansing challenges. Secondly, we introduce a training-free Entity Alignment Module based on Entity Alias Generation (EAG), tackling semantic richness and interpretability issues in LLM-based knowledge fusion. Finally, our Self-Fusion Subgraph strategy uses multi-response self-fusion and a common entity list to filter triple results, reducing noise from LLMs’ multi-responses. In experiments, SF-GPT showed a 55.5% increase in recall and a 32.6% increase in F1 score on the BDNC dataset compared to the UniRel model trained on the NYT dataset and achieved a 5% improvement in F1 score compared to GPT-4+EEF baseline on the WebNLG dataset in the case of a fusion round of three. SF-GPT offers a promising way to extract knowledge from unstructured information.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"613 ","pages":"Article 128726"},"PeriodicalIF":5.5000,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231224014978","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Knowledge graphs (KGs) are constructed by extracting knowledge triples from text and fusing the extracted knowledge, which improves the efficiency of information retrieval. Current methods for knowledge triple extraction include "pretrain and fine-tune" approaches and Large Language Models (LLMs). The former shifts effort from manual extraction to dataset annotation and suffers performance degradation when the test and training set distributions differ. LLM-based methods face errors and incompleteness in extraction. We introduce SF-GPT, a training-free method that addresses these issues. First, we propose the Entity Extraction Filter (EEF) module to filter triple-generation results, addressing evaluation and cleansing challenges. Second, we introduce a training-free Entity Alignment Module based on Entity Alias Generation (EAG), tackling semantic richness and interpretability issues in LLM-based knowledge fusion. Finally, our Self-Fusion Subgraph strategy uses multi-response self-fusion and a common entity list to filter triple results, reducing the noise introduced by LLMs' multiple responses. In experiments, SF-GPT showed a 55.5% increase in recall and a 32.6% increase in F1 score on the BDNC dataset compared to a UniRel model trained on the NYT dataset, and achieved a 5% improvement in F1 score over the GPT-4+EEF baseline on the WebNLG dataset with three fusion rounds. SF-GPT offers a promising way to extract knowledge from unstructured information.
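The abstract names three training-free components: an Entity Extraction Filter (EEF) that cleanses generated triples, alias-based entity alignment (EAG), and a Self-Fusion Subgraph strategy that fuses multiple LLM responses via a common entity list. The paper's code is not reproduced here; the Python sketch below is only a rough illustration of one plausible way these pieces could compose. Every function name, the alias-merging heuristic, and the minimum-vote threshold are assumptions, not the authors' actual implementation.

```python
# Minimal sketch of an SF-GPT-style pipeline, reconstructed from the abstract.
# Every name and heuristic below is an assumption made for illustration;
# the paper's actual modules may work differently.
from collections import Counter
from typing import Callable

Triple = tuple[str, str, str]  # (head, relation, tail)

def entity_extraction_filter(triples: list[Triple],
                             entities: set[str]) -> list[Triple]:
    # EEF (assumed behavior): keep only triples whose head and tail both
    # appear in a separately extracted entity list.
    return [(h, r, t) for (h, r, t) in triples if h in entities and t in entities]

def align_by_aliases(entities: list[str],
                     gen_aliases: Callable[[str], set[str]]) -> dict[str, str]:
    # EAG-based alignment (assumed behavior): an LLM generates aliases per
    # entity; entities sharing an alias map to one canonical surface form.
    canonical: dict[str, str] = {}
    alias_owner: dict[str, str] = {}  # alias -> canonical entity that claimed it
    for e in entities:
        aliases = gen_aliases(e) | {e}
        owner = next((alias_owner[a] for a in aliases if a in alias_owner), e)
        canonical[e] = owner
        for a in aliases:
            alias_owner.setdefault(a, owner)
    return canonical

def self_fusion(responses: list[list[Triple]], min_votes: int) -> list[Triple]:
    # Self-Fusion Subgraph (assumed behavior): entities recurring in at least
    # `min_votes` responses form the common entity list; only triples whose
    # endpoints are on that list survive, suppressing per-response noise.
    entity_counts = Counter(
        e
        for resp in responses
        for e in {x for (h, _, t) in resp for x in (h, t)}
    )
    common = {e for e, c in entity_counts.items() if c >= min_votes}
    fused = {tr for resp in responses
             for tr in entity_extraction_filter(resp, common)}
    return sorted(fused)

def extract_kg(text: str,
               ask_llm: Callable[[str], list[Triple]],
               extract_entities: Callable[[str], set[str]],
               rounds: int = 3) -> list[Triple]:
    # End-to-end sketch: sample `rounds` LLM responses, EEF-filter each
    # against entities found in the source text, then self-fuse the rounds.
    responses = [
        entity_extraction_filter(ask_llm(text), extract_entities(text))
        for _ in range(rounds)
    ]
    return self_fusion(responses, min_votes=2)
```

Treating the LLM and the entity extractor as injected callables keeps the sketch model-agnostic; in the setting the abstract describes, both would presumably be prompts to the same LLM, and alias-based alignment would run over the fused triples to merge co-referring entities.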
Source journal
Neurocomputing (Engineering & Technology, Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Articles per year: 1382
Review time: 70 days
About the journal: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice, and applications are the essential topics covered.