Learning Rules from KGs Guided by Language Models
Zihang Peng, Daria Stepanova, Vinh Thinh Ho, Heike Adel, Alessandra Russo, Simon Ott
arXiv - CS - Computation and Language · 2024-09-12 · arXiv:2409.07869 · https://doi.org/arxiv-2409.07869
Abstract
Advances in information extraction have enabled the automatic construction of large knowledge graphs (e.g., Yago, Wikidata or Google KG), which are widely used in many applications like semantic search or data analytics. However, due to their semi-automatic construction, KGs are often incomplete. Rule learning methods, concerned with extracting frequent patterns from KGs and casting them into rules, can be applied to predict potentially missing facts. A crucial step in this process is rule ranking. Ranking rules is especially challenging over highly incomplete or biased KGs (e.g., KGs that predominantly store facts about famous people), as in this case biased rules might fit the data best and be ranked at the top under standard statistical metrics like rule confidence. To address this issue, prior works proposed to rank rules relying not only on the original KG but also on facts predicted by a KG embedding model. At the same time, with the recent rise of Language Models (LMs), several works have claimed that LMs can serve as an alternative means for KG completion. In this work, our goal is to verify to what extent exploiting LMs helps to improve the quality of rule learning systems.
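
The rule confidence mentioned above is the standard metric from KG rule mining: the fraction of a rule's body groundings that also satisfy its head. A minimal sketch of this computation, using a hypothetical toy KG and the illustrative rule livesIn(X, Y) <- bornIn(X, Y) (neither is from the paper), shows how it is counted:

```python
# Minimal sketch of standard rule confidence over a toy KG.
# The triples and the rule are hypothetical illustrations, not data from the paper.

# KG as a set of (subject, relation, object) triples.
kg = {
    ("alice", "bornIn", "paris"),
    ("alice", "livesIn", "paris"),
    ("bob",   "bornIn", "rome"),
    ("bob",   "livesIn", "rome"),
    ("carol", "bornIn", "oslo"),
    ("carol", "livesIn", "berlin"),  # counterexample: born in Oslo, lives elsewhere
}

def confidence(kg, body_rel, head_rel):
    """Confidence of the rule head_rel(X, Y) <- body_rel(X, Y):
    the fraction of body groundings (X, Y) that also appear as head facts."""
    body = {(s, o) for (s, r, o) in kg if r == body_rel}
    head = {(s, o) for (s, r, o) in kg if r == head_rel}
    if not body:
        return 0.0
    return len(body & head) / len(body)

# 2 of the 3 people born somewhere still live there -> confidence 2/3.
print(confidence(kg, "bornIn", "livesIn"))  # 0.666...
```

Because this count sees only the facts the KG happens to contain, a KG biased toward one population can make a spurious rule look highly confident, which is exactly the ranking problem the paper targets by adding externally predicted facts (from a KG embedding model or an LM) to the evaluation.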