Learning Rules from KGs Guided by Language Models
Zihang Peng, Daria Stepanova, Vinh Thinh Ho, Heike Adel, Alessandra Russo, Simon Ott
arXiv - CS - Computation and Language · 2024-09-12 · arXiv:2409.07869 · https://doi.org/arxiv-2409.07869
Abstract
Advances in information extraction have enabled the automatic construction of large knowledge graphs (e.g., Yago, Wikidata or Google KG), which are widely used in many applications like semantic search or data analytics. However, due to their semi-automatic construction, KGs are often incomplete. Rule learning methods, concerned with extracting frequent patterns from KGs and casting them into rules, can be applied to predict potentially missing facts. A crucial step in this process is rule ranking. Ranking rules is especially challenging over highly incomplete or biased KGs (e.g., KGs that predominantly store facts about famous people), as in this case biased rules might fit the data best and be ranked at the top under standard statistical metrics like rule confidence. To address this issue, prior works proposed to rank rules relying not only on the original KG but also on facts predicted by a KG embedding model. At the same time, with the recent rise of Language Models (LMs), several works have claimed that LMs can serve as an alternative means for KG completion. In this work, our goal is to verify to what extent exploiting LMs helps to improve the quality of rule learning systems.
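
The rule confidence mentioned above is the standard metric from KG rule mining: the fraction of a rule's body groundings that also satisfy its head. A minimal sketch of this computation, using a hypothetical toy KG and the illustrative rule livesIn(X, Y) <- bornIn(X, Y) (neither is from the paper), shows how it is counted:

```python
# Minimal sketch of standard rule confidence over a toy KG.
# The triples and the rule are hypothetical illustrations, not data from the paper.

# KG as a set of (subject, relation, object) triples.
kg = {
    ("alice", "bornIn", "paris"),
    ("alice", "livesIn", "paris"),
    ("bob",   "bornIn", "rome"),
    ("bob",   "livesIn", "rome"),
    ("carol", "bornIn", "oslo"),
    ("carol", "livesIn", "berlin"),  # counterexample: born in Oslo, lives elsewhere
}

def confidence(kg, body_rel, head_rel):
    """Confidence of the rule head_rel(X, Y) <- body_rel(X, Y):
    the fraction of body groundings (X, Y) that also appear as head facts."""
    body = {(s, o) for (s, r, o) in kg if r == body_rel}
    head = {(s, o) for (s, r, o) in kg if r == head_rel}
    if not body:
        return 0.0
    return len(body & head) / len(body)

# 2 of the 3 people born somewhere still live there -> confidence 2/3.
print(confidence(kg, "bornIn", "livesIn"))  # 0.666...
```

Because this count sees only the facts the KG happens to contain, a KG biased toward one population can make a spurious rule look highly confident, which is exactly the ranking problem the paper targets by adding externally predicted facts (from a KG embedding model or an LM) to the evaluation.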