RevOnt: Reverse engineering of competency questions from knowledge graphs via language models

Journal of Web Semantics, vol. 82, Article 100822 · Pub Date: 2024-05-17 · DOI: 10.1016/j.websem.2024.100822 · IF 2.1 · CAS Zone 3 (Computer Science) · JCR Q3, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Fiorela Ciroku, Jacopo de Berardinis, Jongmo Kim, Albert Meroño-Peñuela, Valentina Presutti, Elena Simperl

Abstract

The process of developing ontologies – formal, explicit specifications of a shared conceptualisation – is addressed by well-known methodologies. As with any engineering development, its fundamental basis is the collection of requirements, which includes the elicitation of competency questions. Competency questions are defined by interacting with domain and application experts or by investigating existing datasets that may be used to populate the ontology, i.e., its knowledge graph. The growing popularity and accessibility of knowledge graphs provide an opportunity to support this phase with automatic tools. In this work, we explore the possibility of extracting competency questions from a knowledge graph. This reverses the traditional workflow, in which knowledge graphs are built from ontologies that are in turn engineered from competency questions. We describe in detail RevOnt, an approach that extracts and abstracts triples from a knowledge graph, generates questions based on triple verbalisations, and filters the resulting questions to yield a meaningful set of competency questions. The approach is implemented using the Wikidata knowledge graph as a use case, and it contributes a set of core competency questions from 20 domains present in the WDV dataset. To evaluate RevOnt, we contribute a new dataset of manually annotated, high-quality competency questions and compare the extracted competency questions against these human references by computing their BLEU scores. The abstraction and question generation components of the approach achieve good to high quality, while the accuracy of the filtering component is above 86%, which is comparable to state-of-the-art classification results.
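The evaluation above scores each generated competency question by BLEU against human-written references. The abstract does not state the exact BLEU configuration (tokenisation, n-gram order, smoothing), so the following is only a minimal sketch assuming sentence-level BLEU-4 with uniform weights, lowercase whitespace tokenisation, clipped n-gram precision, the closest-reference brevity penalty, and add-epsilon smoothing for zero-match orders. The function names and example questions are hypothetical, not taken from RevOnt.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram (as a tuple of tokens) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4, epsilon=0.1):
    """Hypothetical sentence-level BLEU of one question against human references.

    Assumptions: lowercase whitespace tokenisation, uniform weights over
    n = 1..max_n, and epsilon added to zero match counts (smoothing).
    """
    cand = candidate.lower().split()
    refs = [ref.lower().split() for ref in references]
    if not cand or not refs:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        total = sum(cand_counts.values())
        # Highest count of each n-gram in any single reference, used for clipping.
        max_ref_counts = Counter()
        for ref in refs:
            for gram, count in ngram_counts(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram])
                      for gram, count in cand_counts.items())
        if clipped > 0:
            precision = clipped / total
        else:
            # Smoothing assumption: replace a zero match count with epsilon.
            precision = epsilon / max(total, 1)
        log_precision_sum += math.log(precision)

    # Brevity penalty against the reference whose length is closest to the candidate's.
    c = len(cand)
    closest_ref_len = min((abs(len(ref) - c), len(ref)) for ref in refs)[1]
    brevity_penalty = 1.0 if c > closest_ref_len else math.exp(1 - closest_ref_len / c)
    return brevity_penalty * math.exp(log_precision_sum / max_n)


reference = "Which university was this person educated at"
print(sentence_bleu(reference, [reference]))  # identical question: 1.0
print(sentence_bleu("which university was the person educated at", [reference]))
```

Smoothing matters because competency questions are short: without it, a single missing 4-gram drives the geometric mean to zero. For reported numbers, an established implementation such as NLTK's `nltk.translate.bleu_score.sentence_bleu` or sacreBLEU is preferable, since tokenisation and smoothing choices change the scores.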

Source journal

Journal of Web Semantics (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore

6.20

Self-citation rate

12.00%

Review time

14.6 weeks

Journal description: The Journal of Web Semantics is an interdisciplinary journal based on research on and applications of various subject areas that contribute to the development of a knowledge-intensive and intelligent service Web. These areas include knowledge technologies, ontology, agents, databases and the semantic grid; disciplines such as information retrieval, language technology, human-computer interaction and knowledge discovery are also of major relevance. All aspects of Semantic Web development are covered. The publication of large-scale experiments and their analysis is also encouraged, to clearly illustrate scenarios and methods that introduce semantics into existing Web interfaces, content and services. The journal emphasizes papers that combine theories, methods and experiments from different subject areas in order to deliver innovative semantic methods and applications.