Psychiq and Wwwyzzerdd: Wikidata completion using Wikipedia
Daniel Erenrich

Despite its size, Wikidata remains incomplete and inaccurate in many areas. Hundreds of thousands of articles on English Wikipedia have zero or limited meaningful structure on Wikidata. Much work has been done in the literature to partially or fully automate the process of completing knowledge graphs, but little of it has been practically applied to Wikidata. This paper presents two interconnected practical approaches to speeding up the Wikidata completion task. The first is Wwwyzzerdd, a browser extension that allows users to quickly import statements from Wikipedia to Wikidata. Wwwyzzerdd has been used to make over 100,000 edits to Wikidata. The second is Psychiq, a new model for predicting instance and subclass statements based on English Wikipedia articles. Psychiq’s performance and characteristics make it well suited to solving a variety of problems for the Wikidata community. One initial use is integrating the Psychiq model into the Wwwyzzerdd browser extension.
{"title":"Psychiq and Wwwyzzerdd: Wikidata completion using Wikipedia","authors":"Daniel Erenrich","doi":"10.3233/sw-233450","DOIUrl":"https://doi.org/10.3233/sw-233450","url":null,"abstract":"Despite its size, Wikidata remains incomplete and inaccurate in many areas. Hundreds of thousands of articles on English Wikipedia have zero or limited meaningful structure on Wikidata. Much work has been done in the literature to partially or fully automate the process of completing knowledge graphs, but little of it has been practically applied to Wikidata. This paper presents two interconnected practical approaches to speeding up the Wikidata completion task. The first is Wwwyzzerdd, a browser extension that allows users to quickly import statements from Wikipedia to Wikidata. Wwwyzzerdd has been used to make over 100 thousand edits to Wikidata. The second is Psychiq, a new model for predicting instance and subclass statements based on English Wikipedia articles. Psychiq’s performance and characteristics make it well suited to solving a variety of problems for the Wikidata community. One initial use is integrating the Psychiq model into the Wwwyzzerdd browser extension.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135826216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ProVe: A pipeline for automated provenance verification of knowledge graphs against textual sources
Gabriel Amaral, Odinaldo Rodrigues, Elena Simperl

Knowledge Graphs are repositories of information that gather data from a multitude of domains and sources in the form of semantic triples, serving as a source of structured data for various crucial applications in the modern web landscape, from Wikipedia infoboxes to search engines. Such graphs mainly serve as secondary sources of information and depend on well-documented and verifiable provenance to ensure their trustworthiness and usability. However, the ability to systematically assess and assure the quality of this provenance, most crucially whether it properly supports the graph’s information, relies mainly on manual processes that do not scale with size. ProVe aims to remedy this: it is a pipelined approach that automatically verifies whether a Knowledge Graph triple is supported by text extracted from its documented provenance. ProVe is intended to assist information curators and consists of four main steps involving rule-based methods and machine learning models: text extraction, triple verbalisation, sentence selection, and claim verification. ProVe is evaluated on a Wikidata dataset, achieving promising results overall and excellent performance on the binary classification task of detecting support from provenance, with 87.5% accuracy and 82.9% F1-macro on text-rich sources. The evaluation data and scripts used in this paper are available on GitHub and Figshare.
{"title":"ProVe: A pipeline for automated provenance verification of knowledge graphs against textual sources","authors":"Gabriel Amaral, Odinaldo Rodrigues, Elena Simperl","doi":"10.3233/sw-233467","DOIUrl":"https://doi.org/10.3233/sw-233467","url":null,"abstract":"Knowledge Graphs are repositories of information that gather data from a multitude of domains and sources in the form of semantic triples, serving as a source of structured data for various crucial applications in the modern web landscape, from Wikipedia infoboxes to search engines. Such graphs mainly serve as secondary sources of information and depend on well-documented and verifiable provenance to ensure their trustworthiness and usability. However, their ability to systematically assess and assure the quality of this provenance, most crucially whether it properly supports the graph’s information, relies mainly on manual processes that do not scale with size. ProVe aims at remedying this, consisting of a pipelined approach that automatically verifies whether a Knowledge Graph triple is supported by text extracted from its documented provenance. ProVe is intended to assist information curators and consists of four main steps involving rule-based methods and machine learning models: text extraction, triple verbalisation, sentence selection, and claim verification. ProVe is evaluated on a Wikidata dataset, achieving promising results overall and excellent performance on the binary classification task of detecting support from provenance, with 87.5 % accuracy and 82.9 % F1-macro on text-rich sources. The evaluation data and scripts used in this paper are available in GitHub and Figshare.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135877961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
What is in your cookie box? Explaining ingredients of web cookies with knowledge graphs
Geni Bushati, Sven Carsten Rasmusen, Anelia Kurteva, Anurag Vats, Petraq Nako, A. Fensel
The General Data Protection Regulation (GDPR) has imposed strict requirements for data sharing, one of which is informed consent. A common way to request consent online is via cookies. However, users commonly accept online cookies while unaware of the meaning of the given consent and its implications. Once consent is given, the cookie “disappears”, and one forgets that consent was given in the first place. Retrieving cookies and consent logs becomes challenging, as most of this information is stored in the specific Internet browser’s logs. To make users aware of the data sharing implied by cookie consent and to support transparency and traceability within systems, we present a knowledge graph (KG) based tool for personalised cookie consent information visualisation. The KG is based on the OntoCookie ontology, which models cookies in a machine-readable format and supports data interpretability across domains. Evaluation results confirm that users’ comprehension of the data shared through cookies is vague and insufficient. Furthermore, our work has resulted in an increase of 47.5% in users’ willingness to be cautious when viewing cookie banners before giving consent. These and other evaluation results confirm that our cookie data visualisation approach and tool help to increase users’ awareness of cookies and data sharing.
{"title":"What is in your cookie box? Explaining ingredients of web cookies with knowledge graphs","authors":"Geni Bushati, Sven Carsten Rasmusen, Anelia Kurteva, Anurag Vats, Petraq Nako, A. Fensel","doi":"10.3233/sw-233435","DOIUrl":"https://doi.org/10.3233/sw-233435","url":null,"abstract":"The General Data Protection Regulation (GDPR) has imposed strict requirements for data sharing, one of which is informed consent. A common way to request consent online is via cookies. However, commonly, users accept online cookies being unaware of the meaning of the given consent and the following implications. Once consent is given, the cookie “disappears”, and one forgets that consent was given in the first place. Retrieving cookies and consent logs becomes challenging, as most information is stored in the specific Internet browser’s logs. To make users aware of the data sharing implied by cookie consent and to support transparency and traceability within systems, we present a knowledge graph (KG) based tool for personalised cookie consent information visualisation. The KG is based on the OntoCookie ontology, which models cookies in a machine-readable format and supports data interpretability across domains. Evaluation results confirm that the users’ comprehension of the data shared through cookies is vague and insufficient. Furthermore, our work has resulted in an increase of 47.5% in the users’ willingness to be cautious when viewing cookie banners before giving consent. These and other evaluation results confirm that our cookie data visualisation approach and tool help to increase users’ awareness of cookies and data sharing.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"543 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2023-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76927534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Focused categorization power of ontologies: General framework and study on simple existential concept expressions
V. Svátek, Ondřej Zamazal, Viet Bach Nguyen, J. Ivánek, Ján Kľuka, Miroslav Vacura
When reusing existing ontologies for publishing a dataset in RDF (or when developing a new ontology), preference may be given to those providing extensive subcategorization for important classes (denoted as focus classes). The subcategories may consist not only of named classes but also of compound class expressions. We define the notion of focused categorization power of a given ontology, with respect to a focus class and a concept expression language, as the (estimated) weighted count of the categories that can be built from the ontology’s signature, conform to the language, and are subsumed by the focus class. For the sake of tractable initial experiments, we then formulate a restricted concept expression language based on existential restrictions and heuristically map it to syntactic patterns over ontology axioms (so-called FCE patterns). The characteristics of the chosen concept expression language and the associated FCE patterns are investigated using three empirical sources derived from ontology collections: first, the frequency of concept expression patterns in class definitions; second, the occurrence of FCE patterns in the TBox of ontologies; and last, class expressions generated from the TBox of ontologies through the FCE patterns, whose ‘meaningfulness’ was assessed by different groups of users, yielding a ‘quality ordering’ of the concept expression patterns. The complementary analyses are then compared and summarized. To allow for further experimentation, a web-based prototype was also implemented, covering the whole process of ontology reuse from keyword-based ontology search through the FCP computation to the selection of ontologies and their enrichment with new concepts built from compound expressions.
{"title":"Focused categorization power of ontologies: General framework and study on simple existential concept expressions","authors":"V. Svátek, Ondřej Zamazal, Viet Bach Nguyen, J. Ivánek, Ján Kľuka, Miroslav Vacura","doi":"10.3233/sw-233401","DOIUrl":"https://doi.org/10.3233/sw-233401","url":null,"abstract":"When reusing existing ontologies for publishing a dataset in RDF (or developing a new ontology), preference may be given to those providing extensive subcategorization for important classes (denoted as focus classes). The subcategories may consist not only of named classes but also of compound class expressions. We define the notion of focused categorization power of a given ontology, with respect to a focus class and a concept expression language, as the (estimated) weighted count of the categories that can be built from the ontology’s signature, conform to the language, and are subsumed by the focus class. For the sake of tractable initial experiments we then formulate a restricted concept expression language based on existential restrictions, and heuristically map it to syntactic patterns over ontology axioms (so-called FCE patterns). The characteristics of the chosen concept expression language and associated FCE patterns are investigated using three different empirical sources derived from ontology collections: first, the concept expression pattern frequency in class definitions; second, the occurrence of FCE patterns in the Tbox of ontologies; and last, for class expressions generated from the Tbox of ontologies (through the FCE patterns); their ‘meaningfulness’ was assessed by different groups of users, yielding a ‘quality ordering’ of the concept expression patterns. The complementary analyses are then compared and summarized. To allow for further experimentation, a web-based prototype was also implemented, which covers the whole process of ontology reuse from keyword-based ontology search through the FCP computation to the selection of ontologies and their enrichment with new concepts built from compound expressions.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"1 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2023-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83630655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Searching for explanations of black-box classifiers in the space of semantic queries
Jason Liartis, Edmund Dervakos, Orfeas Menis-Mastromichalakis, A. Chortaras, G. Stamou
Deep learning models have achieved impressive performance in various tasks, but they are usually opaque with regard to their complex inner operation, obscuring the reasons for their decisions. This opacity raises ethical and legal concerns regarding the real-life use of such models, especially in critical domains such as medicine, and has led to the emergence of the field of eXplainable Artificial Intelligence (XAI), which aims to make the operation of opaque AI systems more comprehensible to humans. The problem of explaining a black-box classifier is often approached by feeding it data and observing its behaviour. In this work, we feed the classifier with data that are part of a knowledge graph and describe its behaviour with rules expressed in the terminology of the knowledge graph, which is understandable by humans. We first investigate the problem theoretically to provide guarantees for the extracted rules, and then investigate the relation between “explanation rules for a specific class” and “semantic queries collecting from the knowledge graph the instances that the black-box classifier assigns to this specific class”. We thus approach the problem of extracting explanation rules as a semantic query reverse engineering problem. We develop algorithms for solving this inverse problem as a heuristic search in the space of semantic queries, evaluate the proposed algorithms on four simulated use cases, and discuss the results.
{"title":"Searching for explanations of black-box classifiers in the space of semantic queries","authors":"Jason Liartis, Edmund Dervakos, Orfeas Menis-Mastromichalakis, A. Chortaras, G. Stamou","doi":"10.3233/sw-233469","DOIUrl":"https://doi.org/10.3233/sw-233469","url":null,"abstract":"Deep learning models have achieved impressive performance in various tasks, but they are usually opaque with regards to their inner complex operation, obfuscating the reasons for which they make decisions. This opacity raises ethical and legal concerns regarding the real-life use of such models, especially in critical domains such as in medicine, and has led to the emergence of the eXplainable Artificial Intelligence (XAI) field of research, which aims to make the operation of opaque AI systems more comprehensible to humans. The problem of explaining a black-box classifier is often approached by feeding it data and observing its behaviour. In this work, we feed the classifier with data that are part of a knowledge graph, and describe the behaviour with rules that are expressed in the terminology of the knowledge graph, that is understandable by humans. We first theoretically investigate the problem to provide guarantees for the extracted rules and then we investigate the relation of “explanation rules for a specific class” with “semantic queries collecting from the knowledge graph the instances classified by the black-box classifier to this specific class”. Thus we approach the problem of extracting explanation rules as a semantic query reverse engineering problem. We develop algorithms for solving this inverse problem as a heuristic search in the space of semantic queries and we evaluate the proposed algorithms on four simulated use-cases and discuss the results.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"107 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2023-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80802224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quantifiable integrity for Linked Data on the web
Christoph H.-J. Braun, Tobias Käfer

We present an approach to publishing Linked Data on the Web with quantifiable integrity using Web technologies, in which rational agents are incentivised to contribute to the integrity of the link network. To this end, we introduce self-verifying resource representations that include Linked Data Signatures, whose signature value is used as a suffix in the resource’s URI. Links among such representations, typically managed as web documents, therefore contribute to preserving the integrity of the resulting document graphs. To quantify how well a document’s integrity can be relied on, we introduce the notion of trust scores and present an interpretation based on hubs and authorities. In addition, we show how specific agent behaviour may be induced by the choice of trust score that agents optimise, both with general optimisation strategies and with a heuristic strategy called the Additional Reach Strategy (ARS). We discuss our approach in a three-fold evaluation. First, we evaluate the effect of different graph metrics as trust scores on induced agent behaviour and the resulting evolution of the document graph; we show that trust scores based on hubs and authorities induce agent behaviour that contributes to integrity preservation in the document graph. Next, we evaluate different heuristics for agents to optimise trust scores when general optimisation strategies are not applicable; we show that ARS outperforms other potential optimisation strategies. Last, we evaluate the whole approach by examining the resilience of integrity preservation in a document graph when resources are deleted. To this end, we propose a simulation system based on the Watts–Strogatz model for simulating a social network. We show that our approach produces a document graph that can recover from such attacks or failures.
{"title":"Quantifiable integrity for Linked Data on the web","authors":"Christoph H.-J. Braun, Tobias Käfer","doi":"10.3233/sw-233409","DOIUrl":"https://doi.org/10.3233/sw-233409","url":null,"abstract":"We present an approach to publish Linked Data on the Web with quantifiable integrity using Web technologies, and in which rational agents are incentivised to contribute to the integrity of the link network. To this end, we introduce self-verifying resource representations, that include Linked Data Signatures whose signature value is used as a suffix in the resource’s URI. Links among such representations, typically managed as web documents, contribute therefore to preserving the integrity of the resulting document graphs. To quantify how well a document’s integrity can be relied on, we introduce the notion of trust scores and present an interpretation based on hubs and authorities. In addition, we present how specific agent behaviour may be induced by the choice of trust score regarding their optimisation, e.g., in general but also using a heuristic strategy called Additional Reach Strategy (ARS). We discuss our approach in a three-fold evaluation: First, we evaluate the effect of different graph metrics as trust scores on induced agent behaviour and resulting evolution of the document graph. We show that trust scores based on hubs and authorities induce agent behaviour that contributes to integrity preservation in the document graph. Next, we evaluate different heuristics for agents to optimise trust scores when general optimisation strategies are not applicable. We show that ARS outperforms other potential optimisation strategies. Last, we evaluate the whole approach by examining the resilience of integrity preservation in a document graph when resources are deleted. To this end, we propose a simulation system based on the Watts–Strogatz model for simulating a social network. We show that our approach produces a document graph that can recover from such attacks or failures in the document graph.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"12 8 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90173778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A conceptual model for ontology quality assessment
R. Wilson, J. Goonetillake, W. A. Indika, A. Ginige
With the continuous advancement of methods, tools, and techniques in ontology development, ontologies have emerged in various fields such as machine learning, robotics, biomedical informatics, agricultural informatics, crowdsourcing, database management, and the Internet of Things. Nevertheless, the absence of a universally agreed methodology for specifying and evaluating the quality of an ontology hinders the success of ontology-based systems in such fields, since the quality of each component contributes to the overall quality of a system and in turn impacts its usability. Moreover, a number of anomalies in the definitions of ontology quality concepts are visible, and in practice ontology quality assessment is limited to a certain set of characteristics, even though other significant characteristics may need to be considered for a given use case. Thus, in this research, a comprehensive analysis was performed to uncover the existing contributions on ontology quality models, characteristics, and the associated measures of these characteristics. The characteristics identified through this review were then classified along the associated aspects of the ontology evaluation space. Furthermore, formalized definitions for each quality characteristic are provided from the ontological perspective, based on accepted theories and standards. Additionally, a thorough analysis of the extent to which existing works have covered the quality evaluation aspects is presented, and areas to be investigated further are outlined.
{"title":"A conceptual model for ontology quality assessment","authors":"R. Wilson, J. Goonetillake, W. A. Indika, A. Ginige","doi":"10.3233/sw-233393","DOIUrl":"https://doi.org/10.3233/sw-233393","url":null,"abstract":"With the continuous advancement of methods, tools, and techniques in ontology development, ontologies have emerged in various fields such as machine learning, robotics, biomedical informatics, agricultural informatics, crowdsourcing, database management, and the Internet of Things. Nevertheless, the nonexistence of a universally agreed methodology for specifying and evaluating the quality of an ontology hinders the success of ontology-based systems in such fields as the quality of each component is required for the overall quality of a system and in turn impacts the usability in use. Moreover, a number of anomalies in definitions of ontology quality concepts are visible, and in addition to that, the ontology quality assessment is limited only to a certain set of characteristics in practice even though some other significant characteristics have to be considered for the specified use-case. Thus, in this research, a comprehensive analysis was performed to uncover the existing contributions specifically on ontology quality models, characteristics, and the associated measures of these characteristics. Consequently, the characteristics identified through this review were classified with the associated aspects of the ontology evaluation space. Furthermore, the formalized definitions for each quality characteristic are provided through this study from the ontological perspective based on the accepted theories and standards. Additionally, a thorough analysis of the extent to which the existing works have covered the quality evaluation aspects is presented and the areas further to be investigated are outlined.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"42 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2023-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85850377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimizing SPARQL queries over decentralized knowledge graphs
Christian Aebeloe, Gabriela Montoya, K. Hose

While the Web of Data in principle offers access to a wide range of interlinked data, the architecture of the Semantic Web today relies mostly on data providers to maintain access to their data through SPARQL endpoints. Several studies, however, have shown that such endpoints often experience downtime, meaning that the data they maintain becomes inaccessible. While decentralized systems based on Peer-to-Peer (P2P) technology have previously been shown to increase the availability of knowledge graphs, even when a large proportion of the nodes fail, processing queries in such a setup can be expensive, since the data necessary to answer a single query might be distributed over multiple nodes. In this paper, we therefore propose Lothbrok, an approach to optimizing SPARQL queries over decentralized knowledge graphs. While there are potentially many aspects to consider when optimizing such queries, we focus on three: cardinality estimation, locality awareness, and data fragmentation. We empirically show that Lothbrok achieves significantly faster query processing performance than the state of the art when processing challenging queries as well as when the network is under high load.
{"title":"Optimizing SPARQL queries over decentralized knowledge graphs","authors":"Christian Aebeloe, Gabriela Montoya, K. Hose","doi":"10.3233/sw-233438","DOIUrl":"https://doi.org/10.3233/sw-233438","url":null,"abstract":"While the Web of Data in principle offers access to a wide range of interlinked data, the architecture of the Semantic Web today relies mostly on the data providers to maintain access to their data through SPARQL endpoints. Several studies, however, have shown that such endpoints often experience downtime, meaning that the data they maintain becomes inaccessible. While decentralized systems based on Peer-to-Peer (P2P) technology have previously shown to increase the availability of knowledge graphs, even when a large proportion of the nodes fail, processing queries in such a setup can be an expensive task since data necessary to answer a single query might be distributed over multiple nodes. In this paper, we therefore propose an approach to optimizing SPARQL queries over decentralized knowledge graphs, called Lothbrok. While there are potentially many aspects to consider when optimizing such queries, we focus on three aspects: cardinality estimation, locality awareness, and data fragmentation. We empirically show that Lothbrok is able to achieve significantly faster query processing performance compared to the state of the art when processing challenging queries as well as when the network is under high load.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"47 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2023-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84563463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Conjunctive query answering over unrestricted OWL 2 ontologies
Federico Igne, Stefano Germano, Ian Horrocks

Conjunctive Query (CQ) answering is a primary reasoning task over knowledge bases. However, when considering expressive description logics, query answering can be computationally very expensive; reasoners for CQ answering, although heavily optimized, often sacrifice the expressive power of the input ontology or the completeness of the computed answers in order to achieve tractability and scalability. In this work, we present a hybrid query answering architecture that combines various services to provide a CQ answering service for OWL. Specifically, it combines scalable CQ answering services for tractable languages with a CQ answering service for a more expressive language approaching the full OWL 2. If the query can be fully answered by one of the tractable services, then that service is used, to ensure maximum performance. Otherwise, the tractable services are used to compute lower and upper bound approximations. The union of the lower bounds and the intersection of the upper bounds are then compared. If the bounds do not coincide, the “gap” answers are checked using the “full” service. These techniques led to the development of two new systems: (i) RSAComb, an efficient implementation of a new tractable answering service for RSA (role safety acyclic), and (ii) ACQuA, a reference implementation of the proposed hybrid architecture combining RSAComb, PAGOdA, and HermiT to provide a CQ answering service for OWL. Our extensive evaluation shows how the additional computational cost introduced by reasoning over a more expressive language like RSA can still provide a significant improvement compared to relying on a fully-fledged reasoner. Additionally, we show how ACQuA can reliably match the performance of PAGOdA, a state-of-the-art CQ answering system that uses a similar approach, and can significantly improve performance when PAGOdA extensively relies on the underlying fully-fledged reasoner.
{"title":"Conjunctive query answering over unrestricted OWL 2 ontologies","authors":"Federico Igne, Stefano Germano, Ian Horrocks","doi":"10.3233/sw-233382","DOIUrl":"https://doi.org/10.3233/sw-233382","url":null,"abstract":"Conjunctive Query (CQ) answering is a primary reasoning task over knowledge bases. However, when considering expressive description logics, query answering can be computationally very expensive; reasoners for CQ answering, although heavily optimized, often sacrifice expressive power of the input ontology or completeness of the computed answers in order to achieve tractability and scalability for the problem. In this work, we present a hybrid query answering architecture that combines various services to provide a CQ answering service for OWL. Specifically, it combines scalable CQ answering services for tractable languages with a CQ answering service for a more expressive language approaching the full OWL 2. If the query can be fully answered by one of the tractable services, then that service is used, to ensure maximum performance. Otherwise, the tractable services are used to compute lower and upper bound approximations. The union of the lower bounds and the intersection of the upper bounds are then compared. If the bounds do not coincide, then the “gap” answers are checked using the “full” service. These techniques led to the development of two new systems: (i) RSAComb, an efficient implementation of a new tractable answering service for RSA (role safety acyclic) (ii) ACQuA, a reference implementation of the proposed hybrid architecture combining RSAComb, PAGOdA, and HermiT to provide a CQ answering service for OWL. Our extensive evaluation shows how the additional computational cost introduced by reasoning over a more expressive language like RSA can still provide a significant improvement compared to relying on a fully-fledged reasoner. Additionally, we show how ACQuA can reliably match the performance of PAGOdA, a state-of-the-art CQ answering system that uses a similar approach, and can significantly improve performance when PAGOdA extensively relies on the underlying fully-fledged reasoner.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"58 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2023-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81958959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AgreementMakerLight
Daniel Faria, Emanuel Santos, B. Balasubramani, M. C. Silva, Francisco M. Couto, Catia Pesquita
Ontology matching establishes correspondences between entities of related ontologies, with applications ranging from enabling semantic interoperability to supporting ontology and knowledge graph development. Its demand within the Semantic Web community is on the rise, as the popularity of information systems and artificial intelligence applications backed by knowledge graphs continues to increase. In this article, we showcase AgreementMakerLight (AML), an ontology matching system in continuous development since 2013, with demonstrated performance over nine editions of the Ontology Alignment Evaluation Initiative (OAEI) and a history of real-world applications across a variety of domains. We overview AML’s architecture and algorithms, its user interfaces and functionalities, its performance, and its impact. AML has participated in more OAEI tracks since 2013 than any other matching system, has a median rank by F-measure between 1 and 2 across all tracks in every year since 2014, and a rank by run time between 3 and 4. Thus, it offers a combination of range, quality, and efficiency that few matching systems can rival. Moreover, AML’s impact can be gauged by the 263 (non-self) publications that cite one or more of its papers, among which we count 34 real-world applications.
{"title":"AgreementMakerLight","authors":"Daniel Faria, Emanuel Santos, B. Balasubramani, M. C. Silva, Francisco M. Couto, Catia Pesquita","doi":"10.3233/sw-233304","DOIUrl":"https://doi.org/10.3233/sw-233304","url":null,"abstract":"Ontology matching establishes correspondences between entities of related ontologies, with applications ranging from enabling semantic interoperability to supporting ontology and knowledge graph development. Its demand within the Semantic Web community is on the rise, as the popularity of knowledge graph supporting information systems or artificial intelligence applications continues to increase. In this article, we showcase AgreementMakerLight (AML), an ontology matching system in continuous development since 2013, with demonstrated performance over nine editions of the Ontology Alignment Evaluation Initiative (OAEI), and a history of real-world applications across a variety of domains. We overview AML’s architecture and algorithms, its user interfaces and functionalities, its performance, and its impact. AML has participated in more OAEI tracks since 2013 than any other matching system, has a median rank by F-measure between 1 and 2 across all tracks in every year since 2014, and a rank by run time between 3 and 4. Thus, it offers a combination of range, quality and efficiency that few matching systems can rival. Moreover, AML’s impact can be gauged by the 263 (non-self) publications that cite one or more of its papers, among which we count 34 real-world applications.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"17 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2023-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85729717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}