Pub Date: 2024-09-10 | DOI: 10.1016/j.websem.2024.100833
Soumajit Pramanik, Jesujoba Alabi, Rishiraj Saha Roy, Gerhard Weikum
Question answering over RDF data such as knowledge graphs has advanced greatly, with a number of good systems providing crisp answers for natural language questions or telegraphic queries. Some of these systems incorporate textual sources as additional evidence for the answering process, but cannot compute answers that are present in text alone. Conversely, the IR and NLP communities have addressed QA over text, but such systems barely utilize semantic data and knowledge. This paper presents a method for complex questions that can seamlessly operate over a mixture of RDF datasets and text corpora, or over individual sources, in a unified framework. Our method, called Uniqorn, builds a context graph on the fly by retrieving question-relevant evidence from the RDF data and/or a text corpus, using fine-tuned BERT models. The resulting graph typically contains all question-relevant evidence but also a lot of noise. Uniqorn copes with this input by a graph algorithm for Group Steiner Trees that identifies the best answer candidates in the context graph. Experimental results on several benchmarks of complex questions with multiple entities and relations show that Uniqorn significantly outperforms state-of-the-art methods for heterogeneous QA, in a full training mode as well as in zero-shot settings. The graph-based methodology provides user-interpretable evidence for the complete answering process.
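To make the Group Steiner Tree step concrete, here is a minimal sketch of the underlying problem: find the cheapest subtree of a context graph that touches at least one node from every "group" (e.g., the candidate matches for each question entity). This is only an illustration of the problem Uniqorn solves, not the paper's algorithm (which must scale beyond brute force); the graph, node names, and weights are invented.

```python
import itertools

def mst_cost(nodes, edges):
    # Kruskal's MST on the subgraph induced by `nodes`.
    # Returns the tree cost, or None if the induced subgraph is disconnected.
    parent = {v: v for v in nodes}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    cost, used = 0.0, 0
    for w, u, v in sorted(edges):
        if u in parent and v in parent:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                cost, used = cost + w, used + 1
    return cost if used == len(nodes) - 1 else None

def group_steiner_tree(nodes, edges, groups):
    # Cheapest connected subgraph touching at least one node per group.
    # Exponential in |nodes| -- viable only for tiny context graphs.
    best, best_cost = None, float("inf")
    for r in range(1, len(nodes) + 1):
        for subset in itertools.combinations(sorted(nodes), r):
            chosen = set(subset)
            if not all(chosen & g for g in groups):
                continue
            c = mst_cost(chosen, edges)
            if c is not None and c < best_cost:
                best, best_cost = chosen, c
    return best, best_cost

# Toy context graph: two evidence paths connect a question entity to a
# candidate answer; the cheaper (less noisy) path should be selected.
edges = [(1.0, "film", "directed_by"), (1.0, "directed_by", "Nolan"),
         (3.0, "film", "cast_member"), (3.0, "cast_member", "Nolan")]
nodes = {"film", "directed_by", "Nolan", "cast_member"}
tree, cost = group_steiner_tree(nodes, edges, [{"film"}, {"Nolan"}])
```

On this toy graph the search keeps the cheap `directed_by` path and discards the noisy `cast_member` one, mirroring how the tree identifies the best-supported answer candidates.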
Title: Uniqorn: Unified question answering over RDF knowledge graphs and natural language text (Journal of Web Semantics, Vol. 83, Article 100833)
Pub Date: 2024-07-14 | DOI: 10.1016/j.websem.2024.100832
Daqian Shi, Xiaoyue Li, Fausto Giunchiglia
A common solution to the semantic heterogeneity problem is to perform knowledge graph (KG) extension by exploiting the information encoded in one or more candidate KGs, where the alignment between the reference KG and the candidate KGs is considered the critical procedure. However, existing KG alignment methods mainly rely on entity type (etype) label matching as a prerequisite, which performs poorly in practice or is not applicable in some cases. In this paper, we design a machine learning-based framework for KG extension, including a novel property-based alignment approach that aligns etypes on the basis of the properties used to define them. The main intuition is that properties intensionally define an etype, and that this definition is independent of both the specific label used to name the etype and the specific hierarchical schema of a KG. The experimental results show, both quantitatively and qualitatively, the validity of the KG alignment approach and the superiority of the proposed KG extension framework compared with the state of the art.
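The label-independent intuition can be sketched as a simple property-set overlap: two etypes are aligned when the properties defining them overlap strongly, regardless of their names. KAE itself uses a learned, ML-based aligner, so the Jaccard heuristic, threshold, and etype names below are illustrative assumptions only.

```python
def property_jaccard(props_a, props_b):
    # Overlap of the property sets that define two etypes,
    # ignoring the etype labels entirely.
    a, b = set(props_a), set(props_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def align_etypes(reference, candidate, threshold=0.5):
    # Map each reference etype to its best-matching candidate etype,
    # keeping only matches above a similarity threshold.
    alignment = {}
    for r_name, r_props in reference.items():
        c_name, c_props = max(candidate.items(),
                              key=lambda kv: property_jaccard(r_props, kv[1]))
        if property_jaccard(r_props, c_props) >= threshold:
            alignment[r_name] = c_name
    return alignment

# Differently labelled etypes, same defining properties.
reference = {"Person": {"birthDate", "name", "nationality"}}
candidate = {"Human": {"birthDate", "name", "nationality", "spouse"},
             "City": {"population", "name", "mayor"}}
mapping = align_etypes(reference, candidate)
```

Here "Person" aligns with "Human" despite the different labels, because their defining properties largely coincide.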
Title: KAE: A property-based method for knowledge graph alignment and extension (Journal of Web Semantics, Vol. 82, Article 100832)
Pub Date: 2024-06-29 | DOI: 10.1016/j.websem.2024.100831
Zhifei Hu, Feng Xia
In recent years, the powerful modeling ability of Graph Neural Networks (GNNs) has led to their widespread use in knowledge-aware recommender systems. However, existing GNN-based methods for information propagation among entities in knowledge graphs (KGs) may not efficiently filter out less informative entities. To address this challenge and improve the encoding of high-order structural information among many entities, we propose an end-to-end neural network-based method called Multi-stream Graph Attention Network (MSGAT). MSGAT explicitly discriminates the importance of entities from four critical perspectives and recursively propagates neighbor embeddings to refine the target node. Specifically, we use an attention mechanism from the user's perspective to distill the information of the predicted item's domain nodes in the KG, enhance the user's information on items, and generate the feature representation of the predicted item. We also propose a multi-stream attention mechanism that aggregates the neighborhood entity information of the user's historically clicked items in the KG and generates the user's feature representation. We conduct extensive experiments on three real datasets for movies, music, and books, and the empirical results demonstrate that MSGAT outperforms current state-of-the-art baselines.
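A single attention stream of this kind boils down to softmax-weighted aggregation of neighbor embeddings with respect to a target embedding. The sketch below shows only that generic building block; MSGAT's actual four-perspective, multi-stream design is more elaborate, and all vectors here are toy values.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention_aggregate(target, neighbors):
    # Score each neighbor against the target, softmax the scores,
    # and return the attention-weighted sum of neighbor embeddings.
    scores = [dot(target, n) for n in neighbors]
    m = max(scores)                         # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * n[i] for w, n in zip(weights, neighbors))
            for i in range(len(target))]

target = [1.0, 0.0]
neighbors = [[1.0, 0.0], [0.0, 1.0]]  # first neighbor aligns with target
agg = attention_aggregate(target, neighbors)
```

The neighbor aligned with the target dominates the aggregate, which is the filtering effect the abstract describes: less informative neighbors receive less weight.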
Title: Multi-stream graph attention network for recommendation with knowledge graph (Journal of Web Semantics, Vol. 82, Article 100831)
Pub Date: 2024-06-24 | DOI: 10.1016/j.websem.2024.100823
Cogan Shimizu, Andrew Eells, Seila Gonzalez, Lu Zhou, Pascal Hitzler, Alicia Sheill, Catherine Foley, Dean Rehberger
Wikibase – the software underlying Wikidata – is a powerful platform for knowledge graph creation and management. However, it was developed with a crowd-sourced knowledge graph creation scenario in mind, which in particular means that it was not designed for use cases in which a tightly controlled, high-quality schema, in the form of an ontology, is to be imposed; indeed, independently developed ontologies do not necessarily map seamlessly to the Wikibase approach. In this paper, we provide the key ingredients needed to combine traditional ontology modeling with use of the Wikibase platform, namely a set of axiom patterns that bridge the paradigm gap, together with usage instructions and a worked example for historical data.
Title: Ontology design facilitating Wikibase integration — and a worked example for historical data (Journal of Web Semantics, Vol. 82, Article 100823)
Pub Date: 2024-06-07 | DOI: 10.1016/j.websem.2024.100830
María-Cruz Valiente, Juan Pavón
Decentralized autonomous organizations (DAOs) are a relatively new type of online entity related to governance or business models, in which all members work together and participate in the decision-making processes affecting the DAO in a decentralized, collective, fair, and democratic manner. In a DAO, members' interaction is mediated by software agents running on a blockchain that encode the governance of the specific entity in terms of rules that optimize its business and goals. In this context, the most popular DAO software frameworks provide decision-making models that aim to facilitate digital governance and collaboration among members, intertwining social and economic concerns. However, these models are complex, not interoperable with one another, and lack a common understanding and shared knowledge concerning DAOs, as well as the computational semantics needed to enable automated validation, simulation, or execution. Thus, this paper presents an ontology, Web3-DAO, which supports machine-readable digital governance of DAOs by adding semantics to their decision-making models. The proposed ontology captures the domain logic that allows all members interacting with a DAO to share updated information and decisions through the interoperability of their own assessment and decision tools. Furthermore, the ontology detects semantic ambiguities, uncertainties, and contradictions. The Web3-DAO ontology is available in open access at https://github.com/Grasia/semantic-web3-dao.
Title: Web3-DAO: An ontology for decentralized autonomous organizations (Journal of Web Semantics, Vol. 82, Article 100830)
Pub Date: 2024-05-28 | DOI: 10.1016/j.websem.2024.100824
Jinfa Yang, Xianghua Ying, Yongjie Shi, Ruibin Wang
Finding a suitable embedding for a knowledge graph (KG) remains a major challenge. By measuring the distance or plausibility of triples and quadruples in static and temporal knowledge graphs, many reliable knowledge graph embedding (KGE) models have been proposed. However, these classical models may not represent and infer various relation patterns well: for example, TransE cannot represent symmetric relations, DistMult cannot represent inverse relations, and RotatE cannot represent multiple relations. In this paper, we improve the ability of these models to represent various relation patterns by introducing an affine transformation framework. Specifically, we first utilize a set of affine transformations related to each relation or timestamp to operate on entity vectors; the transformed vectors can then be applied not only to static KGE models but also to temporal KGE models. The main advantage of using affine transformations is their good geometric properties and interpretability. Our experimental results demonstrate that the proposed intuitive design with affine transformations provides a statistically significant increase in performance, while adding only a few extra processing steps and keeping the same number of embedding parameters. Taking TransE as an example, we employ the scale transformation (a special case of an affine transformation); surprisingly, it even outperforms RotatE to some extent on various datasets. We also introduce affine transformations into RotatE, DistMult, ComplEx, TTransE, and TComplEx, and experiments demonstrate that affine transformations consistently and significantly improve the performance of state-of-the-art KGE models on both static and temporal knowledge graph benchmarks.
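The mechanics can be sketched for TransE with a per-relation scale vector (the special case mentioned above): entity vectors are scaled before the usual translation test. The placement of the scale on both head and tail, the vectors, and the L1 distance here are illustrative assumptions, not the paper's exact formulation.

```python
def transe_score(h, r, t):
    # Plain TransE: L1 distance of (h + r) from t; lower is better.
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))

def scaled_transe_score(h, r, t, scale):
    # Scale the entity vectors with a relation-specific vector before
    # applying the translation -- an assumed placement of the affine step.
    h2 = [s * hi for s, hi in zip(scale, h)]
    t2 = [s * ti for s, ti in zip(scale, t)]
    return transe_score(h2, r, t2)

h, r, t = [1.0, 2.0], [1.0, -1.0], [2.0, 1.0]
plain = transe_score(h, r, t)                       # triple fits plain TransE
scaled = scaled_transe_score(h, r, t, [0.5, 0.5])   # scaling changes the fit
```

Because the scale is a per-relation parameter, the same entity vectors can satisfy different relations under different transformations, which is the extra expressiveness the framework adds without new entity parameters.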
Title: Improving static and temporal knowledge graph embedding using affine transformations of entities (Journal of Web Semantics, Vol. 82, Article 100824)
Pub Date: 2024-05-17 | DOI: 10.1016/j.websem.2024.100822
Fiorela Ciroku, Jacopo de Berardinis, Jongmo Kim, Albert Meroño-Peñuela, Valentina Presutti, Elena Simperl
The process of developing ontologies – formal, explicit specifications of a shared conceptualisation – is addressed by well-known methodologies. As with any engineering effort, its fundamental basis is the collection of requirements, which includes the elicitation of competency questions. Competency questions are defined by interacting with domain and application experts or by investigating existing datasets that may be used to populate the ontology, i.e., its knowledge graph. The rise in popularity and accessibility of knowledge graphs provides an opportunity to support this phase with automatic tools. In this work, we explore the possibility of extracting competency questions from a knowledge graph. This reverses the traditional workflow, in which knowledge graphs are built from ontologies, which in turn are engineered from competency questions. We describe in detail RevOnt, an approach that extracts and abstracts triples from a knowledge graph, generates questions based on triple verbalisations, and filters the resulting questions to yield a meaningful set of competency questions. The approach is implemented using the Wikidata knowledge graph as a use case and contributes a set of core competency questions from the 20 domains present in the WDV dataset. To evaluate RevOnt, we contribute a new dataset of manually annotated, high-quality competency questions and compare the extracted competency questions against these human references by computing their BLEU scores. The results for the abstraction and question generation components of the approach show good to high quality, while the accuracy of the filtering component is above 86%, which is comparable to state-of-the-art classifiers.
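The abstraction and question-generation steps can be caricatured with string templates: replace the concrete subject and object of a triple with their types, then phrase the abstracted triple as a question. RevOnt uses language models for both steps, so this is only a toy, with an invented predicate and an invented type annotation format.

```python
def abstract_triple(triple):
    # Replace the concrete subject/object with their types, keeping
    # the predicate -- a template stand-in for the abstraction step.
    (s, s_type), p, (o, o_type) = triple
    return (s_type, p, o_type)

def to_competency_question(abstracted):
    # Phrase the abstracted triple as a question about the predicate.
    s_type, p, o_type = abstracted
    return f"What is the {p.replace('_', ' ')} of a given {s_type}?"

triple = (("Douglas Adams", "author"), "notable_work",
          ("The Hitchhiker's Guide to the Galaxy", "book"))
question = to_competency_question(abstract_triple(triple))
```

The point of the abstraction is visible even in this toy: the resulting question ranges over all authors, not just the one entity the triple mentions.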
Title: RevOnt: Reverse engineering of competency questions from knowledge graphs via language models (Journal of Web Semantics, Vol. 82, Article 100822)
Pub Date: 2024-05-15 | DOI: 10.1016/j.websem.2024.100821
João Gama, Rita P. Ribeiro, Saulo Mastelini, Narjes Davari, Bruno Veloso
Predictive maintenance applications are increasingly complex, with interactions between many components. Black-box models based on deep-learning techniques are popular approaches due to their predictive accuracy. This paper proposes a neural-symbolic architecture that uses an online rule-learning algorithm to explain when the black-box model predicts failures. The proposed system solves two problems in parallel: (i) anomaly detection and (ii) explanation of the anomaly. For the first problem, we use a state-of-the-art unsupervised autoencoder. For the second problem, we train a rule-learning system that learns a mapping from the input features to the autoencoder's reconstruction error. Both systems run online and in parallel. The autoencoder signals an alarm for examples whose reconstruction error exceeds a threshold. The causes of the alarm are hard for humans to understand because they result from a non-linear combination of sensor data. The rule triggered by such an example describes the relationship between the input features and the autoencoder's reconstruction error: it explains the failure signal by indicating which sensors contribute to the alarm, allowing the identification of the component involved in the failure. The system can present global explanations for the black-box model and local explanations for why it predicts a failure. We evaluate the proposed system in a real-world case study of Metro do Porto and provide explanations that illustrate its benefits.
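The detection side can be sketched independently of any particular autoencoder: an alarm fires when the total reconstruction error crosses a threshold, and per-sensor errors give a first hint of which sensors drive the alarm (the paper's rule learner goes further, learning interpretable rules over the input features). The stand-in reconstruction function, sensor values, and threshold below are all invented.

```python
def reconstruction_error(x, x_hat):
    # Per-sensor squared errors plus the total used for the alarm.
    per_sensor = [(a - b) ** 2 for a, b in zip(x, x_hat)]
    return sum(per_sensor), per_sensor

def check_alarm(x, reconstruct, threshold):
    # Fire an alarm if total error exceeds the threshold, and rank
    # sensors by their contribution to that error.
    total, per_sensor = reconstruction_error(x, reconstruct(x))
    ranked = sorted(range(len(x)), key=lambda i: per_sensor[i], reverse=True)
    return total > threshold, ranked  # ranked[0] = most anomalous sensor

# Stand-in "autoencoder" that has learned sensors usually sit near 0;
# the second sensor's reading is far from normal.
reconstruct = lambda x: [0.0 for _ in x]
fired, ranked = check_alarm([0.1, 3.0, 0.2], reconstruct, threshold=1.0)
```

Ranking sensors by their share of the reconstruction error is the simplest form of the attribution the abstract describes: it points a technician at the component most likely involved in the failure.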
Title: From fault detection to anomaly explanation: A case study on predictive maintenance (Journal of Web Semantics, Vol. 81, Article 100821)
Pub Date : 2024-05-08DOI: 10.1016/j.websem.2024.100820
Wenjun Liu , Hai Wang , Jieyang Wang , Huan Guo , Yuyan Sun , Mengshu Hou , Bao Yu , Hailan Wang , Qingcheng Peng , Chao Zhang , Cheng Liu
Popular topic detection identifies topics from the documents users post on social networking platforms. In a large body of research literature, most popular topic detection methods identify the distribution of unknown topics by integrating information from documents on social networking platforms. However, most of these methods achieve low accuracy because the texts are short and abound in useless punctuation marks and emoticons. Image information in short texts has also been overlooked, even though it may contain the real topic of the user's posted content. To solve these problems and improve the quality of topic detection, this paper proposes a popular topic detection method based on microblog images and short-text information. The method uses an image description model to obtain more information about short texts, identifies hot words with a new-word discovery algorithm in the preprocessing stage, and uses a PTM model to improve the quality and effectiveness of topic detection during topic detection and aggregation. The experimental results show that the proposed method improves the values of the evaluation indicators compared with three other topic detection methods. In conclusion, the proposed popular topic detection method improves topic detection performance by integrating microblog images and short-text information, and outperforms the other topic detection methods selected in this paper.
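The core idea of enriching a short text with its image caption before topic modeling can be sketched as follows. Everything here is illustrative: the posts and captions are invented, and NMF over TF-IDF stands in for the paper's PTM model and image-description component.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

# Hypothetical microblog posts: (short text, caption from an image-description model)
posts = [
    ("new phone launch", "a person holding a smartphone on stage"),
    ("camera is amazing", "close-up photo of a phone camera lens"),
    ("match tonight!!", "two teams playing football in a stadium"),
    ("what a goal", "a football player celebrating a goal"),
]
# Enrich each short text by concatenating its image caption
docs = [f"{text} {caption}" for text, caption in posts]

# TF-IDF drops stop words and the useless punctuation the paper mentions
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(docs)

# NMF stands in for the paper's PTM topic model; argmax gives each post's topic
W = NMF(n_components=2, init="nndsvda", random_state=0).fit_transform(X)
topics = W.argmax(axis=1)
print(topics)
```

On its own, "match tonight!!" carries almost no topical vocabulary; the caption supplies "football" and "stadium", which is what lets the model group it with the other sports post.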
{"title":"A popular topic detection method based on microblog images and short text information","authors":"Wenjun Liu , Hai Wang , Jieyang Wang , Huan Guo , Yuyan Sun , Mengshu Hou , Bao Yu , Hailan Wang , Qingcheng Peng , Chao Zhang , Cheng Liu","doi":"10.1016/j.websem.2024.100820","DOIUrl":"10.1016/j.websem.2024.100820","url":null,"abstract":"<div><p>Popular topic detection is a topic identification by the information of documents posted by users in social networking platforms. In a large body of research literature, most popular topic detection methods identify the distribution of unknown topics by integrating information from documents based on social networking platforms. However, among these popular topic detection methods, most of them have a low accuracy in topic detection due to the short text content and the abundance of useless punctuation marks and emoticons. Image information in short texts has also been overlooked, while this information may contain the real topic matter of the user's posted content. In order to solve the above problems and improve the quality of topic detection, this paper proposes a popular topic detection method based on microblog images and short text information. The method uses an image description model to obtain more information about short texts, identifies hot words by a new word discovery algorithm in the preprocessing stage, and uses a PTM model to improve the quality and effectiveness of topic detection during topic detection and aggregation. The experimental results show that the topic detection method in this paper improves the values of evaluation indicators compared with the other three topic detection methods. 
In conclusion, the popular topic detection method proposed in this paper can improve the performance of topic detection by integrating microblog images and short text information, and outperforms other topic detection methods selected in this paper.</p></div>","PeriodicalId":49951,"journal":{"name":"Journal of Web Semantics","volume":"81 ","pages":"Article 100820"},"PeriodicalIF":2.5,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1570826824000064/pdfft?md5=27a6b3b5059b99e5d02665a7a31e8e9d&pid=1-s2.0-S1570826824000064-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141034966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-04-27DOI: 10.1016/j.websem.2024.100819
Sayed Hoseini , Johannes Theissen-Lipp , Christoph Quix
In recent years, data lakes have emerged as a way to manage large amounts of heterogeneous data for modern data analytics. One way to prevent data lakes from turning into inoperable data swamps is semantic data management. Such approaches link metadata to knowledge graphs based on the Linked Data principles to give the data in the lake more meaning and semantics. Such a semantic layer can be utilized not only for data management but also to tackle the problem of data integration from heterogeneous sources, making data access more expressive and interoperable. In this survey, we review recent approaches with a specific focus on their application within data lake systems and their scalability to Big Data. We classify the approaches into (i) basic semantic data management, (ii) semantic modeling approaches for enriching metadata in data lakes, and (iii) methods for ontology-based data access. In each category, we cover the main techniques and their background, and compare the latest research.
{"title":"A survey on semantic data management as intersection of ontology-based data access, semantic modeling and data lakes","authors":"Sayed Hoseini , Johannes Theissen-Lipp , Christoph Quix","doi":"10.1016/j.websem.2024.100819","DOIUrl":"https://doi.org/10.1016/j.websem.2024.100819","url":null,"abstract":"<div><p>In recent years, data lakes emerged as a way to manage large amounts of heterogeneous data for modern data analytics. One way to prevent data lakes from turning into inoperable data swamps is semantic data management. Such approaches propose the linkage of metadata to knowledge graphs based on the Linked Data principles to provide more meaning and semantics to the data in the lake. Such a semantic layer may be utilized not only for data management but also to tackle the problem of data integration from heterogeneous sources, in order to make data access more expressive and interoperable. In this survey, we review recent approaches with a specific focus on the application within data lake systems and scalability to Big Data. We classify the approaches into (i) basic semantic data management, (ii) semantic modeling approaches for enriching metadata in data lakes, and (iii) methods for ontology-based data access. In each category, we cover the main techniques and their background, and compare latest research. 
Finally, we point out challenges for future work in this research area, which needs a closer integration of Big Data and Semantic Web technologies.</p></div>","PeriodicalId":49951,"journal":{"name":"Journal of Web Semantics","volume":"81 ","pages":"Article 100819"},"PeriodicalIF":2.5,"publicationDate":"2024-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1570826824000052/pdfft?md5=ba83860fb725179723385f42b29b9908&pid=1-s2.0-S1570826824000052-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140816991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}