
Latest publications in Semantic Web

Wikidata subsetting: Approaches, tools, and evaluation
IF 3.0 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-12-27 | DOI: 10.3233/sw-233491
Seyed Amir Hosseini Beghaeiraveri, J. E. Labra Gayo, A. Waagmeester, Ammar Ammar, Carolina Gonzalez, D. Slenter, Sabah Ul-Hasan, E. Willighagen, Fiona McNeill, A. Gray
Wikidata is a massive Knowledge Graph (KG) comprising more than 100 million data items and nearly 1.5 billion statements, covering a wide range of topics such as geography, history, scholarly articles, and life science data. The sheer volume of Wikidata is difficult to handle for research purposes; many researchers cannot afford the costs of hosting 100 GB of data. While Wikidata provides a public SPARQL endpoint, it can only be used for short-running queries. Often, researchers only require a limited range of data from Wikidata, focused on a particular topic, for their use case. Subsetting is the process of defining and extracting the required range of data from the KG; this process has received increasing attention in recent years. Several approaches and dedicated tools have been developed for subsetting, but they have not yet been systematically evaluated. In this paper, we survey the available subsetting approaches, introduce their general strengths and weaknesses, and evaluate four practical tools specific to Wikidata subsetting – WDSub, KGTK, WDumper, and WDF – in terms of execution performance, extraction accuracy, and flexibility in defining subsets. Results show that all four tools achieve at least 99.96% accuracy in extracting defined items and 99.25% in extracting statements. The fastest tool is WDF, while the most flexible is WDSub. During the experiments, multiple subset use cases were defined and the extracted subsets analyzed, yielding valuable information about the variety and quality of Wikidata that would otherwise not be obtainable through the public Wikidata SPARQL endpoint.
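At its core, the subsetting task the paper evaluates amounts to streaming a Wikidata JSON dump and keeping only entities that match a filter. The sketch below is a minimal, tool-agnostic illustration in Python; the property/value filter (instance of (P31) = gene (Q7187)) is an example choice, and none of the four evaluated tools' actual code is reproduced here.

```python
import json

def claim_targets(entity, prop):
    """Yield the item IDs that property `prop` points to in a Wikidata entity record."""
    for claim in entity.get("claims", {}).get(prop, []):
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value", {})
        if isinstance(value, dict) and "id" in value:
            yield value["id"]

def subset(lines, prop="P31", target="Q7187"):
    """Keep only entities whose `prop` claims include `target`.

    Wikidata JSON dumps are a huge JSON array with one entity per line,
    each line (except the last) ending in a comma."""
    kept = []
    for line in lines:
        line = line.rstrip().rstrip(",")
        if line in ("[", "]") or not line:
            continue
        entity = json.loads(line)
        if target in claim_targets(entity, prop):
            kept.append(entity)
    return kept
```

In practice a tool would stream the (compressed) dump rather than hold lines in memory, and would also filter the statements of each kept entity; this sketch only shows the item-selection step.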
Citations: 0
An ontology of 3D environment where a simulated manipulation task takes place (ENVON)
IF 3.0 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-12-18 | DOI: 10.3233/sw-233460
Yingshen Zhao, Arkopaul Sarkar, Linda Elmhadhbi, Mohamed-Hedi Karray, P. Fillatreau, B. Archimède
With the advent of robotics in shopfloor and warehouse environments, control rooms need to seamlessly exchange information about the dynamically changing 3D environment to facilitate task and path planning for robots. Adding to the complexity, this type of environment is heterogeneous, as it includes both free space and various types of rigid bodies (equipment, materials, humans, etc.). At the same time, 3D environment-related information is also required by virtual applications (e.g., VR techniques) for the behavioral study of CAD-based product models or the simulation of CNC operations. In past research, information models for such heterogeneous 3D environments have often been built without ensuring connections among the different levels of abstraction required by different applications. To address these multiple viewpoints and modelling requirements for 3D objects and environments, this paper proposes an ontology model that integrates the contextual, topological, and geometric information of both rigid bodies and free space. The ontology provides an evolvable knowledge model that can support simulated task-related information in general. It aims to greatly improve the interoperability of path-planning systems (e.g., robots) and can accommodate different applications by simply updating the contextual semantics related to a targeted application, while keeping the geometric and topological models intact by leveraging the semantic links among the models.
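As a toy illustration of the layered design described in the abstract (all identifiers below are hypothetical and not taken from ENVON itself), the contextual layer can be swapped per application while a shared object identifier keeps the geometric and topological layers untouched:

```python
# Three information layers keyed by the same object identifier. Swapping the
# contextual layer retargets the model (e.g. robot planning vs. VR training)
# without touching geometry or topology.
model = {
    "geometry": {"env:conveyor-7": {"mesh": "conveyor7.stl", "pose": (1.2, 0.0, 0.4)}},
    "topology": {"env:conveyor-7": {"adjacentTo": ["env:freespace-A"]}},
    "context":  {"env:conveyor-7": {"role": "obstacle"}},  # robot path planning view
}

def retarget(model, new_context):
    """Replace only the contextual layer; the other layers are reused as-is."""
    return {**model, "context": new_context}

# The same scene, reinterpreted for a hypothetical VR application.
vr_model = retarget(model, {"env:conveyor-7": {"role": "interactable-machine"}})
```

The dictionaries stand in for what the ontology expresses as semantically linked triples; the point is only that the object identifier is the join key between the layers.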
Citations: 0
NeuSyRE: Neuro-symbolic visual understanding and reasoning framework based on scene graph enrichment
IF 3.0 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-12-13 | DOI: 10.3233/sw-233510
M. J. Khan, John G. Breslin, Edward Curry
Exploring the potential of neuro-symbolic hybrid approaches offers promising avenues for seamless high-level understanding and reasoning about visual scenes. Scene Graph Generation (SGG) is a symbolic image representation approach based on deep neural networks (DNNs) that involves predicting objects, their attributes, and pairwise visual relationships in images to create scene graphs, which are utilized in downstream visual reasoning. The crowdsourced training datasets used in SGG are highly imbalanced, which results in biased SGG output. The vast number of possible triplets makes it challenging to collect sufficient training samples for every visual concept or relationship. To address these challenges, we propose augmenting the typical data-driven SGG approach with common sense knowledge to enhance the expressiveness and autonomy of visual understanding and reasoning. We present a loosely-coupled neuro-symbolic visual understanding and reasoning framework that employs a DNN-based pipeline for object detection and multi-modal pairwise relationship prediction for scene graph generation, and leverages common sense knowledge in heterogeneous knowledge graphs to enrich scene graphs for improved downstream reasoning. A comprehensive evaluation is performed on multiple standard datasets, including Visual Genome and Microsoft COCO, in which the proposed approach outperformed the state-of-the-art SGG methods in terms of relationship recall scores, i.e. Recall@K and mean Recall@K, as well as the state-of-the-art scene graph-based image captioning methods in terms of SPICE and CIDEr scores, with comparable BLEU, ROUGE and METEOR scores. As a result of enrichment, the qualitative results showed improved expressiveness of scene graphs, resulting in more intuitive and meaningful caption generation using scene graphs. Our results validate the effectiveness of enriching scene graphs with common sense knowledge using heterogeneous knowledge graphs. 
This work provides a baseline for future research in knowledge-enhanced visual understanding and reasoning. The source code is available at https://github.com/jaleedkhan/neusire.
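The relationship-recall scores mentioned above can be sketched in a few lines. This is a simplified per-image version under the usual definitions; real SGG evaluation adds details such as graph-constrained matching that are omitted here.

```python
def recall_at_k(gt_triplets, scored_predictions, k=50):
    """Fraction of ground-truth (subject, predicate, object) triplets that
    appear among the top-k highest-scoring predicted triplets of one image."""
    top_k = {t for t, _ in sorted(scored_predictions, key=lambda p: -p[1])[:k]}
    if not gt_triplets:
        return 0.0
    return len(set(gt_triplets) & top_k) / len(gt_triplets)

def mean_recall_at_k(per_predicate_recalls):
    """mean Recall@K averages Recall@K over predicate classes, so rare
    predicates count as much as the frequent ones that dominate the
    imbalanced crowdsourced training data."""
    return sum(per_predicate_recalls.values()) / len(per_predicate_recalls)
```

A model can score well on Recall@K by predicting only frequent predicates; mean Recall@K is what exposes the long-tail bias discussed in the abstract.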
Citations: 0
Using semantic story maps to describe a territory beyond its map
IF 3.0 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-12-13 | DOI: 10.3233/sw-233485
Valentina Bartalesi, Gianpaolo Coro, Emanuele Lenzi, Nicolò Pratelli, Pasquale Pagano, Francesco Felici, Michele Moretti, Gianluca Brunori

Abstract

The paper presents the Story Map Building and Visualizing Tool (SMBVT), which allows users to create story maps within a collaborative environment and a usable Web interface. It is entirely open-source and published as a free-to-use solution. It uses Semantic Web technologies in the back-end system to represent stories through a reference narrative ontology. It builds up a user-shared semantic knowledge base that automatically interconnects all stories and seamlessly enables collaborative story building. Finally, it operates within an Open-Science oriented e-Infrastructure, which enables data and information sharing within communities of narrators, and adds multi-tenancy, multi-user, security, and access-control facilities. SMBVT represents narratives as a network of spatiotemporal events related by semantic relations and standardizes the event descriptions by assigning internationalized resource identifiers (IRIs) to the event components, i.e., the entities that take part in the event (e.g., persons, objects, places, concepts). The tool automatically saves the collected knowledge as a Web Ontology Language (OWL) graph and openly publishes it as Linked Open Data. This feature allows connecting the story events to other knowledge bases. To evaluate and demonstrate our tool, we used it to describe the Apuan Alps territory in Tuscany (Italy). Based on a user-test evaluation, we assessed the tool's effectiveness at building story maps and the ability of the produced story to describe the territory beyond the map.
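Since the tool publishes its knowledge as Linked Open Data, each event and its IRI-identified components boil down to RDF triples. The stdlib-only sketch below serializes such an event in N-Triples syntax; the namespace, property names, and the Wikidata item are hypothetical illustrations, not SMBVT's actual ontology, and literals are passed in already quoted.

```python
def ntriple(s, p, o):
    """Serialize one RDF triple in N-Triples syntax.
    IRIs (here: anything starting with 'http') are angle-bracketed;
    already-quoted literals pass through unchanged."""
    def term(t):
        return f"<{t}>" if t.startswith("http") else t
    return f"{term(s)} {term(p)} {term(o)} ."

# Hypothetical namespaces -- the real IRIs come from SMBVT's narrative ontology.
NARR = "http://example.org/narrative/"
WD = "http://www.wikidata.org/entity/"

event = NARR + "event/quarry-visit-01"
triples = [
    (event, NARR + "type", NARR + "Event"),
    (event, NARR + "hasPlace", WD + "Q271977"),      # hypothetical place item IRI
    (event, NARR + "hasTimeSpan", '"1920-06-14"'),   # literal, pre-quoted
]
doc = "\n".join(ntriple(*t) for t in triples)
```

Because the place is an external IRI rather than a plain string, any other story (or knowledge base) that uses the same IRI is automatically interlinked, which is the mechanism behind the "user-shared semantic knowledge base" described above.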

Citations: 1
Sem@K: Is my knowledge graph embedding model semantic-aware?
IF 3.0 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-12-13 | DOI: 10.3233/sw-233508
Nicolas Hubert, Pierre Monnin, Armelle Brun, Davy Monticolo

Abstract

Using knowledge graph embedding models (KGEMs) is a popular approach for predicting links in knowledge graphs (KGs). Traditionally, the performance of KGEMs for link prediction is assessed using rank-based metrics, which evaluate their ability to give high scores to ground-truth entities. However, the literature claims that the KGEM evaluation procedure would benefit from adding supplementary dimensions to assess. That is why, in this paper, we extend our previously introduced metric Sem@K that measures the capability of models to predict valid entities w.r.t. domain and range constraints. In particular, we consider a broad range of KGs and take their respective characteristics into account to propose different versions of Sem@K. We also perform an extensive study to qualify the abilities of KGEMs as measured by our metric. Our experiments show that Sem@K provides a new perspective on KGEM quality. Its joint analysis with rank-based metrics offers different conclusions on the predictive power of models. Regarding Sem@K, some KGEMs are inherently better than others, but this semantic superiority is not indicative of their performance w.r.t. rank-based metrics. In this work, we generalize conclusions about the relative performance of KGEMs w.r.t. rank-based and semantic-oriented metrics at the level of families of models. The joint analysis of the aforementioned metrics gives more insight into the peculiarities of each model. This work paves the way for a more comprehensive evaluation of KGEM adequacy for specific downstream tasks.
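The core idea of Sem@K can be sketched in a few lines: among the top-K entities a KGEM ranks as candidate tails for a prediction, count those whose type is admissible for the relation's range constraint. This is a simplified single-variant sketch with made-up type data; the paper defines several versions of the metric for KGs with different characteristics.

```python
def sem_at_k(ranked_tails, entity_types, range_types, k=10):
    """Sem@K: fraction of the top-k predicted tail entities whose type
    satisfies the relation's range constraint.
    1.0 means fully semantic-aware predictions; rank-based metrics such as
    Hits@K instead check only whether the single ground-truth entity is there."""
    top_k = ranked_tails[:k]
    if not top_k:
        return 0.0
    valid = sum(1 for e in top_k if entity_types.get(e, set()) & range_types)
    return valid / len(top_k)
```

Note how the metric can disagree with rank-based scores: a model may rank the ground truth first (perfect Hits@1) while filling the rest of the top-K with type-invalid entities, which Sem@K penalizes.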

Citations: 0
Differential privacy and SPARQL
IF 3.0 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-12-11 | DOI: 10.3233/sw-233474
C. Buil-Aranda, Jorge Lobo, Federico Olmedo
Differential privacy is a framework that provides formal tools for developing algorithms that access databases and answer statistical queries with quantifiable accuracy and privacy guarantees. The notions of differential privacy are defined independently of the data model and the query language at stake. Most differential privacy results have been obtained for aggregation queries, such as counting or finding maximum or average values, and for grouping queries over aggregations, such as the creation of histograms. So far, the data model used in this line of research has typically been the relational model, with SQL as the query language. However, effective realizations of differential privacy for SQL queries that require joins have been limited. This has imposed severe restrictions on applying differential privacy to RDF knowledge graphs and SPARQL queries: by the very nature of RDF data, most useful queries over RDF graphs require intensive use of joins. Recently, new differential privacy techniques have been developed that can be applied to many types of joins in SQL with reasonable results. This opened the question of whether these new results carry over to RDF and SPARQL. In this paper we provide a positive answer by presenting an algorithm that can answer counting queries over a large class of SPARQL queries with a differential privacy guarantee, provided the RDF graph is accompanied by semantic information about its structure. We have implemented our algorithm and conducted several experiments, showing the feasibility of our approach for large graph databases.
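The standard building block for differentially private counting queries is the Laplace mechanism, sketched below. This is a generic textbook sketch, not the paper's algorithm: the subtlety the paper addresses is that SPARQL joins can let one individual contribute many result rows, inflating the query's sensitivity, which this toy version only exposes as a parameter.

```python
import math
import random

def laplace_noise(scale, rng=random.random):
    """Sample from Laplace(0, scale) by inverting the CDF.
    u is clamped away from the interval endpoints to avoid log(0) on the
    (measure-zero) boundary draws."""
    u = min(max(rng(), 1e-12), 1.0 - 1e-12) - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def dp_count(true_count, epsilon, sensitivity=1.0):
    """Release a count with epsilon-differential privacy via the Laplace
    mechanism. `sensitivity` bounds how much one individual's triples can
    change the count; for join queries it can be far above 1, so the noise
    must be scaled up accordingly."""
    return true_count + laplace_noise(sensitivity / epsilon)
```

With a sensitivity-1 count and epsilon = 1, the released value is off by about one on average; a join that lets one individual appear in ten rows forces sensitivity 10 and ten times the noise, which is why naive per-row noising does not extend to SPARQL.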
Citations: 1
QALD-10 – The 10th challenge on question answering over linked data
IF 3.0 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-11-28 | DOI: 10.3233/sw-233471
Ricardo Usbeck, Xiongliang Yan, A. Perevalov, Longquan Jiang, Julius Schulz, Angelie Kraft, Cedric Möller, Junbo Huang, Jan Reineke, Axel-Cyrille Ngonga Ngomo, Muhammad Saleem, Andreas Both
Knowledge Graph Question Answering (KGQA) has gained attention from both industry and academia over the past decade. Researchers have proposed a substantial number of benchmarking datasets with different properties, pushing the development of this field forward. Many of these benchmarks depend on Freebase, DBpedia, or Wikidata. However, KGQA benchmarks that depend on Freebase and DBpedia are gradually less studied and used, because Freebase is defunct and DBpedia lacks the structural validity of Wikidata. Therefore, research is gravitating toward Wikidata-based benchmarks: new KGQA benchmarks are created on the basis of Wikidata, and existing ones are migrated to it. We present a new, multilingual, complex KGQA benchmarking dataset as the 10th installment of the Question Answering over Linked Data (QALD) benchmark series. This corpus formerly depended on DBpedia. Since QALD serves as a base for many machine-generated benchmarks, we increased its size and adjusted the benchmark to Wikidata and its ranking mechanism for properties. These measures foster novel KGQA developments through more demanding benchmarks. Creating a benchmark from scratch, or migrating it from DBpedia to Wikidata, is non-trivial due to the complexity of the Wikidata knowledge graph, mapping issues between different languages, and the ranking mechanism of properties using qualifiers. We present our creation strategy and the challenges we faced, which will assist other researchers in their future work. Our case study, in the form of a conference challenge, is accompanied by an in-depth analysis of the created benchmark.
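KGQA benchmarks in the QALD series are typically scored by comparing each system's answer set against the gold answer set and macro-averaging F1 over questions. The sketch below is a simplified version of that metric; the official evaluation additionally handles conventions for empty-answer questions and per-language breakdowns.

```python
def f1(gold, system):
    """Precision/recall/F1 of one question's answer set."""
    gold, system = set(gold), set(system)
    if not gold and not system:
        return 1.0           # both agree the question has no answer
    if not gold or not system:
        return 0.0
    tp = len(gold & system)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(system), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def macro_f1(question_pairs):
    """Average per-question F1, so questions with one answer weigh the
    same as questions with hundreds of answers."""
    return sum(f1(g, s) for g, s in question_pairs) / len(question_pairs)
```

Macro averaging is what makes complex, many-answer questions no more valuable than simple ones, shifting the pressure onto answering a broad range of questions correctly.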
Cited by: 0
A semantic framework for condition monitoring in Industry 4.0 based on evolving knowledge bases
CAS Tier 3 (Computer Science), JCR Q2 (Computer Science, Artificial Intelligence). Pub Date: 2023-10-05. DOI: 10.3233/sw-233481
Franco Giustozzi, Julien Saunier, Cecilia Zanni-Merk
In Industry 4.0, factory assets and machines are equipped with sensors that collect data for effective condition monitoring. This is a difficult task since it requires the integration and processing of heterogeneous data from different sources, with different temporal resolutions and underlying meanings. Ontologies have emerged as a pertinent method for dealing with data integration and for representing manufacturing knowledge in a machine-interpretable way through the construction of semantic models. Ontologies are used to structure knowledge in knowledge bases, which also contain instances and information about these data. Thus, a knowledge base provides a kind of virtual representation of the different elements involved in a manufacturing process. Moreover, the monitoring of industrial processes depends on the dynamic context of their execution. Under these circumstances, the semantic model must provide a way to represent this evolution, in order to capture which situation(s) a resource is in during the execution of its tasks and thereby support decision making. This paper proposes a semantic framework to address the evolution of knowledge bases for condition monitoring in Industry 4.0. To this end, we first propose a semantic model (the COInd4 ontology) for the manufacturing domain that represents the resources and processes that are part of a factory, with special emphasis on the context of these resources and processes. Relevant situations that combine sensor observations with domain knowledge are also represented in the model. Second, an approach that uses stream reasoning to detect the situations that lead to potential failures is introduced. This approach enriches the data collected from sensors with contextual information, using the proposed semantic model. The use of stream reasoning facilitates the integration of data from different sources and with different temporal resolutions, as well as the processing of these data in real time. This allows high-level situations to be derived from lower-level context and sensor information. Detecting situations can trigger actions that adapt the process behaviour; in turn, this change in behaviour can generate new contexts leading to new situations. These situations can have different levels of severity and can be nested in different ways. Dealing with the rich relations among situations requires an efficient approach to organizing them. We therefore propose a method for building a lattice that orders situations according to the constraints they rely on. This lattice represents a road map of all the situations, normal or abnormal, that can be reached from a given one. This supports decision making by allowing the identification of actions that can be taken to correct an abnormality, thereby avoiding interruption of the manufacturing processes. Finally, an industrial application scenario for the proposed approach is described.
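The situation ordering described in the abstract can be sketched as a partial order over constraint sets: one situation lies below another when its constraints are a subset of the other's, and the reachable situations from a given one are those whose constraint sets strictly extend it. The situations and constraints below are invented for illustration and are not from the COInd4 ontology.

```python
# Minimal sketch (hypothetical situations) of ordering situations by the
# constraints they rely on: situation A is below situation B in the lattice
# when A's constraint set is a strict subset of B's.

SITUATIONS = {
    "normal":         set(),
    "high_temp":      {"temp>80"},
    "high_vibration": {"vib>5"},
    "overheating":    {"temp>80", "coolant<min"},
    "failure_risk":   {"temp>80", "coolant<min", "vib>5"},
}

def reachable_from(name):
    """Situations whose constraint sets strictly extend those of `name`,
    i.e. the upward set in the lattice (the 'road map' from `name`)."""
    base = SITUATIONS[name]
    return sorted(s for s, constraints in SITUATIONS.items()
                  if base < constraints)  # strict superset check

print(reachable_from("high_temp"))
```

From "high_temp" the sketch reports the more severe situations that add further constraints, which is the information a decision-support step would use to pick a corrective action before those situations are reached.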
Cited by: 1
INK: Knowledge graph representation for efficient and performant rule mining
CAS Tier 3 (Computer Science), JCR Q2 (Computer Science, Artificial Intelligence). Pub Date: 2023-10-02. DOI: 10.3233/sw-233495
Bram Steenwinckel, Filip De Turck, Femke Ongenae
Semantic rule mining can be used to derive both task-agnostic and task-specific information within a Knowledge Graph (KG). Underlying logical inferences that summarise the KG, or fully interpretable binary classifiers that predict future events, are common results of such a rule mining process. The current methods for task-agnostic and task-specific semantic rule mining, however, operate on completely different KG representations, making them less suitable for performing both tasks or for incorporating each other's optimizations. This also means that practitioners must master multiple techniques for exploring and mining rules within KGs, and lose time and resources when converting one KG format into another. In this paper, we use INK, a KG representation based on the neighbourhoods of nodes of interest, to mine rules for improved decision support. By selecting one or two sets of nodes of interest, the rule miner built on top of the INK representation mines either task-agnostic or task-specific rules. In both subfields, the INK miner is competitive with the current state-of-the-art semantic rule miners on 14 different benchmark datasets from multiple domains.
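The neighbourhood-based idea can be sketched on toy triples: each node of interest becomes a set of binary features derived from its neighbourhood, and a task-specific rule is a feature combination that separates one node set from another. The triples, node names, and feature encoding here are illustrative assumptions, not INK's actual implementation.

```python
# Minimal sketch (toy triples) of a neighbourhood-based KG representation:
# nodes of interest are described by "predicate=object" features taken from
# their outgoing edges, and a rule is mined as the features shared by all
# positive nodes and absent from every negative node.

TRIPLES = [
    ("alice", "worksAt", "hospital"),
    ("alice", "hasDegree", "medicine"),
    ("bob",   "worksAt", "hospital"),
    ("bob",   "hasDegree", "medicine"),
    ("carol", "worksAt", "bank"),
    ("carol", "hasDegree", "finance"),
]

def neighbourhood_features(node):
    """Binary features of a node: one per outgoing (predicate, object) pair."""
    return {f"{p}={o}" for s, p, o in TRIPLES if s == node}

def mine_rule(positives, negatives):
    """Features common to all positives and held by no negative."""
    shared = set.intersection(*(neighbourhood_features(n) for n in positives))
    for n in negatives:
        shared -= neighbourhood_features(n)
    return sorted(shared)

print(mine_rule({"alice", "bob"}, {"carol"}))
```

Selecting two node sets yields a task-specific discriminating rule as above; selecting a single set and summarising its frequent features would correspond to the task-agnostic case.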
Cited by: 1
LinkedDataOps: quality oriented end-to-end geospatial linked data production governance
CAS Tier 3 (Computer Science), JCR Q2 (Computer Science, Artificial Intelligence). Pub Date: 2023-09-15. DOI: 10.3233/sw-233293
Beyza Yaman, Kevin Thompson, Fergus Fahey, Rob Brennan
This work describes the application of semantic web standards to the data quality governance of data production pipelines in the architectural, engineering, and construction (AEC) domain for Ordnance Survey Ireland (OSi). It illustrates a new approach to data quality governance based on establishing a unified knowledge graph for data quality measurements across a complex, heterogeneous, quality-centric data production pipeline. It provides the first comprehensive formal mappings between the semantic models of data quality dimensions defined by the four International Organization for Standardization (ISO) and World Wide Web Consortium (W3C) data quality standards applied by different tools and stakeholders. It also provides an approach for uplifting rule-based data quality reports into quality metrics suitable for aggregation and end-to-end analysis. Current industrial practice tends towards stove-piped, vendor-specific, and domain-dependent tools for processing data quality observations; however, there is a lack of open techniques and methodologies for combining quality measurements derived from different data quality standards to provide end-to-end data quality reporting, root cause analysis, or visualisation. This work demonstrates that a knowledge graph and semantic web standards can effectively unify distributed data quality monitoring in an organisation and present the results in an end-to-end data dashboard, in a standards-agnostic fashion, for the Ordnance Survey Ireland data publishing pipeline.
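The uplift of rule-based reports into aggregatable metrics can be sketched as follows: each rule outcome is mapped to a quality dimension (e.g. completeness, accuracy) and summarised as a pass ratio per dimension, which an end-to-end dashboard could then aggregate. The report entries, rule identifiers, and dimension names below are invented for illustration, not OSi's actual pipeline data.

```python
# Minimal sketch (invented report entries) of uplifting rule-based data
# quality reports into per-dimension metrics suitable for aggregation.
from collections import defaultdict

REPORT = [  # (rule id, quality dimension, records checked, records passing)
    ("R1", "completeness", 1000, 990),
    ("R2", "completeness", 1000, 950),
    ("R3", "accuracy",      500, 495),
]

def dimension_scores(report):
    """Aggregate rule outcomes into a pass ratio per quality dimension."""
    totals = defaultdict(lambda: [0, 0])  # dimension -> [checked, passed]
    for _rule, dimension, checked, passed in report:
        totals[dimension][0] += checked
        totals[dimension][1] += passed
    return {dim: passed / checked for dim, (checked, passed) in totals.items()}

print(dimension_scores(REPORT))
```

Because every rule, whichever standard or tool it originates from, is reduced to the same (dimension, checked, passed) shape, metrics from heterogeneous pipeline stages become directly comparable, which is the premise of a standards-agnostic dashboard.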
Cited by: 0