Journal of Web Semantics: Latest Articles


SemanticHadith: An ontology-driven knowledge graph for the hadith corpus

IF 2.5, Zone 3 (Computer Science), Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Web Semantics

Pub Date : 2023-06-01 DOI: 10.1016/j.websem.2023.100797

Amna Binte Kamran, Bushra Abro, A. Basharat

Citations: 0

Solving the SPARQL query containment problem with SpeCS

IF 2.5, Zone 3 (Computer Science), Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Web Semantics

Pub Date : 2023-04-01 DOI: 10.1016/j.websem.2022.100770

Mirko Spasić , Milena Vujošević Janičić

The query containment problem is a fundamental computer science problem which was originally defined for relational queries. With the growing popularity of the SPARQL query language, it became relevant and important in this new context: reliable and efficient SPARQL query containment solvers may have various applications within static analysis of queries, especially in the area of query optimization and refactoring. In this paper, we present a new approach for solving the query containment problem in SPARQL. The approach is based on reducing the query containment problem to the satisfiability problem in first-order logic. It covers a wide range of SPARQL language constructs, including unions of conjunctive queries, blank nodes, projections, subqueries, the clauses FROM, FILTER, OPTIONAL, GRAPH, etc. It also covers containment under the RDF Schema entailment regime, and it can deal with the subsumption relation. We describe an implementation of the approach, the open-source solver SpeCS, and its thorough experimental evaluation on two relevant benchmarks, Query Containment Benchmark and SQCFramework. As a side result, SpeCS identified incorrect test cases within both benchmarks, which were manually checked, confirmed and fixed, resulting in better and more reliable benchmarks. The evaluation also shows that SpeCS is highly efficient and that, compared to the state-of-the-art solvers, it gives more precise results in a shorter amount of time. In addition, SpeCS has the highest coverage of the supported language constructs.
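
To make the notion of containment concrete, here is a minimal sketch for the simplest case only: conjunctive queries made of plain triple patterns. It uses the classical homomorphism test rather than the first-order-logic reduction described in the abstract, so it is not how SpeCS works. The function name `contained_in` and the tuple-based query encoding are assumptions made for illustration.

```python
from itertools import product

def is_var(term):
    """Variables are written SPARQL-style, with a leading question mark."""
    return term.startswith("?")

def contained_in(q1, q2):
    """Return True if q1 is contained in q2 (every answer of q1 is an answer of q2).

    A query is (head, body): head is a tuple of projected variables and
    body is a list of (subject, predicate, object) triple patterns.
    Classical test for conjunctive queries under set semantics: look for a
    homomorphism from q2 into q1 that maps q2's head onto q1's head.
    Brute force, so only suitable for tiny queries.
    """
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False
    frozen = set(body1)  # q1's patterns, with its variables treated as constants
    terms1 = sorted({t for triple in body1 for t in triple})
    vars2 = sorted({t for triple in body2 for t in triple if is_var(t)}
                   | {v for v in head2 if is_var(v)})
    for image in product(terms1, repeat=len(vars2)):
        h = dict(zip(vars2, image))
        if tuple(h.get(v, v) for v in head2) != tuple(head1):
            continue
        if all(tuple(h.get(t, t) for t in triple) in frozen for triple in body2):
            return True
    return False

# SELECT ?x WHERE { ?x a :Student . ?x :enrolledIn ?c }
students_enrolled = (("?x",), [("?x", "a", ":Student"), ("?x", ":enrolledIn", "?c")])
# SELECT ?x WHERE { ?x a :Student }
students = (("?x",), [("?x", "a", ":Student")])

print(contained_in(students_enrolled, students))  # the stricter query is contained in the looser one
print(contained_in(students, students_enrolled))  # the converse does not hold
```

Constructs such as OPTIONAL, FILTER, or RDFS entailment fall outside what this test can decide, which is where a logic-based reduction like the one in the paper becomes necessary.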


Volume 76, Article 100770

Citations: 1

IndeGx: A model and a framework for indexing RDF knowledge graphs with SPARQL-based test suits

IF 2.5, Zone 3 (Computer Science), Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Web Semantics

Pub Date : 2023-04-01 DOI: 10.1016/j.websem.2023.100775

Pierre Maillot, Olivier Corby, Catherine Faron, Fabien Gandon, Franck Michel

In recent years, a large number of RDF datasets have been built and published on the Web in fields as diverse as linguistics or life sciences, as well as general datasets such as DBpedia or Wikidata. The joint exploitation of these datasets requires specific knowledge about their content, access points, and commonalities. However, not all datasets contain a self-description, and not all access points can handle the complex queries used to generate such a description.

In this article, we provide a standards-based approach to generate the description of a dataset. The generated descriptions as well as the process of their computation are expressed using standard vocabularies and languages. We implemented our approach in a framework, called IndeGx, where each indexing feature and its computation are collaboratively and declaratively defined in a GitHub repository. We experimented with IndeGx on a set of 339 RDF datasets with endpoints listed in public catalogs, over 8 months. The results show that we can collect, as much as possible, important characteristics of the datasets depending on their availability and capacities. The resulting index captures the commonalities, variety and disparity in the offered content and services, and it provides important support to any application designed to query RDF datasets.
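
As a rough illustration of what a generated dataset description can contain, the sketch below computes a few statistics over an in-memory list of triples and reports them under VoID property names (`void:triples`, `void:distinctSubjects`, `void:properties`, `void:classes`). This is an assumption-laden stand-in: IndeGx itself computes its descriptions with SPARQL-based rules against remote endpoints, and the `describe` function and sample data here are hypothetical.

```python
RDF_TYPE = "rdf:type"

def describe(triples):
    """Summarize a dataset given as (subject, predicate, object) tuples.

    Keys are VoID property names; values are counts over distinct triples.
    """
    triples = set(triples)  # duplicate statements count once
    return {
        "void:triples": len(triples),
        "void:distinctSubjects": len({s for s, _, _ in triples}),
        "void:properties": len({p for _, p, _ in triples}),
        "void:classes": len({o for _, p, o in triples if p == RDF_TYPE}),
    }

sample = [
    ("ex:a", "rdf:type", "ex:Person"),
    ("ex:b", "rdf:type", "ex:Person"),
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:a", "ex:knows", "ex:b"),  # repeated statement
]
description = describe(sample)
print(description)
```

Against a real endpoint, each of these counts would typically be one aggregate query, and some endpoints may time out on them, which is the availability problem the abstract mentions.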


Volume 76, Article 100775

Citations: 6

Stream reasoning with DatalogMTL

IF 2.5, Zone 3 (Computer Science), Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Web Semantics

Pub Date : 2023-04-01 DOI: 10.1016/j.websem.2023.100776

Przemysław A. Wałęga, Mark Kaminski, Dingmin Wang, Bernardo Cuenca Grau

We study stream reasoning in DatalogMTL, an extension of Datalog with metric temporal operators. We propose a sound and complete stream reasoning algorithm that is applicable to forward-propagating DatalogMTL programs, in which propagation of derived information towards past time points is precluded. Memory consumption in our generic algorithm depends both on the properties of the rule set and the input data stream; in particular, it depends on the distances between timestamps occurring in data. This may be undesirable in certain practical scenarios since these distances can be very small, in which case the algorithm may require large amounts of memory. To address this issue, we propose a second algorithm, where the size of the required memory becomes independent of the timestamps in the data at the expense of disallowing punctual intervals in the rule set. We have implemented our approach as an extension of the DatalogMTL reasoner MeTeoR and tested it experimentally. The obtained results support the feasibility of our approach in practice.
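
The flavour of a forward-propagating rule can be sketched with a single metric operator. The example below evaluates a rule of the shape "Alert holds at time t if Spike held at some time in [t - width, t]" over a stream, discarding facts once they fall out of the window. It is a toy under stated assumptions (facts and queries arrive in non-decreasing time order, one rule, one operator) and not the algorithm from the paper; the class name `DiamondMinusWindow` is hypothetical.

```python
from collections import deque

class DiamondMinusWindow:
    """Tracks when a fact held, to answer 'did it hold within the last `width` time units?'."""

    def __init__(self, width):
        self.width = width
        self.times = deque()  # timestamps at which the fact was observed

    def observe(self, t):
        """Record that the fact holds at time t (assumed non-decreasing)."""
        self.times.append(t)

    def holds_at(self, t):
        """Check the windowed condition at query time t (assumed non-decreasing)."""
        # Facts older than t - width can never matter again, so forget them.
        while self.times and self.times[0] < t - self.width:
            self.times.popleft()
        return any(s <= t for s in self.times)

spikes = DiamondMinusWindow(width=5)
spikes.observe(1)
spikes.observe(3)
print(spikes.holds_at(4))  # a spike occurred within [-1, 4]
print(spikes.holds_at(8))  # the spike at 3 is still inside [3, 8]
print(spikes.holds_at(9))  # nothing inside [4, 9]
```

Because only past facts are consulted, old data can be dropped safely. The deque still grows with the number of timestamps that fit inside one window, which mirrors the abstract's remark that memory use depends on the distances between timestamps.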


Volume 76, Article 100776

Citations: 1

Decentralized semantic provision of personal health streams

IF 2.5, Zone 3 (Computer Science), Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Web Semantics

Pub Date : 2023-04-01 DOI: 10.1016/j.websem.2023.100774

Jean-Paul Calbimonte , Orfeas Aidonopoulos , Fabien Dubosson , Benjamin Pocklington , Ilia Kebets , Pierre-Mikael Legris , Michael Schumacher

Personalized healthcare is nowadays driven by the increasing volumes of patient data, observed and produced continuously thanks to medical devices, mobile sensors, patient-reported outcomes, among other data sources. This data is made available as streams, due to their dynamic nature, which represents an important challenge for processing, querying and interpreting the incoming information. In addition, the sensitive nature of healthcare data poses significant restrictions regarding privacy, which has led to the emergence of decentralized personal data management systems. Data semantics play a key role in order to enable both decentralization and integration of personal health data, as they introduce the capability to represent knowledge and information using ontologies and semantic vocabularies. In this paper we describe the SemPryv system, which provides the means to manage personal health data streams enriched with semantic information. SemPryv is designed as a decentralized system, so that users have the possibility of hosting their personal data at different sites, while keeping control of access rights. The semantization of data in SemPryv is implemented through different strategies, ranging from rule-based annotation to machine learning-based suggestions, fed from third-party specialized healthcare metadata providers. The system has been made available as Open Source, and is integrated as part of the Pryv.io platform used and commercialized in the healthcare and personal data management industry.


Volume 76, Article 100774

Citations: 2

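A parametric similarity method: Comparative experiments based on semantically annotated large datasets

IF 2.5, Zone 3 (Computer Science), Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE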

Journal of Web Semantics

Pub Date : 2023-04-01 DOI: 10.1016/j.websem.2023.100773

Antonio De Nicola , Anna Formica , Michele Missikoff , Elaheh Pourabbas , Francesco Taglino

We present the parametric method SemSim^p aimed at measuring semantic similarity of digital resources. SemSim^p is based on the notion of information content, and it leverages a reference ontology and taxonomic reasoning, encompassing different approaches for weighting the concepts of the ontology. In particular, weights can be computed by considering either the available digital resources or the structure of the reference ontology of a given domain. SemSim^p is assessed against six representative semantic similarity methods for comparing sets of concepts proposed in the literature, by carrying out an experiment that includes both a statistical analysis and an expert judgment evaluation. To achieve a reliable assessment, we used a real-world large dataset based on the Digital Library of the Association for Computing Machinery (ACM), and a reference ontology derived from the ACM Computing Classification System (ACM-CCS). For each method, we considered two indicators. The first concerns the degree of confidence in identifying the similarity among the papers belonging to some special issues selected from the ACM Transactions on Information Systems journal; the second concerns the Pearson correlation with human judgment. The results reveal that one of the configurations of SemSim^p outperforms the other assessed methods. An additional experiment performed in the domain of physics shows that, in general, SemSim^p provides better results than the other similarity methods.
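
For readers unfamiliar with information content, the sketch below shows the general idea on a toy taxonomy using Lin's well-known measure: a concept's information content is the negative log of its probability, and two concepts are similar to the extent that their most informative shared ancestor is itself informative. This is a generic textbook measure, not the SemSim^p method; the taxonomy and the probability weights are invented for illustration.

```python
import math

# Hypothetical taxonomy: child -> parent (None marks the root).
PARENT = {
    "Thing": None,
    "Computing": "Thing",
    "Databases": "Computing",
    "QueryLanguages": "Databases",
    "Indexing": "Databases",
    "Networks": "Computing",
}

# Hypothetical concept probabilities; a parent is at least as likely as its children.
PROB = {
    "Thing": 1.0,
    "Computing": 0.5,
    "Databases": 0.2,
    "QueryLanguages": 0.05,
    "Indexing": 0.1,
    "Networks": 0.25,
}

def ancestors(concept):
    """Return the concept followed by all of its ancestors up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain

def information_content(concept):
    """Rarer concepts carry more information."""
    return -math.log(PROB[concept])

def lin_similarity(a, b):
    """Lin similarity: 2 * IC(most informative common ancestor) / (IC(a) + IC(b))."""
    common = set(ancestors(a)) & set(ancestors(b))
    shared = max(information_content(c) for c in common)
    total = information_content(a) + information_content(b)
    return 1.0 if total == 0 else 2 * shared / total

print(lin_similarity("QueryLanguages", "Indexing"))  # siblings under Databases
print(lin_similarity("QueryLanguages", "Networks"))  # only share Computing
```

Changing how `PROB` is derived, for example from annotation frequencies in a corpus versus from the shape of the taxonomy, is the kind of weighting choice the abstract refers to.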


Volume 76, Article 100773

Citations: 2

Answering Count Questions with Structured Answers from Text

IF 2.5, Zone 3 (Computer Science), Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Web Semantics

Pub Date : 2023-04-01 DOI: 10.1016/j.websem.2022.100769

Shrestha Ghosh , Simon Razniewski , Gerhard Weikum

In this work we address the challenging case of answering count queries in web search, such as number of songs by John Lennon. Prior methods merely answer these with a single, and sometimes puzzling, number or return a ranked list of text snippets with different numbers. This paper proposes a methodology for answering count queries with inference, contextualization and explanatory evidence. Unlike previous systems, our method infers final answers from multiple observations, supports semantic qualifiers for the counts, and provides evidence by enumerating representative instances. Experiments with a wide variety of queries, including existing benchmarks, show the benefits of our method and the influence of specific parameter settings. Our code, data and an interactive system demonstration are publicly available at https://github.com/ghoshs/CoQEx and https://nlcounqer.mpi-inf.mpg.de/.
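
One plausible way to infer a single answer from several conflicting observations is a confidence-weighted median, sketched below. This is offered as an assumption about how such aggregation could look, not as a description of the authors' pipeline; the function name, the sample counts, and the confidence scores are all made up.

```python
def weighted_median(observations):
    """Pick the count at which cumulative confidence first reaches half the total.

    `observations` is a list of (count, confidence) pairs with positive confidences.
    """
    if not observations:
        raise ValueError("need at least one observation")
    ordered = sorted(observations)
    half = sum(weight for _, weight in ordered) / 2
    running = 0.0
    for value, weight in ordered:
        running += weight
        if running >= half:
            return value
    return ordered[-1][0]  # only reachable through floating-point rounding

# Hypothetical counts extracted from different snippets for the same query.
snippet_counts = [(150, 0.9), (180, 0.6), (600, 0.2), (150, 0.7)]
print(weighted_median(snippet_counts))
```

A median is less sensitive than a mean to an outlier such as the 600 above, which may come from a snippet that counts a broader category than the one asked about.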


Citations: 4

Building a Knowledge Graph for the History of Vienna with Semantic MediaWiki

IF 2.5, Zone 3 (Computer Science), Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Web Semantics

Pub Date : 2023-04-01 DOI: 10.1016/j.websem.2022.100771

Bernhard Krabina

While research on semantic wikis is declining, Semantic MediaWiki (SMW) can still play an important role in the emerging field of knowledge graph curation.

The Vienna History Wiki, a large knowledge base curated by the city government in collaboration with other institutions and the general public, provides an ideal use case for demonstrating strengths and weaknesses of SMW as well as discussing the challenges of co-curation in a cultural heritage setting. This paper describes processes like collaborative editing, interlinking unique identifiers on the web, sharing data with Wikidata, making use of Schema.org, and other ontologies. It presents insights from a user survey, access statistics, and a knowledge graph analysis.

This work contributes to the scarce research in wiki usage outside of the Wikipedia ecosystem as well as to the field of community-based knowledge graph curation. The availability of a now significantly improved RDF representation indicates future directions for research and practice.


Citations: 5

From tabular data to knowledge graphs: A survey of semantic table interpretation tasks and methods

IF 2.5, Zone 3 (Computer Science), Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Web Semantics

Pub Date : 2023-04-01 DOI: 10.1016/j.websem.2022.100761

Jixiong Liu , Yoan Chabot , Raphaël Troncy , Viet-Phi Huynh , Thomas Labbé , Pierre Monnin

Tabular data often refers to data that is organized in a table with rows and columns. We observe that this data format is widely used on the Web and within enterprise data repositories. Tables potentially contain rich semantic information that still needs to be interpreted. The process of extracting meaningful information out of tabular data with respect to a semantic artefact, such as an ontology or a knowledge graph, is often referred to as Semantic Table Interpretation (STI) or Semantic Table Annotation. In this survey paper, we aim to provide a comprehensive and up-to-date review of the different tasks and methods that have been proposed so far to perform STI. First, we propose a new categorization that reflects the heterogeneity of table types that one can encounter, revealing different challenges that need to be addressed. Next, we define five major sub-tasks that STI deals with, even though the literature has mostly focused on three sub-tasks so far. We review and group the many approaches that have been proposed into three macro families, and we discuss their performance and limitations with respect to the various datasets and benchmarks proposed by the community. Finally, we detail the remaining scientific barriers to truly automatic interpretation of any type of table that can be found in the wild on the Web.
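
To give a feel for one commonly discussed STI step, matching a table cell to a knowledge-graph entity, here is a deliberately naive sketch based on fuzzy label lookup with Python's standard `difflib`. The tiny label index, the `ex:` identifiers, and the 0.8 cutoff are assumptions for illustration; the approaches reviewed in the survey rely on much richer signals, such as column context and entity types.

```python
import difflib

# Hypothetical label index: lowercased label -> entity identifier.
KG_LABELS = {
    "vienna": "ex:Vienna",
    "paris": "ex:Paris",
    "berlin": "ex:Berlin",
}

def annotate_cell(cell, cutoff=0.8):
    """Return the identifier of the closest-labelled entity, or None if nothing is close."""
    matches = difflib.get_close_matches(
        cell.strip().lower(), list(KG_LABELS), n=1, cutoff=cutoff
    )
    return KG_LABELS[matches[0]] if matches else None

def annotate_column(cells):
    """Annotate every cell of a column independently."""
    return [annotate_cell(cell) for cell in cells]

print(annotate_column(["Viena", "Berlin ", "42"]))
```

Pure string matching tolerates the misspelling and the stray whitespace but has no way to tell apart entities that share a label, which is one reason the other cells in the same row and column matter in practice.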


Volume 76, Article 100761

Citations: 14

Explainable argumentation as a service

IF 2.5, Zone 3 (Computer Science), Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Web Semantics

Pub Date : 2023-04-01 DOI: 10.1016/j.websem.2023.100772

Nikolaos I. Spanoudakis , Georgios Gligoris , Adamos Koumi , Antonis C. Kakas

Gorgias Cloud offers an integrated application development environment that facilitates the development of argumentation-based systems over the internet. Argumentation is offered as a service in a way that allows application systems to remotely access the argumentation service and utilize the results of the argumentative computation. Moreover, the service results include the explanation of the decision in both human-readable and machine-readable formats. The first is useful for allowing the application validation to be done by experts, while the second is useful for development. It appears that this is the first case where argumentation is offered to developers in such an open and distributed way.


Citations: 1
