
Journal of Web Semantics: Latest Publications

IndeGx: A model and a framework for indexing RDF knowledge graphs with SPARQL-based test suits
IF 2.5, CAS Tier 3 (Computer Science), Q1 Computer Science. Pub Date: 2023-04-01. DOI: 10.1016/j.websem.2023.100775
Pierre Maillot, Olivier Corby, Catherine Faron, Fabien Gandon, Franck Michel

In recent years, a large number of RDF datasets have been built and published on the Web in fields as diverse as linguistics or life sciences, as well as general datasets such as DBpedia or Wikidata. The joint exploitation of these datasets requires specific knowledge about their content, access points, and commonalities. However, not all datasets contain a self-description, and not all access points can handle the complex queries used to generate such a description.

In this article, we provide a standards-based approach to generating the description of a dataset. The generated descriptions, as well as the process of their computation, are expressed using standard vocabularies and languages. We implemented our approach in a framework, called IndeGx, in which each indexing feature and its computation is collaboratively and declaratively defined in a GitHub repository. We experimented with IndeGx over 8 months on a set of 339 RDF datasets whose endpoints are listed in public catalogs. The results show that we can collect important characteristics of the datasets to the extent that their availability and capacities allow. The resulting index captures the commonalities, variety, and disparity of the offered content and services, and it provides important support to any application designed to query RDF datasets.
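
To give a concrete flavor of what a SPARQL-based indexing probe can look like, here is a minimal Python sketch, not IndeGx's own code: it sends two simple SPARQL queries to a placeholder endpoint and records the results as dataset characteristics, tolerating failures the way a large-scale index must. The endpoint URL and probe names are illustrative assumptions only.

```python
# Minimal sketch (not IndeGx's code): probe a SPARQL endpoint for two simple
# dataset characteristics that an index could record. The endpoint URL below
# is a placeholder; any public SPARQL endpoint would do.
import requests

ENDPOINT = "https://example.org/sparql"  # hypothetical endpoint from a catalog

PROBES = {
    "tripleCount": "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }",
    "classCount":  "SELECT (COUNT(DISTINCT ?c) AS ?n) WHERE { ?s a ?c }",
}

def run_probe(endpoint: str, query: str, timeout: int = 60):
    """Send one SPARQL query and return the single numeric result, or None on failure."""
    try:
        resp = requests.get(
            endpoint,
            params={"query": query},
            headers={"Accept": "application/sparql-results+json"},
            timeout=timeout,
        )
        resp.raise_for_status()
        bindings = resp.json()["results"]["bindings"]
        return int(bindings[0]["n"]["value"]) if bindings else None
    except (requests.RequestException, KeyError, ValueError):
        # Endpoints differ widely in availability and capacity, so a failed
        # probe is recorded as a missing feature rather than aborting the index.
        return None

description = {name: run_probe(ENDPOINT, q) for name, q in PROBES.items()}
print(description)
```

Recording None for unavailable features mirrors the abstract's point that the collected characteristics depend on each endpoint's availability and capacities.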

Citations: 6
Stream reasoning with DatalogMTL
IF 2.5, CAS Tier 3 (Computer Science), Q1 Computer Science. Pub Date: 2023-04-01. DOI: 10.1016/j.websem.2023.100776
Przemysław A. Wałęga, Mark Kaminski, Dingmin Wang, Bernardo Cuenca Grau

We study stream reasoning in DatalogMTL, an extension of Datalog with metric temporal operators. We propose a sound and complete stream reasoning algorithm that is applicable to forward-propagating DatalogMTL programs, in which propagation of derived information towards past time points is precluded. Memory consumption in our generic algorithm depends both on the properties of the rule set and on the input data stream; in particular, it depends on the distances between timestamps occurring in the data. This may be undesirable in certain practical scenarios, since these distances can be very small, in which case the algorithm may require large amounts of memory. To address this issue, we propose a second algorithm in which the size of the required memory becomes independent of the timestamps in the data, at the expense of disallowing punctual intervals in the rule set. We have implemented our approach as an extension of the DatalogMTL reasoner MeTeoR and tested it experimentally. The obtained results support the feasibility of our approach in practice.
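
As a rough illustration of forward-propagating stream reasoning with a metric temporal operator, the sketch below evaluates a single rule of the form Alert(x) at t <- diamond-minus[0,5] HighTemp(x), i.e. Alert(x) holds at t if HighTemp(x) held at some point in [t-5, t]. It is a simplified toy under stated assumptions, not the MeTeoR implementation; note how the memory kept per entity depends on how densely timestamps fall inside the window, which is exactly the sensitivity discussed above.

```python
# Minimal sketch (assumptions, not the paper's algorithm): evaluate one
# forward-propagating DatalogMTL-style rule over a timestamped fact stream.
from collections import defaultdict, deque

WINDOW = 5.0  # width of the metric interval [0, 5]

recent = defaultdict(deque)  # entity -> HighTemp timestamps still inside the window

def on_fact(predicate: str, entity: str, t: float):
    """Consume one stream fact and return the facts derived at time t."""
    derived = []
    if predicate == "HighTemp":
        recent[entity].append(t)
    # Forget timestamps older than t - WINDOW; the memory held here grows with
    # the number of distinct timestamps that fit inside the window.
    q = recent[entity]
    while q and q[0] < t - WINDOW:
        q.popleft()
    if q:  # some HighTemp occurred within [t - WINDOW, t]
        derived.append(("Alert", entity, t))
    return derived

stream = [("HighTemp", "sensor1", 1.0), ("Normal", "sensor1", 3.0),
          ("Normal", "sensor1", 8.0)]
for fact in stream:
    print(fact, "->", on_fact(*fact))
```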

Citations: 1
Decentralized semantic provision of personal health streams
IF 2.5, CAS Tier 3 (Computer Science), Q1 Computer Science. Pub Date: 2023-04-01. DOI: 10.1016/j.websem.2023.100774
Jean-Paul Calbimonte, Orfeas Aidonopoulos, Fabien Dubosson, Benjamin Pocklington, Ilia Kebets, Pierre-Mikael Legris, Michael Schumacher

Personalized healthcare is nowadays driven by increasing volumes of patient data, observed and produced continuously thanks to medical devices, mobile sensors, and patient-reported outcomes, among other data sources. Due to its dynamic nature, this data is made available as streams, which poses an important challenge for processing, querying, and interpreting the incoming information. In addition, the sensitive nature of healthcare data imposes significant restrictions regarding privacy, which has led to the emergence of decentralized personal data management systems. Data semantics play a key role in enabling both decentralization and integration of personal health data, as they introduce the capability to represent knowledge and information using ontologies and semantic vocabularies. In this paper we describe the SemPryv system, which provides the means to manage personal health data streams enriched with semantic information. SemPryv is designed as a decentralized system, so that users can host their personal data at different sites while keeping control of access rights. The semantization of data in SemPryv is implemented through different strategies, ranging from rule-based annotation to machine learning-based suggestions fed by third-party specialized healthcare metadata providers. The system has been made available as open source and is integrated as part of the Pryv.io platform used and commercialized in the healthcare and personal data management industry.
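
The following sketch illustrates the simplest of the semantization strategies mentioned above, rule-based annotation. The event shape, the rule table, and the concept codes are illustrative assumptions and not SemPryv's actual data model or API.

```python
# Minimal sketch (assumptions, not SemPryv's code): a rule-based annotator that
# attaches a semantic concept to incoming personal-health stream events based
# on their type. Codes and event shapes are illustrative only.
RULES = {
    # event type   -> (vocabulary, code, human-readable label)
    "frequency/bpm":  ("http://loinc.org", "8867-4", "Heart rate"),
    "pressure/mmhg":  ("http://loinc.org", "85354-9", "Blood pressure panel"),
}

def annotate(event: dict) -> dict:
    """Return a copy of the event enriched with a semantic annotation, if a rule matches."""
    rule = RULES.get(event.get("type"))
    if rule is None:
        return event  # an ML-based suggester could be consulted here instead
    vocab, code, label = rule
    return {**event, "semantic": {"vocabulary": vocab, "code": code, "label": label}}

print(annotate({"type": "frequency/bpm", "content": 72, "time": "2023-04-01T10:00:00Z"}))
```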

Citations: 2
A parametric similarity method: Comparative experiments based on semantically annotated large datasets
IF 2.5, CAS Tier 3 (Computer Science), Q1 Computer Science. Pub Date: 2023-04-01. DOI: 10.1016/j.websem.2023.100773
Antonio De Nicola, Anna Formica, Michele Missikoff, Elaheh Pourabbas, Francesco Taglino

We present the parametric method SemSimp, aimed at measuring the semantic similarity of digital resources. SemSimp is based on the notion of information content, and it leverages a reference ontology and taxonomic reasoning, encompassing different approaches for weighting the concepts of the ontology. In particular, weights can be computed by considering either the available digital resources or the structure of the reference ontology of a given domain. SemSimp is assessed against six representative semantic similarity methods for comparing sets of concepts proposed in the literature, through an experiment that includes both a statistical analysis and an expert judgment evaluation. To achieve a reliable assessment, we used a real-world large dataset based on the Digital Library of the Association for Computing Machinery (ACM) and a reference ontology derived from the ACM Computing Classification System (ACM-CCS). For each method, we considered two indicators: the first concerns the degree of confidence in identifying the similarity among papers belonging to selected special issues of the ACM Transactions on Information Systems journal; the second is the Pearson correlation with human judgment. The results reveal that one of the configurations of SemSimp outperforms the other assessed methods. An additional experiment performed in the domain of physics shows that, in general, SemSimp provides better results than the other similarity methods.
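
For readers unfamiliar with information-content-based similarity, here is a minimal textbook-style sketch, not SemSimp itself: concept weights are derived from how often each concept (or any of its descendants) annotates the available resources, and two concepts are compared through their most informative common ancestor. The toy taxonomy and frequencies are invented for illustration.

```python
# Minimal sketch (a textbook information-content computation, not SemSimp):
# weight concepts by annotation frequency propagated up the taxonomy, then
# compare concepts via the IC of their most informative common ancestor.
import math

# Toy taxonomy: child -> parent (None marks the root).
PARENT = {"ai": "cs", "ml": "ai", "db": "cs", "cs": None}
# How many resources are annotated with each concept directly.
DIRECT_FREQ = {"cs": 1, "ai": 2, "ml": 5, "db": 4}

def ancestors(c):
    while c is not None:
        yield c
        c = PARENT[c]

# Propagate frequencies upward: a concept counts occurrences of all its descendants.
freq = {c: 0 for c in PARENT}
for concept, n in DIRECT_FREQ.items():
    for a in ancestors(concept):
        freq[a] += n
total = freq[next(c for c, p in PARENT.items() if p is None)]

def ic(c):
    return -math.log(freq[c] / total)

def sim(c1, c2):
    common = set(ancestors(c1)) & set(ancestors(c2))
    return max(ic(c) for c in common)  # IC of the most informative common ancestor

print(round(sim("ml", "db"), 3), round(sim("ml", "ai"), 3))
```

SemSimp, as the abstract notes, can alternatively weight concepts from the structure of the reference ontology rather than from resource frequencies; the sketch shows only the resource-driven variant.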

Citations: 2
Answering Count Questions with Structured Answers from Text
IF 2.5, CAS Tier 3 (Computer Science), Q1 Computer Science. Pub Date: 2023-04-01. DOI: 10.1016/j.websem.2022.100769
Shrestha Ghosh, Simon Razniewski, Gerhard Weikum

In this work we address the challenging case of answering count queries in web search, such as the number of songs by John Lennon. Prior methods merely answer these with a single, and sometimes puzzling, number, or return a ranked list of text snippets containing different numbers. This paper proposes a methodology for answering count queries with inference, contextualization, and explanatory evidence. Unlike previous systems, our method infers final answers from multiple observations, supports semantic qualifiers for the counts, and provides evidence by enumerating representative instances. Experiments with a wide variety of queries, including existing benchmarks, show the benefits of our method and the influence of specific parameter settings. Our code, data, and an interactive system demonstration are publicly available at https://github.com/ghoshs/CoQEx and https://nlcounqer.mpi-inf.mpg.de/.
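
A minimal sketch of the "infer final answers from multiple observations" idea follows. The counts and confidences are hypothetical, and the inference in the actual CoQEx system is considerably richer; this only illustrates consolidating several extracted counts into one prediction plus a spread.

```python
# Minimal sketch (illustration only, not CoQEx): consolidate count candidates
# extracted from different text snippets, each with a confidence score, into a
# single answer plus a spread that signals how much the sources disagree.
import statistics

# (count stated in a snippet, extraction confidence) -- hypothetical values
observations = [(213, 0.9), (180, 0.6), (229, 0.8), (24, 0.3)]

def consolidate(obs, min_conf=0.5):
    counts = [n for n, conf in obs if conf >= min_conf]  # drop low-confidence outliers
    return {
        "answer": int(statistics.median(counts)),
        "spread": (min(counts), max(counts)),
        "support": len(counts),
    }

print(consolidate(observations))
```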

Citations: 4
Building a Knowledge Graph for the History of Vienna with Semantic MediaWiki
IF 2.5, CAS Tier 3 (Computer Science), Q1 Computer Science. Pub Date: 2023-04-01. DOI: 10.1016/j.websem.2022.100771
Bernhard Krabina

While research on semantic wikis is declining, Semantic MediaWiki (SMW) can still play an important role in the emerging field of knowledge graph curation.

The Vienna History Wiki, a large knowledge base curated by the city government in collaboration with other institutions and the general public, provides an ideal use case for demonstrating strengths and weaknesses of SMW as well as discussing the challenges of co-curation in a cultural heritage setting. This paper describes processes like collaborative editing, interlinking unique identifiers on the web, sharing data with Wikidata, making use of Schema.org, and other ontologies. It presents insights from a user survey, access statistics, and a knowledge graph analysis.

This work contributes to the scarce research in wiki usage outside of the Wikipedia ecosystem as well as to the field of community-based knowledge graph curation. The availability of a now significantly improved RDF representation indicates future directions for research and practice.

Citations: 5
From tabular data to knowledge graphs: A survey of semantic table interpretation tasks and methods
IF 2.5, CAS Tier 3 (Computer Science), Q1 Computer Science. Pub Date: 2023-04-01. DOI: 10.1016/j.websem.2022.100761
Jixiong Liu, Yoan Chabot, Raphaël Troncy, Viet-Phi Huynh, Thomas Labbé, Pierre Monnin

Tabular data often refers to data that is organized in a table with rows and columns. We observe that this data format is widely used on the Web and within enterprise data repositories. Tables potentially contain rich semantic information that still needs to be interpreted. The process of extracting meaningful information out of tabular data with respect to a semantic artefact, such as an ontology or a knowledge graph, is often referred to as Semantic Table Interpretation (STI) or Semantic Table Annotation. In this survey paper, we aim to provide a comprehensive and up-to-date review of the different tasks and methods that have been proposed so far to perform STI. First, we propose a new categorization that reflects the heterogeneity of table types one can encounter, revealing different challenges that need to be addressed. Next, we define five major sub-tasks that STI deals with, even though the literature has mostly focused on three of them so far. We review and group the many approaches that have been proposed into three macro families, and we discuss their performance and limitations with respect to the various datasets and benchmarks proposed by the community. Finally, we detail the remaining scientific barriers to truly automatic interpretation of any type of table found on the Web.
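
To ground the terminology, the sketch below implements a naive baseline for two of the sub-tasks this survey covers: cell-entity annotation and column-type annotation. The label index is a tiny hypothetical stand-in for a real knowledge graph; the sketch is illustrative only and not one of the surveyed methods.

```python
# Minimal sketch (a naive baseline, not a surveyed system): annotate table cells
# with entities by label lookup, then type the column by majority vote over the
# matched entities. The label index below is a hypothetical toy.
from collections import Counter

LABEL_INDEX = {
    # normalized label -> (entity IRI, type IRI)
    "vienna": ("http://example.org/Vienna", "http://example.org/City"),
    "paris":  ("http://example.org/Paris",  "http://example.org/City"),
    "john lennon": ("http://example.org/John_Lennon", "http://example.org/Person"),
}

def annotate_column(cells):
    matches = [LABEL_INDEX.get(c.strip().lower()) for c in cells]
    entities = [m[0] if m else None for m in matches]            # cell-entity annotation
    types = Counter(m[1] for m in matches if m)                  # candidate column types
    column_type = types.most_common(1)[0][0] if types else None  # column-type annotation
    return entities, column_type

print(annotate_column(["Vienna", "Paris", "Unknown place"]))
```

Real STI systems replace the exact-match lookup with candidate generation, disambiguation, and scoring, which is precisely the design space the survey maps out.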

Citations: 14
Explainable argumentation as a service
IF 2.5, CAS Tier 3 (Computer Science), Q1 Computer Science. Pub Date: 2023-04-01. DOI: 10.1016/j.websem.2023.100772
Nikolaos I. Spanoudakis, Georgios Gligoris, Adamos Koumi, Antonis C. Kakas

Gorgias Cloud offers an integrated application development environment that facilitates the development of argumentation-based systems over the internet. Argumentation is offered as a service, allowing application systems to remotely access the argumentation service and utilize the results of the argumentative computation. Moreover, the service results include an explanation of the decision in both human- and machine-readable formats. The former is useful for allowing application validation to be carried out by experts, while the latter is useful for development. This appears to be the first case in which argumentation is offered to developers in such an open and distributed way.

Citations: 1
Semantic Web of Musical Things: Achieving interoperability in the Internet of Musical Things
IF 2.5, CAS Tier 3 (Computer Science), Q1 Computer Science. Pub Date: 2023-01-01. DOI: 10.1016/j.websem.2022.100758
Luca Turchet, Francesco Antoniazzi

The Internet of Musical Things (IoMusT) refers to the extension of the Internet of Things paradigm to the musical domain. Interoperability represents a central issue within this domain, where heterogeneous Musical Things serving radically different purposes are envisioned to communicate with each other. Automatic discovery of resources is also a desirable feature in IoMusT ecosystems. However, the existing musical protocols are not adequate to support discoverability and interoperability across the wide heterogeneity of Musical Things: they are typically not flexible, lack high resolution, and are not equipped with inference mechanisms that could exploit, on board, information about the whole application environment. Besides, they hardly ever support easy integration with the Web. In addition, IoMusT applications are often characterized by strict requirements on the latency of the exchanged messages. Semantic Web of Things technologies have the potential to overcome the limitations of existing musical protocols by enabling discoverability and interoperability across heterogeneous Musical Things. In this paper we propose the Musical Semantic Event Processing Architecture (MUSEPA), a semantically-based architecture designed to meet the IoMusT requirements of low-latency communication, discoverability, interoperability, and automatic inference. The architecture is based on the CoAP protocol, a semantic publish/subscribe broker, and the adoption of shared ontologies for describing Musical Things and their interactions. The code implementing MUSEPA can be accessed at: https://github.com/CIMIL/MUSEPA/.
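
As an illustration of the low-latency CoAP interaction style mentioned above, the following Python sketch issues a CoAP GET with the aiocoap library. The host and resource path are placeholders and not MUSEPA's documented interface; see the linked repository for the actual resource layout.

```python
# Minimal sketch (assumptions only: the host and resource path are placeholders,
# not MUSEPA's documented interface): a CoAP GET issued with the aiocoap library,
# the kind of lightweight request a Musical Thing could use to read a
# semantically described resource from a broker.
import asyncio
from aiocoap import Context, Message, GET

async def read_resource(uri: str) -> bytes:
    protocol = await Context.create_client_context()
    request = Message(code=GET, uri=uri)
    response = await protocol.request(request).response  # waits for the CoAP reply
    return response.payload

if __name__ == "__main__":
    payload = asyncio.run(read_resource("coap://broker.example.org/some-resource"))
    print(payload.decode("utf-8", errors="replace"))
```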

Citations: 3
Towards the Web of Embeddings: Integrating multiple knowledge graph embedding spaces with FedCoder
IF 2.5, CAS Tier 3 (Computer Science), Q1 Computer Science. Pub Date: 2023-01-01. DOI: 10.1016/j.websem.2022.100741
Matthias Baumgartner, Daniele Dell’Aglio, Heiko Paulheim, Abraham Bernstein

The Semantic Web is distributed yet interoperable: Distributed since resources are created and published by a variety of producers, tailored to their specific needs and knowledge; Interoperable as entities are linked across resources, allowing to use resources from different providers in concord. Complementary to the explicit usage of Semantic Web resources, embedding methods made them applicable to machine learning tasks. Subsequently, embedding models for numerous tasks and structures have been developed, and embedding spaces for various resources have been published. The ecosystem of embedding spaces is distributed but not interoperable: Entity embeddings are not readily comparable across different spaces. To parallel the Web of Data with a Web of Embeddings, we must thus integrate available embedding spaces into a uniform space.

Current integration approaches are limited to two spaces and presume that both of them were embedded with the same method — both assumptions are unlikely to hold in the context of a Web of Embeddings. In this paper, we present FedCoder, an approach that integrates multiple embedding spaces via a latent space. We assert that linked entities have a similar representation in the latent space so that entities become comparable across embedding spaces. FedCoder employs an autoencoder to learn this latent space from linked as well as non-linked entities.

Our experiments show that FedCoder substantially outperforms state-of-the-art approaches when faced with different embedding models, that it scales better than previous methods in the number of embedding spaces, and that it improves with more graphs being integrated whilst performing comparably with current approaches that assumed joint learning of the embeddings and were, usually, limited to two sources. Our results demonstrate that FedCoder is well adapted to integrate the distributed, diverse, and large ecosystem of embeddings spaces into an interoperable Web of Embeddings.
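
The sketch below captures a simplified reading of the core idea, not the authors' FedCoder code: one small autoencoder per embedding space maps entities into a shared latent space, trained with a reconstruction loss per space plus an alignment loss that pulls linked entities together. Dimensions, entity counts, and the random embeddings are invented for illustration.

```python
# Minimal sketch (a simplified reading of the idea, not the authors' code):
# per-source autoencoders map entity embeddings from different spaces into one
# shared latent space; reconstruction preserves each space's structure, while
# an alignment term pulls linked entities together across spaces.
import torch
from torch import nn

torch.manual_seed(0)
DIM_A, DIM_B, LATENT = 64, 32, 16
N_A, N_B, N_LINKED = 200, 150, 50          # first N_LINKED entities form linked pairs

emb_a, emb_b = torch.randn(N_A, DIM_A), torch.randn(N_B, DIM_B)  # stand-in embeddings

class SpaceCoder(nn.Module):
    def __init__(self, dim, latent):
        super().__init__()
        self.enc = nn.Linear(dim, latent)
        self.dec = nn.Linear(latent, dim)
    def forward(self, x):
        z = self.enc(x)
        return z, self.dec(z)

coder_a, coder_b = SpaceCoder(DIM_A, LATENT), SpaceCoder(DIM_B, LATENT)
opt = torch.optim.Adam(list(coder_a.parameters()) + list(coder_b.parameters()), lr=1e-3)
mse = nn.MSELoss()

for step in range(200):
    za, rec_a = coder_a(emb_a)
    zb, rec_b = coder_b(emb_b)
    reconstruction = mse(rec_a, emb_a) + mse(rec_b, emb_b)
    alignment = mse(za[:N_LINKED], zb[:N_LINKED])   # linked entities share a latent point
    loss = reconstruction + alignment
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.4f}")
```

Adding a third space under this scheme only requires a third SpaceCoder and its link pairs, which is one way to read the abstract's claim about scaling in the number of embedding spaces.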

Citations: 1