Pub Date: 2023-04-01 DOI: 10.1016/j.websem.2022.100761
Jixiong Liu, Yoan Chabot, Raphaël Troncy, Viet-Phi Huynh, Thomas Labbé, Pierre Monnin
Tabular data is data organized in a table with rows and columns. This format is widely used on the Web and within enterprise data repositories. Tables potentially contain rich semantic information that still needs to be interpreted. The process of extracting meaningful information from tabular data with respect to a semantic artefact, such as an ontology or a knowledge graph, is often referred to as Semantic Table Interpretation (STI) or Semantic Table Annotation. In this survey paper, we aim to provide a comprehensive and up-to-date review of the tasks and methods that have been proposed so far to perform STI. First, we propose a new categorization that reflects the heterogeneity of table types one can encounter, revealing the different challenges that need to be addressed. Next, we define five major sub-tasks that STI deals with, even though the literature has so far mostly focused on three of them. We review the many approaches that have been proposed, group them into three macro families, and discuss their performance and limitations with respect to the datasets and benchmarks proposed by the community. Finally, we detail the scientific barriers that remain before any type of table found on the Web can be interpreted truly automatically.
{"title":"From tabular data to knowledge graphs: A survey of semantic table interpretation tasks and methods","authors":"Jixiong Liu, Yoan Chabot, Raphaël Troncy, Viet-Phi Huynh, Thomas Labbé, Pierre Monnin","doi":"10.1016/j.websem.2022.100761","DOIUrl":"https://doi.org/10.1016/j.websem.2022.100761","journal":{"name":"Journal of Web Semantics","volume":"76","pages":"Article 100761"},"PeriodicalIF":2.5,"publicationDate":"2023-04-01","publicationTypes":"Journal Article"}
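To make the idea of interpreting a table against a knowledge graph concrete, here is a minimal Python sketch of two annotation steps commonly studied in STI work: linking a cell to an entity, and typing a column by majority vote. It is not one of the methods reviewed in the survey. The toy knowledge graph, the `ex:` identifiers and the function names are all assumptions made for illustration, and exact label matching stands in for the much richer lookup real systems use.

```python
from collections import Counter

# Hypothetical toy knowledge graph: entity id -> label and type.
TOY_KG = {
    "ex:Paris": {"label": "Paris", "type": "ex:City"},
    "ex:Lyon": {"label": "Lyon", "type": "ex:City"},
    "ex:France": {"label": "France", "type": "ex:Country"},
}


def annotate_cell(text, kg=TOY_KG):
    """Return the entity whose label equals the cell text (case-insensitive), or None."""
    wanted = text.strip().casefold()
    for entity, facts in kg.items():
        if facts["label"].casefold() == wanted:
            return entity
    return None


def annotate_column(cells, kg=TOY_KG):
    """Annotate every cell, then type the column with the most frequent entity type."""
    entities = [annotate_cell(cell, kg) for cell in cells]
    types = Counter(kg[e]["type"] for e in entities if e is not None)
    column_type = types.most_common(1)[0][0] if types else None
    return entities, column_type


entities, column_type = annotate_column(["Paris", " lyon ", "Atlantis"])
print(entities, column_type)
```

Cells with no match stay unannotated (`None`) rather than being forced onto the nearest entity, which keeps the column vote from being skewed by wrong links.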
Pub Date: 2023-04-01 DOI: 10.1016/j.websem.2023.100772
Nikolaos I. Spanoudakis, Georgios Gligoris, Adamos Koumi, Antonis C. Kakas
Gorgias Cloud offers an integrated application development environment that facilitates the development of argumentation-based systems over the internet. Argumentation is offered as a service, so application systems can access it remotely and use the results of the argumentative computation. Moreover, the service results include an explanation of the decision in both human-readable and machine-readable formats. The former allows experts to validate the application, while the latter is useful for development. It appears that this is the first case where argumentation is offered to developers in such an open and distributed way.
{"title":"Explainable argumentation as a service","authors":"Nikolaos I. Spanoudakis, Georgios Gligoris, Adamos Koumi, Antonis C. Kakas","doi":"10.1016/j.websem.2023.100772","DOIUrl":"https://doi.org/10.1016/j.websem.2023.100772","journal":{"name":"Journal of Web Semantics","volume":"76","pages":"Article 100772"},"PeriodicalIF":2.5,"publicationDate":"2023-04-01","publicationTypes":"Journal Article"}
{"title":"Optimizing a tableau reasoner and its implementation in Prolog","authors":"Riccardo Zese, Giuseppe Cota","doi":"10.2139/ssrn.3945445","DOIUrl":"https://doi.org/10.2139/ssrn.3945445","journal":{"name":"Journal of Web Semantics","volume":"63 1","pages":"100677"},"PeriodicalIF":2.5,"publicationDate":"2021-10-01","publicationTypes":"Journal Article"}
The heterogeneity of energy ontologies hinders the interoperability that ontology-based energy management applications need to perform large-scale energy management. Thus, there is a need for a global ontology that provides common vocabularies to represent the energy subdomains. A global energy ontology must balance reusability and usability to moderate the effort required to reuse it in different applications. This paper presents DABGEO: a reusable and usable global ontology for the energy domain that provides a common representation of the energy domains covered by existing energy ontologies. DABGEO can be reused by ontology engineers to develop ontologies for specific energy management applications. In contrast to previous global energy ontologies, it follows a layered structure to balance reusability and usability. In this work, we provide an overview of the structure of DABGEO and explain how to reuse it in a particular application case. In addition, the paper includes an evaluation of DABGEO to demonstrate that it achieves this balance.
{"title":"DABGEO: A Reusable and Usable Global Energy Ontology for the Energy Domain","authors":"Javier Cuenca, F. Larrinaga, E. Curry","doi":"10.2139/ssrn.3531214","DOIUrl":"https://doi.org/10.2139/ssrn.3531214","journal":{"name":"Journal of Web Semantics"},"PeriodicalIF":2.5,"publicationDate":"2020-02-03","publicationTypes":"Journal Article"}
Petar Ristoski, Anna Lisa Gentile, Alfredo Alba, D. Gruhl, Steve Welch
The Semantic Web movement has produced a wealth of curated collections of entities and facts, often referred to as Knowledge Graphs. Creating and maintaining such Knowledge Graphs is far from being a solved problem: it is crucial to constantly extract new information from the vast amount of heterogeneous data sources on the Web. In this work we address the task of Knowledge Graph population. Specifically, given any target relation between two entities, we propose an approach to extract positive instances of the relation from various Web sources. Our relation extraction approach introduces a human-in-the-loop component in the extraction pipeline, which delivers a significant advantage over purely automatic approaches. We test our solution on the ISWC 2018 Semantic Web Challenge, whose objective is to identify supply-chain relations among organizations in the Thomson Reuters Knowledge Graph. Our human-in-the-loop extraction pipeline achieves top performance among all competing systems.
{"title":"Large-scale relation extraction from web documents and knowledge graphs with human-in-the-loop","authors":"Petar Ristoski, Anna Lisa Gentile, Alfredo Alba, D. Gruhl, Steve Welch","doi":"10.2139/ssrn.3502435","DOIUrl":"https://doi.org/10.2139/ssrn.3502435","journal":{"name":"Journal of Web Semantics","volume":"1 1","pages":"100546"},"PeriodicalIF":2.5,"publicationDate":"2019-12-11","publicationTypes":"Journal Article"}
In the Big Data era, ever-increasing RDF data have reached a scale of billions of entities and brought challenges to the problem of entity linkage on the Semantic Web. Although millions of entities, typically denoted by URIs, have been explicitly linked with owl:sameAs, potentially coreferent ones are still numerous. Existing automatic approaches address this problem mainly from two perspectives: one is equivalence reasoning, which infers semantically coreferent entities but probably misses many potential ones; the other is similarity computation between property-values of entities, which is not always accurate and does not scale well. In this paper, we introduce a bootstrapping approach that leverages these two kinds of methods for entity linkage. Given an entity, our approach first infers a set of semantically coreferent entities. Then, it iteratively expands this entity set using discriminative property-value pairs. The discriminability is learned with a statistical measure, which not only identifies important property-values in the entity set but also takes matched properties into account. Frequent property combinations are also mined to improve linkage accuracy. We develop an online entity linkage search engine, and show its superior precision and recall by comparing it with representative approaches on one large-scale dataset and two benchmark datasets.
{"title":"A Bootstrapping Approach to Entity Linkage on the Semantic Web","authors":"Wei Hu, Cunxin Jia","doi":"10.2139/ssrn.3199193","DOIUrl":"https://doi.org/10.2139/ssrn.3199193","journal":{"name":"Journal of Web Semantics","volume":"30 1"},"PeriodicalIF":2.5,"publicationDate":"2015-10-01","publicationTypes":"Journal Article"}
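The two-stage idea in the entity linkage abstract above (infer coreferent entities by equivalence reasoning, then expand the set iteratively with discriminative property-value pairs) can be sketched roughly as follows. This is an illustration under strong assumptions, not the authors' algorithm: the learned statistical measure is replaced by a crude rarity cut-off (`max_holders`, a hypothetical parameter), frequent property combinations are ignored, and all entity names and data are invented.

```python
from collections import Counter, defaultdict


def same_as_closure(seed, same_as_links):
    """Entities reachable from seed when sameAs is treated as symmetric and transitive."""
    neighbours = defaultdict(set)
    for a, b in same_as_links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, stack = {seed}, [seed]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def expand(cluster, descriptions, max_holders=2):
    """Repeatedly add entities that share a rare property-value pair with the cluster.

    Assumption: a pair counts as discriminative when at most max_holders
    entities in the whole dataset carry it.
    """
    holders = Counter(pair for pairs in descriptions.values() for pair in pairs)
    cluster = set(cluster)
    changed = True
    while changed:
        changed = False
        trusted = {p for e in cluster for p in descriptions.get(e, ())
                   if holders[p] <= max_holders}
        for entity, pairs in descriptions.items():
            if entity not in cluster and trusted & set(pairs):
                cluster.add(entity)
                changed = True
    return cluster


def link_entity(seed, same_as_links, descriptions, max_holders=2):
    """Stage 1: sameAs closure. Stage 2: bootstrapped expansion."""
    return expand(same_as_closure(seed, same_as_links), descriptions, max_holders)


# Invented toy data: three descriptions of one city and one unrelated entity.
same_as = [("ex1:berlin", "ex2:berlin")]
descriptions = {
    "ex1:berlin": {("label", "Berlin"), ("type", "City")},
    "ex2:berlin": {("label", "Berlin"), ("gazetteerId", "B-001"), ("type", "City")},
    "ex3:berlin": {("gazetteerId", "B-001"), ("type", "City")},
    "ex1:paris": {("label", "Paris"), ("type", "City")},
}
print(sorted(link_entity("ex1:berlin", same_as, descriptions)))
```

In the toy data, `("type", "City")` is shared by every entity and so never triggers a link, while the rarer `gazetteerId` pair pulls in `ex3:berlin`, which has no explicit sameAs link.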
The goal of the Semantic Web Challenge is to provide researchers and industry with a forum to showcase the best Semantic Web applications, to demonstrate practical progress towards achieving the vision of the Semantic Web, and to show the value of Semantic Web technologies within various application domains. The Semantic Web Challenge has been organised annually since 2003. The Semantic Web Challenge 2013 took place at the 13th International Semantic Web Conference held in Sydney, Australia, from 23 to 25 October 2013. As in previous years, the challenge required applications to provide practical value to web users or domain experts. Systems should also make use of heterogeneous information sources under diverse ownership or control, and the meaning of data should play a central role. The Semantic Web Challenge 2013 received 17 submissions. All submissions were evaluated rigorously by a jury composed of leading scientists and experts from industry in a 3-round knockout competition, according to a comprehensive set of challenge requirements. All 17 submissions were invited to present a poster and demonstration during the ISWC conference. Following this, nine finalists were chosen to give an oral presentation and live demo during a dedicated session, after which the winners were selected.
{"title":"Editorial: Special Issue Semantic Web Challenge 2013","authors":"A. Harth, S. Bechhofer","doi":"10.2139/ssrn.3199101","DOIUrl":"https://doi.org/10.2139/ssrn.3199101","journal":{"name":"Journal of Web Semantics","volume":"1 1"},"PeriodicalIF":2.5,"publicationDate":"2014-01-01","publicationTypes":"Journal Article"}
In this special issue of the Journal of Web Semantics, we present two papers, both dealing with one of the most important problems in the field of web data management: data interlinking. This field has gained significant interest over the last years, with the evolution of web technologies enabling the emergence of a web of data. The exponentially increasing number of data sources published as linked data or embedded in web pages through the use of dedicated schemas requires techniques able to efficiently identify common entities appearing across these sources. Over the last years many systems have been developed, involving a wide range of techniques that take into account various information about the data sets involved in order to find the most accurate links between them. Vocabularies, existing links, data ranges, ontology alignments, and user input are combined for the best results. The most efficient systems are semi-automated: they require the user to input a linkage specification, indicating what to link with what and thus guiding the tool in the process. However, for web-scale data interlinking, the amount of user input in a link specification is still too high. Most recent research thus focuses on minimizing the user input. The two papers in this special issue present research results going in this direction, each following a specific path to achieve a similar goal. In the first paper, "Active Learning of Expressive Linkage Rules using Genetic Programming", the authors of the interlinking tool Silk present a technique to automate the construction of linkage specifications through active learning and genetic algorithms. The resulting system only requires the user to validate a few links until an acceptable specification is reached. In the second paper, "An Automatic Key Discovery Approach for Data Linking", Fatiha SAIS, Nathalie Pernelle, and Danai Symeonidou propose a technique to automate the selection of predicates to be compared during the interlinking process. The method discovers sets of properties that uniquely identify data resources in a given data set, similar to the notion of keys in relational databases. Both articles have gone through a very rigorous selection process and both were improved after their first submission. It was an editorial choice to retain only articles meeting a very high standard, resulting in only two articles published. We believe this will ensure a stronger field of research. Enjoy reading!
{"title":"Editorial: Special Issue on Data Linking","authors":"A. Ferrara, A. Nikolov, F. Scharffe","doi":"10.2139/ssrn.3199075","DOIUrl":"https://doi.org/10.2139/ssrn.3199075","journal":{"name":"Journal of Web Semantics","volume":"23 1"},"PeriodicalIF":2.5,"publicationDate":"2013-01-01","publicationTypes":"Journal Article"}
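The notion of a key mentioned in the data linking editorial above (a set of properties whose combined values identify each resource uniquely) can be illustrated with a brute-force search. This is only a sketch of the concept, not the key discovery approach proposed in the paper: the function names, the `max_size` bound and the toy data are assumptions, and a resource missing one of the properties is conservatively treated as breaking the key.

```python
from itertools import combinations


def is_key(properties, resources):
    """True if no two resources share the same values for all given properties."""
    seen = set()
    for description in resources.values():
        signature = tuple(description.get(p) for p in properties)
        if None in signature or signature in seen:
            return False  # missing value or duplicate signature
        seen.add(signature)
    return True


def minimal_keys(resources, max_size=2):
    """Enumerate property sets by increasing size and keep only minimal keys."""
    all_properties = sorted({p for d in resources.values() for p in d})
    keys = []
    for size in range(1, max_size + 1):
        for candidate in combinations(all_properties, size):
            if any(set(k) <= set(candidate) for k in keys):
                continue  # a superset of a known key is not minimal
            if is_key(candidate, resources):
                keys.append(candidate)
    return keys


# Invented toy data set.
resources = {
    "r1": {"name": "Ada", "born": "1815", "city": "London"},
    "r2": {"name": "Ada", "born": "1906", "city": "New York"},
    "r3": {"name": "Alan", "born": "1912", "city": "London"},
}
print(minimal_keys(resources))
```

Once such keys are known, an interlinking tool only needs to compare the key properties of two resources instead of every predicate, which is the selection step the editorial describes as being automated.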
{"title":"Editorial - Semantic Web Challange, 2010","authors":"Christian Bizer, D. Maynard","doi":"10.2139/SSRN.3199525","DOIUrl":"https://doi.org/10.2139/SSRN.3199525","journal":{"name":"Journal of Web Semantics","volume":"9 1"},"PeriodicalIF":2.5,"publicationDate":"2012-11-15","publicationTypes":"Journal Article"}
{"title":"Editorial: Semantic Web & Web 2.0","authors":"P. Mika, M. Greaves","doi":"10.2139/ssrn.3199374","DOIUrl":"https://doi.org/10.2139/ssrn.3199374","journal":{"name":"Journal of Web Semantics","volume":"6 1"},"PeriodicalIF":2.5,"publicationDate":"2012-03-14","publicationTypes":"Journal Article"}