Petar Ristoski, Anna Lisa Gentile, Alfredo Alba, D. Gruhl, Steve Welch
The Semantic Web movement has produced a wealth of curated collections of entities and facts, often referred to as Knowledge Graphs. Creating and maintaining such Knowledge Graphs is far from a solved problem: new information must be extracted constantly from the vast number of heterogeneous data sources on the Web. In this work we address the task of Knowledge Graph population. Specifically, given any target relation between two entities, we propose an approach to extract positive instances of the relation from various Web sources. Our relation extraction approach introduces a human-in-the-loop component into the extraction pipeline, which delivers a significant advantage over purely automatic approaches. We test our solution on the ISWC 2018 Semantic Web Challenge, whose objective is to identify supply-chain relations among organizations in the Thomson Reuters Knowledge Graph. Our human-in-the-loop extraction pipeline achieves top performance among all competing systems.
{"title":"Large-scale relation extraction from web documents and knowledge graphs with human-in-the-loop","authors":"Petar Ristoski, Anna Lisa Gentile, Alfredo Alba, D. Gruhl, Steve Welch","doi":"10.2139/ssrn.3502435","DOIUrl":"https://doi.org/10.2139/ssrn.3502435","url":null,"abstract":"Abstract The Semantic Web movement has produced a wealth of curated collections of entities and facts, often referred as Knowledge Graphs. Creating and maintaining such Knowledge Graphs is far from being a solved problem: it is crucial to constantly extract new information from the vast amount of heterogeneous sources of data on the Web. In this work we address the task of Knowledge Graph population. Specifically, given any target relation between two entities, we propose an approach to extract positive instances of the relation from various Web sources. Our relation extraction approach introduces a human-in-the-loop component in the extraction pipeline, which delivers significant advantage with respect to other solely automatic approaches. We test our solution on the ISWC 2018 Semantic Web Challenge, with the objective to identify supply-chain relations among organizations in the Thomson Reuters Knowledge Graph. Our human-in-the-loop extraction pipeline achieves top performance among all competing systems.","PeriodicalId":49951,"journal":{"name":"Journal of Web Semantics","volume":"1 1","pages":"100546"},"PeriodicalIF":2.5,"publicationDate":"2019-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72711789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the Big Data era, ever-increasing RDF data has reached a scale of billions of entities, posing challenges for entity linkage on the Semantic Web. Although millions of entities, typically denoted by URIs, have been explicitly linked with owl:sameAs, many potentially coreferent ones remain unlinked. Existing automatic approaches address this problem mainly from two perspectives: one is equivalence reasoning, which infers semantically coreferent entities but probably misses many potential matches; the other is similarity computation between the property-values of entities, which is not always accurate and does not scale well. In this paper, we introduce a bootstrapping approach that leverages both kinds of methods for entity linkage. Given an entity, our approach first infers a set of semantically coreferent entities. Then, it iteratively expands this entity set using discriminative property-value pairs. The discriminability is learned with a statistical measure, which not only identifies important property-values in the entity set but also takes matched properties into account. Frequent property combinations are also mined to improve linkage accuracy. We develop an online entity linkage search engine, and show its superior precision and recall by comparing it with representative approaches on one large-scale dataset and two benchmark datasets.
{"title":"A Bootstrapping Approach to Entity Linkage on the Semantic Web","authors":"Wei Hu, Cunxin Jia","doi":"10.2139/ssrn.3199193","DOIUrl":"https://doi.org/10.2139/ssrn.3199193","url":null,"abstract":"In the Big Data era, ever-increasing RDF data have reached a scale in billions of entities and brought challenges to the problem of entity linkage on the Semantic Web. Although millions of entities, typically denoted by URIs, have been explicitly linked with owl:sameAs, potentially coreferent ones are still numerous. Existing automatic approaches address this problem mainly from two perspectives: one is via equivalence reasoning, which infers semantically coreferent entities but probably misses many potentials; the other is by similarity computation between property-values of entities, which is not always accurate and do not scale well. In this paper, we introduce a bootstrapping approach by leveraging these two kinds of methods for entity linkage. Given an entity, our approach first infers a set of semantically coreferent entities. Then, it iteratively expands this entity set using discriminative property-value pairs. The discriminability is learned with a statistical measure, which does not only identify important property-values in the entity set, but also takes matched properties into account. Frequent property combinations are also mined to improve linkage accuracy. 
We develop an online entity linkage search engine, and show its superior precision and recall by comparing with representative approaches on a large-scale and two benchmark datasets.","PeriodicalId":49951,"journal":{"name":"Journal of Web Semantics","volume":"30 1","pages":""},"PeriodicalIF":2.5,"publicationDate":"2015-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68572798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The goal of the Semantic Web Challenge is to provide researchers and industry with a forum to showcase the best Semantic Web applications, to demonstrate practical progress towards achieving the vision of the Semantic Web, and to show the value of Semantic Web technologies within various application domains. The Semantic Web Challenge has been organised annually since 2003. The Semantic Web Challenge 2013 took place at the 13th International Semantic Web Conference held in Sydney, Australia, from 23-25 October, 2013. As in previous years, the challenge required applications to provide practical value to web users or domain experts. Systems should also make use of heterogeneous information sources under diverse ownership or control, and the meaning of data should play a central role. The Semantic Web Challenge 2013 received 17 submissions. All submissions were evaluated rigorously by a jury composed of leading scientists and experts from industry in a 3-round knockout competition, according to a comprehensive set of challenge requirements. All 17 submissions were invited to present a poster and demonstration during the ISWC conference. Following this, nine finalists were chosen to give an oral presentation and live demo during a dedicated session, after which the winners were selected.
{"title":"Editorial: Special Issue Semantic Web Challenge 2013","authors":"A. Harth, S. Bechhofer","doi":"10.2139/ssrn.3199101","DOIUrl":"https://doi.org/10.2139/ssrn.3199101","url":null,"abstract":"The goal of the Semantic Web Challenge is to provide researchers and industry with a forum to showcase the best Semantic Web applications, to demonstrate practical progress towards achieving the vision of the Semantic Web, and to show the value of Semantic Web technologies within various application domains. The Semantic Web Challenge has been organised annually since 2003.The Semantic Web Challenge 2013 took place at the 13th International Semantic Web Conference held in Sydney, Australia, from 23-25 October, 2013. As in previous years, the challenge required that applications had to provide a practical value to web users or domain experts. Systems should also make use of heterogeneous information sources under diverse ownership or control, and the meaning of data should play a central role. The Semantic Web Challenge 2013 received 17 submissions. All submissions were evaluated rigorously by a jury composed of leading scientists and experts from industry in a 3-round knockout competition, according to a comprehensive set of challenge requirements. All 17 submissions were invited to present a poster and demonstration during the ISWC conference. 
Following this, nine finalists were chosen to give an oral presentation and live demo during a dedicated session, with the winners then being selected.","PeriodicalId":49951,"journal":{"name":"Journal of Web Semantics","volume":"1 1","pages":""},"PeriodicalIF":2.5,"publicationDate":"2014-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68572737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this special issue of the Journal of Web Semantics, we present two papers that both deal with one of the most important problems in the field of web data management: data interlinking. This field has gained significant interest in recent years, as the evolution of web technologies has enabled the emergence of a web of data. The exponentially increasing number of data sources published as linked data or embedded in web pages through dedicated schemas requires techniques that can efficiently identify common entities appearing across these sources. In recent years many systems have been developed, involving a wide range of techniques that take into account various information about the data sets involved in order to find the most accurate links between them. Vocabularies, existing links, data ranges, ontology alignments, and user input are combined for the best results. The most efficient systems are semi-automated: they require the user to input a linkage specification, indicating what to link with what and thus guiding the tool in the process. However, for web-scale data interlinking, the amount of user input in a link specification is still too high. Most recent research thus focuses on minimizing the user input. The two papers in this special issue present research results going in this direction, each following a specific path to achieve a similar goal. In the first paper, "Active Learning of Expressive Linkage Rules using Genetic Programming", the authors of the interlinking tool Silk present a technique to automate the construction of linkage specifications through active learning and genetic algorithms. The resulting system only requires the user to validate a few links until an acceptable specification is reached. In the second paper, "An Automatic Key Discovery Approach for Data Linking", Fatiha SAIS, Nathalie Pernelle, and Danai Symeonidou propose a technique to automate the selection of predicates to be compared during the interlinking process. The method discovers sets of properties that uniquely identify data resources in a given data set, similar to the notion of keys in relational databases. Both articles have gone through a very rigorous selection process and were both improved since their first submission. It was an editorial choice to retain only articles meeting a very high standard, resulting in only two articles published. We believe this will ensure a stronger field of research. Enjoy reading!
{"title":"Editorial: Special Issue on Data Linking","authors":"A. Ferrara, A. Nikolov, F. Scharffe","doi":"10.2139/ssrn.3199075","DOIUrl":"https://doi.org/10.2139/ssrn.3199075","url":null,"abstract":"In this special issue of the Journal of Web Semantics, we present two papers dealing both with one of the most important problem in the field of web data management: data interlinking. This field has gained significant interest over the last years, with the evolution of web technologies enabling the emergence of a web of data. The exponentially increasing number of data sources published as linked data or embedded in web pages through the use of dedicated schemas require techniques able to efficiently identify common entities appearing across these sources. Over the last years many systems were developed involving a wide range of techniques taking into account various information about the data sets involved in order to find the most accurate links between them. Vocabularies, existing links, data ranges, ontology alignments, and user input are combined for the best results. Most efficient systems are semiautomated as they require the user to input a linkage specification, indicating what to link with what and thus guiding the tool in the process. However, for web scale data interlinking, the amount of user input in a link specification is still too high. Most recent research thus focus on minimizing the user input. The two papers in this special issue are presenting research results going in this direction, each of them following a specific path to achieve a similar goal. In the first paper Active Learning of Expressive Linkage Rules using Genetic Programming, the authors of the interlinking tool Silk present a technique to automate the construction of linkage specifications through active learning and genetic algorithms. The resulting system only requires the user to validate a few links until an acceptable specification is reached. 
In the second paper An Automatic Key Discovery Approach for Data Linking, Fatiha SAIS, Nathalie Pernelle, and Danai Symeonidou propose a technique to automate the selection of predicates to be compared during the interlinking process. The method discovers sets of properties allowing to identify data resources uniquely in a given data set, similarly to the notion of keys in relational databases. Both articles have gone through a very rigorous selection process and were both improved since their first submission. It was an editorial choice to only retain articles meeting a very high standard, resulting in only two articles published. We believe this will ensure a stronger field of research. Enjoy reading!","PeriodicalId":49951,"journal":{"name":"Journal of Web Semantics","volume":"23 1","pages":""},"PeriodicalIF":2.5,"publicationDate":"2013-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68572670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Editorial - Semantic Web Challange, 2010","authors":"Christian Bizer, D. Maynard","doi":"10.2139/SSRN.3199525","DOIUrl":"https://doi.org/10.2139/SSRN.3199525","url":null,"abstract":"","PeriodicalId":49951,"journal":{"name":"Journal of Web Semantics","volume":"9 1","pages":""},"PeriodicalIF":2.5,"publicationDate":"2012-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68573045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Editorial: Semantic Web & Web 2.0","authors":"P. Mika, M. Greaves","doi":"10.2139/ssrn.3199374","DOIUrl":"https://doi.org/10.2139/ssrn.3199374","url":null,"abstract":"","PeriodicalId":49951,"journal":{"name":"Journal of Web Semantics","volume":"6 1","pages":""},"PeriodicalIF":2.5,"publicationDate":"2012-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68573137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Editorial - Special Issue \" Messiness of the Web of Data\"","authors":"S. Schlobach, Craig A. Knoblock","doi":"10.2139/ssrn.3198959","DOIUrl":"https://doi.org/10.2139/ssrn.3198959","url":null,"abstract":"","PeriodicalId":49951,"journal":{"name":"Journal of Web Semantics","volume":"14 1","pages":""},"PeriodicalIF":2.5,"publicationDate":"2012-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68572061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Editorial - Special Issue \"The Semantic Web Challenge, 2011\"","authors":"Christian Bizer, D. Maynard","doi":"10.2139/ssrn.3198978","DOIUrl":"https://doi.org/10.2139/ssrn.3198978","url":null,"abstract":"","PeriodicalId":49951,"journal":{"name":"Journal of Web Semantics","volume":"16 1","pages":""},"PeriodicalIF":2.5,"publicationDate":"2012-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68572436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Journal of Web Semantics is adding a new letters section as a place to publish comments on recent Journal of Web Semantics articles that have appeared either in print or online.
{"title":"Letters to the Journal","authors":"Timothy W. Finin, Ian Horrocks, Steffen Staab","doi":"10.2139/ssrn.3198972","DOIUrl":"https://doi.org/10.2139/ssrn.3198972","url":null,"abstract":"The Journal of Web Semantics is adding a new letters section as a place to publish comments on recent Journal of Web Semantics articles that have appeared either in print or online.","PeriodicalId":49951,"journal":{"name":"Journal of Web Semantics","volume":"15 1","pages":""},"PeriodicalIF":2.5,"publicationDate":"2012-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68572518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Response to comments on WebPIE","authors":"J. Urbani, S. Kotoulas, J. Maassen, F. V. Harmelen, H. Bal","doi":"10.2139/ssrn.3198974","DOIUrl":"https://doi.org/10.2139/ssrn.3198974","url":null,"abstract":"The authors respond to Dr. Patel-Schneider's comments on their article WebPIE: A Web-scale Parallel Inference Engine using MapReduce .","PeriodicalId":49951,"journal":{"name":"Journal of Web Semantics","volume":"15 1","pages":""},"PeriodicalIF":2.5,"publicationDate":"2012-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68572845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}