Julian Eberius, Patrick Damme, Katrin Braunschweig, Maik Thiele, Wolfgang Lehner
Platforms for publishing and collaboratively managing data, such as Data.gov or Google Fusion Tables, are a new trend on the web. They manage very large corpora of datasets but often lack an integrated schema, an ontology, or even common publication standards. As a result, attributes with the same meaning receive inconsistent names, which hampers both the discovery of relationships between datasets and their reuse. Existing data integration techniques focus on reuse time, i.e., they are applied when a user wants to combine a specific set of datasets or integrate them with an existing database. In contrast, this paper investigates a novel method of data integration at publish time, in which the publisher receives suggestions on how to integrate the new dataset with the corpus as a whole, without resorting to a manually created mediated schema or ontology for the platform. We propose data-driven algorithms that suggest alternative attribute names for a newly published dataset based on attribute and instance statistics maintained over the corpus. We evaluate the proposed algorithms on real-world corpora based on the Open Data Platform opendata.socrata.com and on relational data extracted from Wikipedia. We report on the system's response time and on the results of an extensive crowdsourcing-based evaluation of the quality of the generated alternative attribute names.
"Publish-time data integration for open data platforms." International Workshop on Open Data, 2013-06-03. DOI: 10.1145/2500410.2500413
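The abstract does not detail the algorithms, so the following is only a minimal sketch of one instance-based idea, written under our own assumptions: the corpus statistics are a hypothetical in-memory map from each attribute name already used on the platform to the set of values seen under it, and candidate names for a new column are ranked by the Jaccard overlap of the value sets. The function names and sample values are illustrative, and the attribute-level statistics the authors also mention are not modelled here.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two value sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_names(
    column_values: Set[str],
    corpus: Dict[str, Set[str]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank existing attribute names by instance overlap with a new column.

    `corpus` (hypothetical statistic) maps each attribute name in use on the
    platform to the union of values observed under that name.
    """
    scored = [(name, jaccard(column_values, values)) for name, values in corpus.items()]
    # Drop names with no overlap; best score first, ties broken alphabetically.
    scored = [item for item in scored if item[1] > 0.0]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


if __name__ == "__main__":
    corpus = {
        "state": {"Texas", "Ohio", "Utah", "Iowa"},
        "country": {"France", "Germany", "Spain"},
        "st": {"Texas", "Ohio"},
    }
    print(suggest_names({"Ohio", "Utah", "Texas"}, corpus))
```

Here a new column holding US state names would be offered "state" (overlap 0.75) ahead of "st" (about 0.67), while "country" is not suggested at all. Pure value overlap is cheap to maintain incrementally, but it cannot help with numeric or free-text columns, which is presumably where name-based statistics become important.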
Points of interest (POIs) in a city are specific locations that hold some significance for people; examples include restaurants, museums, hotels, theatres and landmarks, to name a few. Because of their role in our social and economic life, POIs have been attracting increasing attention from location-based applications such as online maps and social networking sites. While basic information about a POI, such as its geographic location, telephone number and opening hours, is relatively easy to find on the Web, it is more challenging to learn what other people say about it. What if a person wants to know all the restaurants in Paris that serve good seafood and provide friendly service? Typically, the answer has to be sought on websites that let people leave comments and opinions on POIs, a time-consuming manual task that few are willing to undertake. Search engines could better support this kind of search if information mined from opinions were available in a structured form, such as RDF. In this position paper, we describe a general approach to enriching an existing RDF repository about POIs with data obtained from social networking sites.
"On the enrichment of a RDF repository of city points of interest based on social data." Zied Sellami, Gianluca Quercini, C. Reynaud. International Workshop on Open Data, 2013-06-03. DOI: 10.1145/2500410.2500411
This paper describes one of the specific functionalities of the data.bnf.fr library discovery service: the use of semantic Web technologies to create Web pages around "named entities" from the authority files.
"The 'intellectual network': linking writers in the data.bnf.fr project." Romain Wenz. International Workshop on Open Data, 2013-06-03. DOI: 10.1145/2500410.2500418
Irene Petrou, George Papastefanatos, Theodore Dalamagas
In this paper we present a case study on publishing statistical data as Linked Open Data (LOD). Statistical or fact-based data are maintained by statistical agencies and organizations, harvested via surveys or aggregated from other sources, and mainly concern observations of socioeconomic indicators. In this case study, we present the publication as LOD of the preliminary results of Greece's 2011 resident population census. We employed the Data Cube vocabulary and the Google Refine tool to model and publish the census results.
"Publishing census as linked open data: a case study." International Workshop on Open Data, 2013-06-03. DOI: 10.1145/2500410.2500412
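To make the Data Cube modelling in the census case study concrete, here is a minimal sketch that serialises a single census count as a `qb:Observation` in Turtle using only the standard library. The `qb:` terms and the `sdmx-dimension:refArea` / `sdmx-measure:obsValue` properties come from the W3C RDF Data Cube vocabulary and its SDMX-RDF companion; the `ex:` namespace, the `ex:sex` dimension, the area code and the count are hypothetical placeholders, not the authors' URIs or census figures.

```python
# Prefixes of the W3C RDF Data Cube and SDMX-RDF vocabularies; the ex:
# namespace is a hypothetical placeholder for the publisher's own URIs.
PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .\n"
    "@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex: <http://example.org/census/> .\n"
)


def observation_ttl(area_code: str, sex: str, population: int) -> str:
    """Serialise one census count as a qb:Observation in Turtle."""
    return (
        f"ex:obs-{area_code}-{sex} a qb:Observation ;\n"
        "    qb:dataSet ex:census2011 ;\n"
        f"    sdmx-dimension:refArea ex:area-{area_code} ;\n"
        f"    ex:sex ex:sex-{sex} ;\n"  # hypothetical dimension property
        f'    sdmx-measure:obsValue "{population}"^^xsd:integer .\n'
    )


if __name__ == "__main__":
    # Illustrative values only, not figures from the 2011 census.
    print(PREFIXES + observation_ttl("A01", "female", 1000))
```

A complete publication would also declare a `qb:DataStructureDefinition` listing the dimensions and measure and attach it to the dataset via `qb:structure`; the sketch leaves that out to keep the observation shape visible.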
E. Castanier, Rémi Coletta, P. Valduriez, Christian Frisch
Working with open data sources can yield high-value information but raises major problems in terms of metadata extraction, data source integration and visualization. In this paper we describe a demonstration of WebSmatch, a flexible environment for Web data integration, based on a real, end-to-end data integration scenario over public data from Data Publica. The demonstration focuses on poorly structured input data sources (XLS files).
"WebSmatch: a tool for open data." International Workshop on Open Data, 2013-06-03. DOI: 10.1145/2500410.2500420
The need to better integrate and link isolated data sources on the Web has been widely recognized and is being tackled by the Linked Open Data (LOD) initiative. One problem to address is publishing data as LOD and subsequently exploiting it, which is hampered both by data size and query performance and by the complexity of the publication process. This work addresses the size and performance issues by adopting the cloud as a hosting platform for LOD publication services, so as to exploit its scalability and elasticity. The publication complexity issue is addressed by proposing a Linked Open Data-as-a-Service approach that offers an integrated, service-based API for the (semi-)automatic publication of relational data as LOD, together with subsequent querying and updating capabilities.
"Linked open GeoData management in the cloud." K. Kritikos, Yannis Rousakis, D. Kotzinos. International Workshop on Open Data, 2013-06-03. DOI: 10.1145/2500410.2500414
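As a rough illustration of what (semi-)automatic publication of relational rows as LOD involves, the sketch below turns one table row into subject/predicate/object tuples, loosely following the W3C Direct Mapping conventions: row IRI `<base/Table/pk=value>`, column predicate `<base/Table#column>`, a type triple, and no triple for SQL NULLs. The base IRI, table and row are hypothetical, literals are left untyped for brevity, and this is not the service API the authors propose.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import quote

Triple = Tuple[str, str, str]
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def literal(value: object) -> str:
    """Plain (untyped) N-Triples literal with backslashes, quotes, newlines escaped."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def row_to_triples(
    base: str, table: str, pk: str, row: Dict[str, Optional[object]]
) -> List[Triple]:
    """Map one relational row to triples, loosely after the W3C Direct Mapping."""
    table_iri = base + quote(table)
    subject = f"<{table_iri}/{quote(pk)}={quote(str(row[pk]))}>"
    triples: List[Triple] = [(subject, RDF_TYPE, f"<{table_iri}>")]
    for column, value in row.items():
        if value is None:  # SQL NULL: no triple is emitted
            continue
        triples.append((subject, f"<{table_iri}#{quote(column)}>", literal(value)))
    return triples


if __name__ == "__main__":
    row = {"id": 7, "name": "Station A", "elevation": None}
    for s, p, o in row_to_triples("http://example.org/db/", "Station", "id", row):
        print(s, p, o, ".")
```

For the sample row this yields three triples (type, `id`, `name`); the NULL `elevation` column is skipped. Real geodata would additionally need typed literals (for example for coordinates), which is where a semi-automatic, user-assisted mapping step is likely to be needed.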
This paper addresses the problem of creating an interactive and visual map of a large collection of open datasets. We first describe how to define a representation space for such data, using text mining techniques to create features. Then, given a similarity measure between open datasets, we use the k-nearest neighbors method to build a proximity graph between datasets. We visualize the graph with a force-directed layout method (using the Tulip software). We present results on a collection of 300,000 datasets from the French open data website, in which the display of the graph is limited to 150,000 datasets. We study the discovered clusters and show how they can be used to browse this large collection.
"Visualizing a large collection of open datasets: an experiment with proximity graphs." Tianyang Liu, D. Ahmed, F. Bouali, G. Venturini. International Workshop on Open Data, 2013-06-03. DOI: 10.1145/2500410.2500417
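The pipeline in this abstract (text features, a similarity measure, then a k-nearest-neighbour graph handed to a force-directed layout) can be sketched for its middle step. The sketch assumes bag-of-words term counts and cosine similarity, which may differ from the paper's actual features and measure, and it does not reproduce the Tulip layout. It compares every pair of datasets, which is only practical for small collections; at the scale of 300,000 datasets an indexed or approximate neighbour search would likely be required.

```python
import math
from collections import Counter
from typing import Dict, Set, Tuple


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (lower-cased, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_graph(docs: Dict[str, str], k: int) -> Set[Tuple[str, str]]:
    """Undirected edges linking each dataset to its k most similar others.

    Brute force, O(n^2) comparisons: suitable for a toy corpus only.
    """
    vectors = {name: vectorize(text) for name, text in docs.items()}
    edges: Set[Tuple[str, str]] = set()
    for name, vec in vectors.items():
        scored = [
            (cosine(vec, other_vec), other)
            for other, other_vec in vectors.items()
            if other != name
        ]
        scored.sort(key=lambda item: (-item[0], item[1]))
        for score, other in scored[:k]:
            if score > 0.0:  # do not link datasets with no terms in common
                edges.add((min(name, other), max(name, other)))
    return edges


if __name__ == "__main__":
    docs = {
        "a": "school budget 2012",
        "b": "school budget 2013",
        "c": "air quality paris",
        "d": "air quality lyon",
    }
    print(sorted(knn_graph(docs, k=1)))
```

With `k=1` the four toy descriptions form two connected components, the budget datasets and the air-quality datasets, which is the kind of cluster structure a force-directed layout then makes visible.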
In this paper, we describe the development of the first ontology module for the observation of pest attacks in crop production. We applied the NeOn methodology, in particular the ontology engineering method based on Ontology Design Patterns.
"Agronomic taxon." C. Roussey, J. Chanet, V. Cellier, Fabien Amarger. International Workshop on Open Data, 2013-06-03. DOI: 10.1145/2500410.2500415
KD2R automatically discovers composite key constraints in RDF data sources that conform to a given ontology. We consider data sources for which the Unique Name Assumption holds. KD2R performs this discovery without having to scan all the data: the system looks for maximal non-keys and derives minimal keys from this set of non-keys. KD2R has been tested on several datasets available on the Web of data and has obtained promising results when the discovered keys are used to link data. In the demo, we will demonstrate the functionality of our tool and show on several datasets that the keys can be used in a data linking tool.
"Discovering keys in RDF/OWL dataset with KD2R." Danai Symeonidou, N. Pernelle, Fatiha Saïs. International Workshop on Open Data, 2013-06-03. DOI: 10.1145/2500410.2500419
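The relationship between maximal non-keys and minimal keys described in the KD2R abstract can be shown on a toy scale. The sketch below is a naive illustration and not KD2R's algorithm. It assumes single-valued properties that every instance has, and, following the Unique Name Assumption, treats distinct instances as different entities. Unlike KD2R, it finds non-keys by scanning every instance. Because any subset of a non-key is also a non-key, a property set is a key exactly when it lies inside no maximal non-key, so minimal keys are enumerated by increasing size. The enumeration is exponential in the number of properties.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Instance = Dict[str, str]  # property -> single value; every property present


def non_keys(instances: List[Instance], props: List[str]) -> List[FrozenSet[str]]:
    """Property sets on which at least two distinct instances agree.

    Naive full scan over all instances (KD2R itself avoids this).
    """
    found: List[FrozenSet[str]] = []
    for size in range(1, len(props) + 1):
        for combo in combinations(props, size):
            seen = set()
            for inst in instances:
                signature = tuple(inst[p] for p in combo)
                if signature in seen:
                    found.append(frozenset(combo))
                    break
                seen.add(signature)
    return found


def maximal(sets: List[FrozenSet[str]]) -> List[FrozenSet[str]]:
    """Keep only the sets not strictly contained in another set."""
    return [s for s in sets if not any(s < other for other in sets)]


def minimal_keys(max_non_keys: List[FrozenSet[str]], props: List[str]) -> List[FrozenSet[str]]:
    """Smallest property sets contained in no maximal non-key."""
    keys: List[FrozenSet[str]] = []
    for size in range(1, len(props) + 1):
        for combo in combinations(props, size):
            candidate = frozenset(combo)
            if any(candidate <= nk for nk in max_non_keys):
                continue  # inside a non-key: two instances share these values
            if any(key < candidate for key in keys):
                continue  # a smaller key already distinguishes all instances
            keys.append(candidate)
    return keys


if __name__ == "__main__":
    people = [
        {"name": "Alice", "city": "Paris", "year": "1990"},
        {"name": "Bob", "city": "Paris", "year": "1985"},
        {"name": "Alice", "city": "Lyon", "year": "1985"},
        {"name": "Carol", "city": "Paris", "year": "1990"},
    ]
    props = ["name", "city", "year"]
    mnk = maximal(non_keys(people, props))
    print(mnk, minimal_keys(mnk, props))
```

In the toy data the maximal non-keys are {name} and {city, year}, so the minimal keys are {name, city} and {name, year}. The point of working from non-keys is that a single pair of agreeing instances is enough to rule a property set out, whereas confirming a key directly requires checking every instance.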
N. Pernelle, Fatiha Saïs, B. Safar, Maria Koutraki, Tushar Ghosh
Thanks to the Linked Open Data initiative, the number of RDF datasets published on the Web keeps growing. One active research field concerns the problem of finding links between entities. In this paper, we focus on ontology-based data linking approaches, which use linking rules based on the available schemas (or ontologies). Such systems assume that a set of mappings between ontology elements is available beforehand; however, this set of mappings may be incomplete. We propose a data linking approach called N2R-Part, which computes similarity scores by exploiting both the properties for which a mapping exists and those for which there is no mapping. We illustrate through an example how exploiting the unmapped properties improves the data linking results.
"N2R-part: identity link discovery using partially aligned ontologies." International Workshop on Open Data, 2013-06-03. DOI: 10.1145/2500410.2500416
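The abstract does not give N2R-Part's equations, so the following is only a hypothetical sketch of the general idea of combining both kinds of evidence. Mapped property pairs are compared value to value and averaged. Values of unmapped properties are compared across all properties, and the best match is kept. The two scores are then blended with a weight `alpha`. The weighting scheme, the use of `difflib.SequenceMatcher` as string similarity, and the sample entities are all our own assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

Entity = Dict[str, str]  # property -> literal value


def string_sim(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_score(
    e1: Entity, e2: Entity, mappings: List[Tuple[str, str]], alpha: float = 0.7
) -> float:
    """Blend evidence from mapped property pairs with evidence from unmapped ones.

    `alpha` is a hypothetical weight given to the mapped evidence.
    """
    mapped_sims = [string_sim(e1[p], e2[q]) for p, q in mappings if p in e1 and q in e2]
    mapped = sum(mapped_sims) / len(mapped_sims) if mapped_sims else 0.0
    left_mapped = {p for p, _ in mappings}
    right_mapped = {q for _, q in mappings}
    left = [v for p, v in e1.items() if p not in left_mapped]
    right = [v for q, v in e2.items() if q not in right_mapped]
    # Unmapped values may describe the same thing under different property
    # names, so compare them all pairwise and keep the best match.
    unmapped = max((string_sim(a, b) for a in left for b in right), default=0.0)
    return alpha * mapped + (1 - alpha) * unmapped


if __name__ == "__main__":
    mappings = [("name", "label")]
    e1 = {"name": "Louvre Museum", "addr": "Rue de Rivoli, Paris"}
    e2 = {"label": "Louvre Museum", "location": "Rue de Rivoli, Paris"}
    e3 = {"label": "Louvre Museum", "location": "Saadiyat Island, Abu Dhabi"}
    print(link_score(e1, e2, mappings), link_score(e1, e3, mappings))
```

Using only the mapped `name`/`label` pair, `e2` and `e3` would be indistinguishable from `e1`. The unmapped `addr`/`location` values are what separate them, which mirrors the motivation stated in the abstract.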