Extracting linked data from statistic spreadsheets
Tien-Duc Cao, I. Manolescu, Xavier Tannier
Proceedings of the International Workshop on Semantic Big Data, 2017-05-19
DOI: 10.1145/3066911.3066914 (https://doi.org/10.1145/3066911.3066914)
Citations: 17
Abstract
Statistical data is an important sub-category of open data. It is of interest to many applications, including but not limited to data journalism, because such data is typically of high quality and reflects, in aggregated form, important aspects of a society's life such as births, immigration, and the economy. However, such open data is often not published as Linked Open Data (LOD), which limits its usability. We provide a conceptual model for the open data contained in the statistics published by INSEE, the French national institute for economic and societal statistics. We then describe a novel method for extracting RDF LOD to populate an instance of this model. We used our method to produce RDF data from more than 20,000 Excel spreadsheets, and our validation indicates a 91% rate of successful extraction.
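To make the idea of turning a statistical table into RDF concrete, here is a minimal Python sketch. It is not the authors' extraction method or conceptual model, which the abstract does not detail. It assumes a table that has already been read into memory with one header row and one row-label column. The namespace `http://example.org/stat/`, the property names `rowHeader`, `colHeader`, and `value`, and the function `cell_triples` are all hypothetical.

```python
from urllib.parse import quote

# Hypothetical namespace for illustration only; not INSEE's or the authors'.
BASE = "http://example.org/stat/"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return str(text).replace("\\", "\\\\").replace('"', '\\"')


def cell_triples(table_id, header_row, rows):
    """Emit N-Triples lines for each non-empty data cell of a simple table.

    header_row: column labels; the first entry names the row-label column.
    rows: lists whose first item is the row label and the rest are numbers.
    """
    triples = []
    for r, row in enumerate(rows):
        row_label, values = row[0], row[1:]
        for c, value in enumerate(values):
            if value is None:  # skip empty cells
                continue
            cell = f"<{BASE}{quote(table_id)}/cell/{r}-{c}>"
            col_label = header_row[c + 1]
            triples.append(
                f'{cell} <{BASE}rowHeader> "{escape_literal(row_label)}" .'
            )
            triples.append(
                f'{cell} <{BASE}colHeader> "{escape_literal(col_label)}" .'
            )
            triples.append(f'{cell} <{BASE}value> "{value}"^^<{XSD_DOUBLE}> .')
    return triples


header = ["Region", "2015", "2016"]
rows = [["Paris", 10, 12], ["Lyon", None, 7]]
triples = cell_triples("t1", header, rows)
for line in triples:
    print(line)
```

Each data cell becomes one resource linked to its row label, column label, and numeric value. Real statistical spreadsheets typically have nested headers, merged cells, and notes, which this sketch does not attempt to handle.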