Abdelghani Bakhtouchi, Ladjel Bellatreche, Stéphane Jean, Y. A. Ameur
Published in: 2012 Sixth International Conference on Research Challenges in Information Science (RCIS), 2012-05-16
DOI: 10.1109/RCIS.2012.6240431 (https://doi.org/10.1109/RCIS.2012.6240431)
Citations: 2
Ontologies as a solution for simultaneously integrating and reconciliating data sources
Enterprises worldwide increasingly need to integrate, share, and visualize data from heterogeneous, autonomous, and distributed data sources and from Web data covering a given domain, which makes the development of integration and reconciliation solutions a challenging issue. Existing studies on data integration and on the reconciliation of results have been developed in isolation and have not considered how closely the two processes are linked. On the one hand, ontologies have been widely used to build automatic integration systems because they can reduce the schematic and semantic heterogeneities that may exist among sources. On the other hand, results are reconciled either by assuming that all sources use the same identifier for an instance or by using statistical methods that identify affinities between concepts. These reconciliation solutions are usually unsuitable for sensitive real-world applications in which exact results are required and each source may use a different identifier for the same concept. In this paper, we propose a methodology that simultaneously integrates source data and reconciles their instances, based on ontologies enriched with functional dependencies (FDs) in a mediation architecture. The presence of FDs gives sources more autonomy in choosing their primary keys and facilitates result reconciliation. The methodology is evaluated on the Lehigh University Benchmark (LUBM) dataset to show its scalability and the quality of the result reconciliation phase.
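To make the role of functional dependencies more concrete, the following is a minimal sketch of exact (non-statistical) reconciliation and not the authors' implementation. It assumes a hypothetical ontology class with a functional dependency whose left-hand side is `(name, birth_date)`; the source names, property names, and the `reconcile` helper are all illustrative assumptions. Each source keeps its own primary key (`student_id` in one, `email` in the other), and records are matched only through the shared FD left-hand side.

```python
from collections import defaultdict

# Hypothetical FD on a hypothetical ontology class "Student":
# (name, birth_date) -> every other property of the instance.
FD_LHS = ("name", "birth_date")

def reconcile(sources, fd_lhs):
    """Merge records from all sources that agree on the FD's left-hand side.

    Records lacking any left-hand-side property are skipped, since they
    cannot be matched exactly. On conflicting values, the first one seen wins.
    """
    merged = defaultdict(dict)
    for records in sources.values():
        for record in records:
            if any(prop not in record for prop in fd_lhs):
                continue
            key = tuple(record[prop] for prop in fd_lhs)
            for prop, value in record.items():
                merged[key].setdefault(prop, value)
    return dict(merged)

# Two illustrative sources that use different local identifiers.
sources = {
    "S1": [{"student_id": 17, "name": "Ana", "birth_date": "1990-01-02",
            "department": "CS"}],
    "S2": [{"email": "ana@example.org", "name": "Ana",
            "birth_date": "1990-01-02", "advisor": "Prof. X"},
           {"email": "bob@example.org", "name": "Bob",
            "birth_date": "1991-03-04"}],
}

result = reconcile(sources, FD_LHS)
for key, instance in result.items():
    print(key, instance)
```

In this sketch the two records for Ana are merged into one instance carrying both local identifiers, while Bob remains a separate instance. A real mediator would also need a policy for conflicting property values, which is simplified here to "first value wins".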