An Extensible Approach for Materialized Big Data Integration in Distributed Computation Environments
V. Sazontev, S. Stupnikov
Venue: 2019 Ivannikov Memorial Workshop (IVMEM)
Published: 2019-09-01
DOI: 10.1109/IVMEM.2019.00011
Citations: 1
Abstract
The modern IT world requires data integration systems that can handle a large number of heterogeneous data sources. Such systems should perform not only data extraction but also schema alignment, entity resolution, and data fusion. In the big data setting, with many heterogeneous sources, a number of methods address various aspects of integration with the goal of making the system automatic and less dependent on the user. This work proposes an extensible approach to developing a data integration system that performs materialized integration of heterogeneous sources in a distributed computation environment. A prototype of the system implementing advanced methods for big data integration has been developed and applied in the e-commerce domain.
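The integration stages named in the abstract (schema alignment, entity resolution, data fusion) and the idea of an extensible system can be illustrated with a minimal, single-process Python sketch. This is not the authors' prototype. The `Pipeline` class, the stage functions, the attribute mapping and the sample e-commerce records are all hypothetical. Entity resolution is reduced to exact matching on a normalized title, and fusion uses simple majority voting. "Extensible" here only means that each stage is a function from records to records, so a different method can be plugged in without changing the others. The distributed execution mentioned in the abstract is not modeled.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical record representation: attribute name -> value
Record = Dict[str, str]
# A stage maps a list of records to a new list of records
Stage = Callable[[List[Record]], List[Record]]


def schema_alignment(mappings: Dict[str, Dict[str, str]]) -> Stage:
    """Rename source-specific attributes to a common (mediated) schema.

    `mappings` is a hypothetical, hand-written {source: {src_attr: target_attr}}."""
    def stage(records: List[Record]) -> List[Record]:
        aligned = []
        for r in records:
            m = mappings.get(r.get("source", ""), {})
            aligned.append({m.get(k, k): v for k, v in r.items()})
        return aligned
    return stage


def entity_resolution(key: str) -> Stage:
    """Give records the same entity_id when their normalized `key` values match.

    Exact matching on a lowercased, whitespace-collapsed string is a
    deliberately simple stand-in for real matching methods."""
    def stage(records: List[Record]) -> List[Record]:
        ids: Dict[str, str] = {}
        out = []
        for r in records:
            norm = " ".join(r.get(key, "").lower().split())
            entity = ids.setdefault(norm, f"e{len(ids)}")
            out.append({**r, "entity_id": entity})
        return out
    return stage


def data_fusion(records: List[Record]) -> List[Record]:
    """Merge each entity's records into one by majority vote per attribute."""
    clusters: Dict[str, List[Record]] = defaultdict(list)
    for r in records:
        clusters[r["entity_id"]].append(r)
    fused = []
    for entity, group in clusters.items():
        attrs = {k for r in group for k in r if k not in ("source", "entity_id")}
        merged: Record = {"entity_id": entity}
        for a in sorted(attrs):
            votes = Counter(r[a] for r in group if a in r)
            merged[a] = votes.most_common(1)[0][0]
        fused.append(merged)
    return fused


@dataclass
class Pipeline:
    """Runs stages in order; new stages are added without editing existing ones."""
    stages: List[Stage] = field(default_factory=list)

    def add(self, stage: Stage) -> "Pipeline":
        self.stages.append(stage)
        return self

    def run(self, records: List[Record]) -> List[Record]:
        for stage in self.stages:
            records = stage(records)
        return records


# Hypothetical offers for one product from three shops with different schemas
sources = [
    {"source": "shop_a", "title": "Phone X 64GB", "cost": "499"},
    {"source": "shop_b", "name": "phone x  64gb", "price": "499"},
    {"source": "shop_c", "name": "Phone X 64GB", "price": "479"},
]
pipeline = (
    Pipeline()
    .add(schema_alignment({
        "shop_a": {"cost": "price"},
        "shop_b": {"name": "title"},
        "shop_c": {"name": "title"},
    }))
    .add(entity_resolution("title"))
    .add(data_fusion)
)
result = pipeline.run(sources)
print(result)  # [{'entity_id': 'e0', 'price': '499', 'title': 'Phone X 64GB'}]
```

Swapping in a stronger matcher or a source-trust-aware fusion rule only requires passing a different function to `add`, which is the kind of extensibility the abstract emphasizes; a distributed engine would replace the in-memory lists with partitioned datasets.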