ETL-aware materialized view selection in semantic data stream warehouses
Nabila Berkani, Ladjel Bellatreche, C. Ordonez
2018 12th International Conference on Research Challenges in Information Science (RCIS), 2018-05-29
DOI: 10.1109/RCIS.2018.8406668 (https://doi.org/10.1109/RCIS.2018.8406668)
Citations: 5
Abstract
For 25 years, many companies have invested considerable effort and money in building data warehouse (DW) applications for data analytics, and this technology has contributed to the success of several of them. Today, companies seek real-time analytics over data coming from fresh data sources and from external resources such as knowledge bases and linked open data. The traditional DW design life cycle, which consists of several tightly connected phases, must be revisited to meet this requirement. Integrating it significantly affects the phases that handle data: ETL (Extract, Transform, Load) and physical design, in which optimization structures are selected to speed up OLAP queries. In this paper, we propose a Near Real-Time Data Warehouse (NRTDW) design for semantic data sources, focusing on the ETL and physical design phases. First, we propose a dynamic materialized view selection method driven by a workload of SPARQL queries. Second, we propose optimized algorithms that orchestrate the ETL flows while taking the selected materialized views into account. Third, we propose an incremental view maintenance strategy that recomputes only the graphs involving the updated data sources. Finally, we validate our findings through extensive experiments using a detailed cost model on a real DBMS.
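The abstract names two mechanisms: workload-driven materialized view selection and incremental maintenance that recomputes only what depends on updated sources. The following is a minimal Python sketch of generic versions of both, not the authors' algorithms. Selection uses a classic greedy heuristic (benefit per unit of storage under a storage budget) as a stand-in for the paper's dynamic method, and maintenance refreshes only views whose source graphs intersect the updated ones. `CandidateView`, `select_views`, `views_to_refresh`, the graph names, and the benefit/size numbers are all hypothetical; the paper derives its estimates from its own cost model over a SPARQL workload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateView:
    """Hypothetical candidate materialized view over one or more source graphs."""
    name: str
    source_graphs: frozenset[str]  # named RDF graphs the view reads from (assumed)
    benefit: float                 # estimated cost saving over the query workload (assumed)
    size: float                    # estimated storage cost; assumed strictly positive


def select_views(candidates, budget):
    """Greedily pick views by benefit per unit of storage until the budget is used."""
    chosen, used = [], 0.0
    for view in sorted(candidates, key=lambda v: v.benefit / v.size, reverse=True):
        if view.benefit > 0 and used + view.size <= budget:
            chosen.append(view)
            used += view.size
    return chosen


def views_to_refresh(materialized, updated_graphs):
    """Return the names of views that read from at least one updated graph."""
    updated = set(updated_graphs)
    return [v.name for v in materialized if v.source_graphs & updated]


# Hypothetical example: one local source graph and one external knowledge base.
views = [
    CandidateView("v_sales", frozenset({"g_sales"}), benefit=50.0, size=10.0),
    CandidateView("v_geo", frozenset({"g_dbpedia"}), benefit=30.0, size=20.0),
    CandidateView("v_join", frozenset({"g_sales", "g_dbpedia"}), benefit=40.0, size=25.0),
]
chosen = select_views(views, budget=40.0)
print([v.name for v in chosen])               # ['v_sales', 'v_join']
print(views_to_refresh(chosen, {"g_dbpedia"}))  # ['v_join']
```

Only `v_join` is recomputed when the external graph changes, because `v_sales` does not read from it. The greedy step treats each view's benefit as independent; a real selector would re-estimate benefits after each pick, since materializing one view can reduce the value of another.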