Testing Extract-Transform-Load Process in Data Warehouse Systems
Hajar Homayouni
2018 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), published 2018-10-01
DOI: https://doi.org/10.1109/ISSREW.2018.000-6
Citations: 10
Abstract
Enterprises use data warehouses to accumulate data from multiple sources for analysis and research. A data warehouse is populated using the Extract, Transform, and Load (ETL) process, which (1) extracts data from various sources, (2) integrates, cleans, and transforms it into a common form, and (3) loads it into the data warehouse. Faults in the ETL implementation and execution can lead to incorrect data in the data warehouse, which renders it useless irrespective of the quality of the applications accessing it and the quality of the source data. Thus, ETL processes must be thoroughly tested to validate the correctness of the ETL implementation. This project develops and evaluates two types of functional testing approaches, namely data quality tests and balancing tests. Data quality tests validate the data in the target data warehouse in isolation, whereas balancing tests check for discrepancies between the source and target data. This paper describes the proposed approach, the work accomplished to date, and the expected contributions of this research.
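To make the distinction between the two test types concrete, the following is a minimal sketch, not the approach developed in the paper. It assumes a hypothetical source table `src_patient` and a hypothetical target table `dw_patient` held in an in-memory SQLite database; the table names, columns, and the two helper functions are illustrative assumptions only. The data quality check looks at the target alone (here, counting NULL values in a column), while the balancing check compares source and target (here, finding source keys that never reached the target).

```python
import sqlite3


def data_quality_violations(conn, table, column):
    """Data quality test: inspect the target table in isolation.

    Returns the number of rows whose `column` is NULL.
    Table and column names are trusted identifiers in this sketch.
    """
    query = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    return conn.execute(query).fetchone()[0]


def balancing_discrepancies(conn, source, target, key):
    """Balancing test: compare the source table against the target table.

    Returns the keys that exist in the source but are missing in the target.
    """
    query = (
        f"SELECT s.{key} FROM {source} AS s "
        f"LEFT JOIN {target} AS t ON s.{key} = t.{key} "
        f"WHERE t.{key} IS NULL ORDER BY s.{key}"
    )
    return [row[0] for row in conn.execute(query)]


def build_demo():
    """Create hypothetical source and target tables with two seeded faults."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE src_patient (id INTEGER PRIMARY KEY, birth_year INTEGER);
        CREATE TABLE dw_patient  (id INTEGER PRIMARY KEY, birth_year INTEGER);
        INSERT INTO src_patient VALUES (1, 1980), (2, 1975), (3, 1990);
        -- Fault 1: birth_year lost for id 2. Fault 2: id 3 never loaded.
        INSERT INTO dw_patient VALUES (1, 1980), (2, NULL);
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_demo()
    print(data_quality_violations(conn, "dw_patient", "birth_year"))
    print(balancing_discrepancies(conn, "src_patient", "dw_patient", "id"))
```

In this sketch the NULL check needs no access to the source, whereas the missing-record check cannot be expressed without it, which mirrors the isolation versus source-to-target distinction drawn in the abstract.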