A Course on Data Quality in Analytics
Hongwei Zhu
Proceedings of the 53rd ACM Technical Symposium on Computer Science Education V. 2, 2022-03-03
DOI: 10.1145/3478432.3499100 (https://doi.org/10.1145/3478432.3499100)
Citation count: 0
Abstract
Data quality is important to analytics: data preparation usually involves data cleaning and is often the most time-consuming part of an analytics project. When the topic is left to the discretion of individual courses in an analytics program, students often receive only light exposure to it. Instead, a dedicated course on data quality in analytics has been designed and implemented. The course is organized in eight modules. The first part covers data preparation and preprocessing, which prepares students to tackle real datasets in other analytics courses. The second part covers analytics for data quality, namely algorithms for detecting and resolving data quality issues. The third part addresses the large-scale and engineering issues of analytics practice, where data collection must be managed and data quality tasks must be built into the pipeline.
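The abstract does not say which detection algorithms or tools the course uses. Purely as an illustration, and not as the course's own material, the sketch below shows what a simple data-quality step inside a pipeline might look like: it counts missing values, exact duplicate records, and out-of-range numbers. The names `check_quality` and `QualityReport`, the record-as-dict layout, and the choice of these three checks are all hypothetical assumptions; it uses only the Python standard library.

```python
from dataclasses import dataclass, field


@dataclass
class QualityReport:
    """Issue counts produced by the hypothetical check_quality step."""
    missing: dict = field(default_factory=dict)       # field name -> count of missing values
    duplicates: int = 0                               # rows identical to an earlier row
    out_of_range: dict = field(default_factory=dict)  # field name -> count of range violations


def check_quality(rows, required, ranges):
    """Detect missing values, exact duplicate rows, and out-of-range numbers.

    rows: list of dicts, one per record (values assumed hashable)
    required: field names that must be present and non-empty
    ranges: field name -> (low, high) inclusive bounds for numeric fields
    """
    report = QualityReport()
    seen = set()
    for row in rows:
        # Missing: key absent, value None, or an empty/whitespace-only string.
        for name in required:
            value = row.get(name)
            if value is None or (isinstance(value, str) and not value.strip()):
                report.missing[name] = report.missing.get(name, 0) + 1
        # Exact duplicate: same (key, value) pairs as a row seen earlier.
        key = tuple(sorted(row.items()))
        if key in seen:
            report.duplicates += 1
        else:
            seen.add(key)
        # Range checks apply only to numeric values that are present.
        for name, (low, high) in ranges.items():
            value = row.get(name)
            if isinstance(value, (int, float)) and not (low <= value <= high):
                report.out_of_range[name] = report.out_of_range.get(name, 0) + 1
    return report


if __name__ == "__main__":
    rows = [
        {"id": 1, "age": 34, "email": "a@example.com"},
        {"id": 2, "age": -5, "email": ""},
        {"id": 1, "age": 34, "email": "a@example.com"},
    ]
    print(check_quality(rows, required=["email"], ranges={"age": (0, 120)}))
    # QualityReport(missing={'email': 1}, duplicates=1, out_of_range={'age': 1})
```

Returning a report instead of silently fixing rows is one reasonable design choice for a pipeline stage: a later step can decide whether to drop, repair, or reject the batch. Real projects would typically add fuzzy duplicate matching and type validation, which this sketch deliberately omits.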