Reuben Borrison, Benjamin Kloepper, Jennifer Mullen
Data Preparation for Data Mining in Chemical Plants using Big Data
2019 IEEE 17th International Conference on Industrial Informatics (INDIN), published 2019-07-01
DOI: 10.1109/INDIN41052.2019.8972078 (https://doi.org/10.1109/INDIN41052.2019.8972078)
Citations: 0
Abstract
Data preparation for data mining in industrial applications is a key success factor that requires considerable, repeated effort. Although the required activities are repeated in a very similar fashion across many projects, the details of their implementation differ and require both application understanding and experience. As a result, data preparation is done by data mining experts with a strong domain background and a good understanding of the characteristics of the data to be analyzed. Experts with this profile usually have an engineering background and no strong expertise in distributed programming or big data technology. Unfortunately, the amount of data can be so large that distributed algorithms are required to allow for inspection of results and iteration of preparation steps. This contribution introduces an interactive data preparation workflow for signal data from chemical plants, enabling domain experts without a background in distributed computing or extensive programming experience to leverage the power of big data technologies.