Rohan Gawhade, Lokesh Ramdev Bohara, Jesvin Mathew, Poonam Bari
Computerized Data-Preprocessing To Improve Data Quality
2022 Second International Conference on Power, Control and Computing Technologies (ICPC2T)
Publication date: 2022-03-01
DOI: https://doi.org/10.1109/ICPC2T53885.2022.9776676
Abstract
Machine Learning (ML) has grown rapidly over the past decades. Abundant resources and documentation allow people to become ML practitioners. Companies make large profits from the analyses and predictions that ML produces, and ML engineers are highly paid for their knowledge in this domain. ML has become prevalent and much easier to understand. Data preprocessing and feature extraction are among the most important stages in ML. Data preprocessing itself comprises various tasks that must be performed accurately to make the provided data usable. From handling missing values to encoding and normalization, each step has its importance, so a professional must be adept at each of them. The preprocessing steps depend on the type of data provided, i.e., categorical data, continuous data, arrays of image pixels, or even images themselves. Because all of these cleaning steps must be dealt with, learning them and becoming an expert is quite strenuous. Moreover, manual preprocessing is time-consuming and does not guarantee the expected results. We aim to automate this complete process to ease the work of ML engineers and make it more productive. The user only has to provide the dataset and does not have to select the processing techniques manually, as the latest data mining tools require. The application observes the dataset and applies suitable techniques on its own. Since all the steps are automated and the user only has to provide the dataset, even people unfamiliar with ML concepts can preprocess a dataset. This opens opportunities for people from various domains who wish to perform Machine Learning operations.
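The idea of observing a dataset and choosing the preprocessing steps automatically can be illustrated with a minimal sketch. This is not the authors' application. The function names (`infer_kind`, `preprocess_column`, `auto_preprocess`), the type-detection rule, and the specific choices of mean or mode imputation, min-max normalization, and one-hot encoding are all assumptions made for illustration, using only the Python standard library.

```python
from collections import Counter
from statistics import mean


def infer_kind(values):
    """Guess whether a column is 'continuous' or 'categorical'.

    Assumption (hypothetical rule): a column is continuous when every
    non-missing value is an int or float; anything else is categorical.
    Missing values are represented by None.
    """
    present = [v for v in values if v is not None]
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    return "continuous" if present and numeric else "categorical"


def preprocess_column(name, values):
    """Return a dict of output columns derived from one input column."""
    present = [v for v in values if v is not None]
    if infer_kind(values) == "continuous":
        # Impute missing entries with the mean, then min-max scale to [0, 1].
        fill = mean(present)
        filled = [fill if v is None else v for v in values]
        lo, hi = min(filled), max(filled)
        span = hi - lo
        # A constant column has span 0; map it to 0.0 to avoid dividing by zero.
        scaled = [(v - lo) / span if span else 0.0 for v in filled]
        return {name: scaled}
    # Categorical: impute with the most frequent value, then one-hot encode.
    fill = Counter(present).most_common(1)[0][0] if present else "missing"
    filled = [fill if v is None else v for v in values]
    categories = sorted(set(filled), key=str)
    return {
        f"{name}={c}": [1 if v == c else 0 for v in filled] for c in categories
    }


def auto_preprocess(table):
    """Apply preprocess_column to every column of a dict-of-lists table."""
    result = {}
    for name, values in table.items():
        result.update(preprocess_column(name, values))
    return result


# Made-up example data: one continuous and one categorical column.
raw = {
    "age": [20, None, 40, 60],
    "city": ["Pune", "Mumbai", None, "Pune"],
}
processed = auto_preprocess(raw)
print(processed)
```

The user supplies only `raw`; the sketch decides per column which techniques to apply, which mirrors the workflow described in the abstract. Image data, which the abstract also mentions, is outside the scope of this sketch.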