Computerized Data-Preprocessing To Improve Data Quality

Rohan Gawhade, Lokesh Ramdev Bohara, Jesvin Mathew, Poonam Bari
Venue: 2022 Second International Conference on Power, Control and Computing Technologies (ICPC2T)
Published: 2022-03-01
DOI: 10.1109/ICPC2T53885.2022.9776676 (https://doi.org/10.1109/ICPC2T53885.2022.9776676)
Citations: 0

Abstract

Machine learning (ML) has grown rapidly over the past decades. Abundant resources and documentation allow people to become ML practitioners, companies profit from the analyses and predictions ML produces, and ML engineers are highly paid for their expertise. The field has become prevalent and far more accessible. Data preprocessing, together with feature extraction, is one of the most important stages in ML. Data preprocessing itself comprises several tasks that must be performed accurately to make the provided data usable. From handling missing values to encoding and normalization, each step matters, so a professional must be adept at every one of them. The steps required depend on the type of data provided, e.g. categorical data, continuous data, an array of image pixels, or the images themselves. Mastering all of these cleaning steps is strenuous, and applying them manually is time-consuming and does not guarantee the expected results. We aim to automate the complete process to ease the work of ML engineers and make it more productive. The user only has to provide the dataset and does not have to select processing techniques manually, as the latest data mining tools require. The application inspects the dataset and applies suitable techniques on its own. Because every step is automated, even people unfamiliar with ML concepts can preprocess a dataset, which opens opportunities for people from various domains who want to perform ML operations.
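
The abstract does not say how the application decides which technique to apply. As a rough illustration of the general idea only, the sketch below assumes a tabular dataset held as a list of dictionaries, with `None` marking a missing value. The function names `infer_column_type` and `preprocess`, and the specific choices of mean or mode imputation, min-max normalization, and one-hot encoding, are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter
from statistics import mean


def infer_column_type(values):
    """Return 'numeric' if every non-missing value is an int or float, else 'categorical'."""
    present = [v for v in values if v is not None]
    is_number = lambda v: isinstance(v, (int, float)) and not isinstance(v, bool)
    if present and all(is_number(v) for v in present):
        return "numeric"
    return "categorical"


def preprocess(rows):
    """Impute missing values, min-max scale numeric columns, one-hot encode the rest.

    Hypothetical helper: the technique chosen per column is an assumption,
    not the selection logic of the paper's application.
    """
    columns = list(rows[0].keys())
    out = [{} for _ in rows]
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        if infer_column_type(values) == "numeric":
            # Continuous data: fill gaps with the column mean, then scale to [0, 1].
            fill = mean(present)
            filled = [fill if v is None else v for v in values]
            lo, hi = min(filled), max(filled)
            span = hi - lo
            for target, v in zip(out, filled):
                target[col] = (v - lo) / span if span else 0.0
        else:
            # Categorical data: fill gaps with the most frequent value, then one-hot encode.
            fill = Counter(present).most_common(1)[0][0] if present else "missing"
            filled = [fill if v is None else v for v in values]
            for category in sorted(set(filled), key=str):
                for target, v in zip(out, filled):
                    target[f"{col}={category}"] = 1 if v == category else 0
    return out


# Example input: one continuous column and one categorical column, each with a gap.
sample_rows = [
    {"age": 20, "city": "Pune"},
    {"age": None, "city": "Mumbai"},
    {"age": 40, "city": None},
    {"age": 30, "city": "Pune"},
]
cleaned = preprocess(sample_rows)
```

The caller supplies only the dataset; the per-column type check stands in for the "observe the dataset" step described above. A real tool would need richer type detection (dates, free text, image arrays) and a more careful choice among imputation and scaling strategies.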