Imputation of Missing Values using Improved K-Means Clustering Algorithm to Attain Data Quality

Stephin Philip, Pawan Vashisth, Anant Chaturvedi, Neha Gupta
2021 Third International Conference on Inventive Research in Computing Applications (ICIRCA), published 2021-09-02
DOI: 10.1109/ICIRCA51532.2021.9544855 (https://doi.org/10.1109/ICIRCA51532.2021.9544855)

Abstract

A data warehouse supports the management of large volumes of stored data and the handling of user input during processing. The main challenge in a data warehouse is maintaining the quality of the data that users store. Some traditional techniques can improve data quality while also increasing efficiency. Each unit of data has distinct features that many researchers have studied and that influence data quality. This article enhances the K-Means method by using the Euclidean distance metric to detect missing values in the gathered sources and replace them with the closest values while preserving the data's consistency, exactness, and quality. The improved data will help developers analyse data quality before data integration, allowing them to make informed decisions quickly in accordance with business requirements. Compared with other related approaches, Improved K-Means achieves better accuracy and requires less computational time for clustering data objects.
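The abstract does not spell out the steps of the improved algorithm, so the following is only a rough illustration of the general idea it names: cluster records with K-Means under the Euclidean distance and fill each missing cell from the nearest cluster centre. The function name kmeans_impute, the column-mean starting guess, the farthest-point initialisation, and the toy data are all assumptions made for this sketch; none of them come from the paper.

```python
import numpy as np

def kmeans_impute(X, k=2, n_iter=20):
    """Fill NaNs using centroids of a plain k-means run (illustrative sketch only).

    Assumes every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Starting guess: fill each gap with the mean of the observed column entries.
    filled = np.where(missing, np.nanmean(X, axis=0), X)

    # Deterministic farthest-point initialisation (an assumption of this sketch):
    # start from row 0, then repeatedly add the row farthest from chosen centroids.
    centroids = [filled[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(filled - c, axis=1) for c in centroids], axis=0)
        centroids.append(filled[d.argmax()])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Euclidean distance from every row to every centroid.
        dists = np.linalg.norm(filled[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if its cluster is empty.
        for j in range(k):
            members = filled[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
        # Overwrite only the originally missing cells with their centroid's value.
        filled[missing] = centroids[labels][missing]
    return filled, labels

# Toy data: two well-separated groups, one missing cell in the third row.
X = [[1.0, 1.0], [1.2, 0.8], [0.9, np.nan],
     [10.0, 10.0], [10.2, 9.8], [9.9, 10.1]]
filled, labels = kmeans_impute(X, k=2)
print(filled[2], labels)
```

On this toy input the third row joins the cluster of the first two rows, and its missing cell settles close to 0.9, the mean of the two observed neighbours 1.0 and 0.8. Observed cells are never modified; only the originally missing positions are overwritten.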