A novel ranked k-nearest neighbors algorithm for missing data imputation.

Journal of Applied Statistics (IF 1.1; Zone 4, Mathematics; JCR Q2, Statistics & Probability). Pub Date: 2024-10-11. eCollection Date: 2025-01-01. DOI: 10.1080/02664763.2024.2414357
Yasir Khan, Said Farooq Shah, Syed Muhammad Asim
Journal of Applied Statistics, volume 52, issue 5, pages 1103-1127. Publication type: Journal Article. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11951327/pdf/
Citations: 0

Abstract

Missing data is a common problem in many domains that rely on data analysis. The k nearest neighbors imputation method has been widely used to address this issue, but it has limitations in accurately imputing missing values, especially for datasets with small pairwise correlations and for small values of k. In this study, we propose Ranked k Nearest Neighbors imputation, a method that follows the same approach as k nearest neighbors imputation but uses the concept of ranked set sampling to select the most relevant neighbors for imputation. Our results show that the proposed method outperforms the standard k nearest neighbors method in imputation accuracy under both the Missing Completely at Random and the Missing at Random mechanisms, as demonstrated by consistently lower MSIE and MAIE values across all datasets. This suggests that the proposed method is a promising alternative for imputing missing values in datasets with small pairwise correlations and small values of k. The proposed method can therefore contribute to the development of more efficient and accurate imputation methods in various domains without adding computational complexity to the algorithm.
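To make the baseline concrete, the following is a minimal sketch of standard k nearest neighbors imputation, together with a simple accuracy check under a Missing Completely at Random mask. It is not the authors' ranked method: the abstract does not specify how ranked set sampling selects neighbors, so that step is not reproduced here. The function names (`knn_impute`, `mask_mcar`, `imputation_errors`), the distance scaling, and the toy data are all assumptions made for illustration. MSIE and MAIE are read here as mean squared and mean absolute imputation error, which is also an assumption, since the abstract does not expand the acronyms.

```python
import math
import random


def knn_impute(rows, k=3):
    """Fill None cells with the mean of the k nearest donor rows.

    Distance is Euclidean over the columns observed in both rows,
    averaged over the number of shared columns (an illustrative choice).
    A cell with no usable donor is left as None.
    """
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A donor must be a different row that observes column j.
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            if candidates:
                candidates.sort(key=lambda c: c[0])
                donors = [v for _, v in candidates[:k]]
                imputed[i][j] = sum(donors) / len(donors)
    return imputed


def mask_mcar(rows, rate, rng):
    """Hide each cell independently with probability `rate` (MCAR)."""
    masked, hidden = [], []
    for i, row in enumerate(rows):
        new_row = []
        for j, value in enumerate(row):
            if rng.random() < rate:
                new_row.append(None)
                hidden.append((i, j))
            else:
                new_row.append(value)
        masked.append(new_row)
    return masked, hidden


def imputation_errors(truth, imputed, hidden):
    """Mean squared and mean absolute error over the hidden cells only."""
    diffs = [imputed[i][j] - truth[i][j] for i, j in hidden
             if imputed[i][j] is not None]
    if not diffs:
        return float("nan"), float("nan")
    mse = sum(d * d for d in diffs) / len(diffs)
    mae = sum(abs(d) for d in diffs) / len(diffs)
    return mse, mae


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: the second column is roughly twice the first, plus noise.
    truth = []
    for _ in range(50):
        x = rng.uniform(0.0, 10.0)
        truth.append([x, 2.0 * x + rng.gauss(0.0, 0.5)])
    masked, hidden = mask_mcar(truth, rate=0.1, rng=rng)
    filled = knn_impute(masked, k=3)
    print(imputation_errors(truth, filled, hidden))
```

Comparing two imputation methods in this setup would mean running both on the same masked copy and comparing the two error pairs. A Missing at Random experiment would need a different mask, one where the probability of hiding a cell depends on observed values in other columns.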

Source journal: Journal of Applied Statistics (Mathematics: Statistics & Probability)
CiteScore: 3.40
Self-citation rate: 0.00%
Articles published: 126
Review time: 6 months
About the journal: Journal of Applied Statistics provides a forum for communication between applied statisticians and users of applied statistical techniques across a wide range of disciplines. These areas include business, computing, economics, ecology, education, management, medicine, operational research and sociology, but papers from other areas are also considered. The editorial policy is to publish rigorous but clear and accessible papers on applied techniques. Purely theoretical papers are avoided, but those on theoretical developments which clearly demonstrate significant applied potential are welcomed. Each paper is submitted to at least two independent referees.