Enhancing Missing Values Imputation through Transformer-Based Predictive Modeling

IgMin Research, published 2024-01-23. DOI: 10.61927/igmin140
Ayub Hina, Jamil Harun

Abstract

This paper addresses missing-value imputation in data preprocessing. Traditional techniques such as zero, mean, and k-nearest-neighbor (KNN) imputation fail to capture complex relationships in the data, which often leads to suboptimal results, while discarding records with missing values causes substantial information loss. Our approach uses transformer models, which are well suited to sequential data: the proposed predictive framework trains a transformer to predict the missing values, markedly improving imputation accuracy. In comparisons with zero, mean, and KNN imputation, the transformer model consistently performs best, and validation with an LSTM model supports this result. On hourly data, our model achieves an R² score of 0.96, 0.195 higher than KNN imputation. On daily data, its R² score of 0.806 exceeds KNN imputation by 0.015 and mean imputation by 0.25. On monthly data, its R² score of 0.796 is 0.1 higher than mean imputation. These results indicate that the proposed model captures underlying patterns in the data and offer practical guidance for improving missing-value imputation in data analysis.
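The abstract benchmarks the transformer against three baselines and scores them with R². The following is a minimal, stdlib-only illustration of those baselines, assuming a univariate series in which None marks a missing value and assuming R² is computed only at the hidden positions; the source specifies neither choice. The KNN variant is a simplified one-dimensional stand-in that averages the k observed points nearest in time (k=2 is an arbitrary choice); scikit-learn's KNNImputer instead finds neighboring rows by distance across other feature columns. All helper names are hypothetical, and the paper's transformer model is not reproduced.

```python
from statistics import mean

# Illustrative helpers (hypothetical names, not from the paper).
# Assumption: a univariate series where None marks a missing value.


def zero_impute(series):
    """Replace every missing value with 0.0."""
    return [0.0 if x is None else x for x in series]


def mean_impute(series):
    """Replace every missing value with the mean of the observed values."""
    fill = mean(x for x in series if x is not None)
    return [fill if x is None else x for x in series]


def knn_impute(series, k=2):
    """Simplified 1-D KNN (assumption): fill each gap with the mean of the
    k observed values closest in time; ties go to the earlier index."""
    observed = [(i, x) for i, x in enumerate(series) if x is not None]
    filled = []
    for i, x in enumerate(series):
        if x is not None:
            filled.append(x)
            continue
        # sorted() is stable and observed is in index order, so equal
        # distances keep the earlier index first.
        nearest = sorted(observed, key=lambda p: abs(p[0] - i))[:k]
        filled.append(mean(v for _, v in nearest))
    return filled


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Toy usage: hide two points of a linear ramp, impute, and score
# each baseline only at the hidden positions.
truth = [float(t) for t in range(10)]
missing = [3, 6]
series = [None if t in missing else v for t, v in enumerate(truth)]
y_true = [truth[i] for i in missing]

for name, imputer in [("zero", zero_impute), ("mean", mean_impute), ("knn", knn_impute)]:
    filled = imputer(series)
    print(name, r2_score(y_true, [filled[i] for i in missing]))
# zero -9.0
# mean 0.0
# knn 1.0
```

On this linear toy series the time-nearest KNN fill happens to be exact, which is why it scores 1.0; real hourly, daily, or monthly data will not behave so neatly, and a negative R² (as for zero imputation here) simply means the fill is worse than predicting the mean.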
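The abstract states each result as a margin over a baseline rather than giving the baseline scores directly. Assuming the standard definition of R² (the source does not spell it out), where y_i are the true values, ŷ_i the predicted or imputed values, and ȳ the mean of the y_i, subtracting each margin from the transformer's score gives the implied baseline values. These are derived arithmetic, not figures reported in the paper.

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}

\begin{aligned}
\text{hourly, KNN:}   &\quad 0.960 - 0.195 = 0.765 \\
\text{daily, KNN:}    &\quad 0.806 - 0.015 = 0.791 \\
\text{daily, mean:}   &\quad 0.806 - 0.250 = 0.556 \\
\text{monthly, mean:} &\quad 0.796 - 0.100 = 0.696
\end{aligned}
```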