Enhancing Missing Values Imputation through Transformer-Based Predictive Modeling

IgMin Research. Pub Date: 2024-01-23. DOI: 10.61927/igmin140

Ayub Hina, Jamil Harun


Abstract

This paper addresses missing value imputation in data preprocessing. Traditional techniques such as zero, mean, and KNN imputation fall short in capturing intricate relationships in the data, which often leads to suboptimal results, while discarding records with missing values causes significant information loss. Our approach uses transformer models, which are well suited to sequential data: the proposed predictive framework trains a transformer to predict the missing values, yielding a marked improvement in imputation accuracy. Comparative analysis against the traditional methods (zero, mean, and KNN imputation) consistently favors the transformer model, and LSTM validation further supports this result. On hourly data, our model achieves an R² score of 0.96, surpassing KNN imputation by 0.195. On daily data, its R² score of 0.806 exceeds KNN imputation by 0.015 and mean imputation by 0.25. On monthly data, its R² score of 0.796 is an improvement of 0.1 over mean imputation. These results highlight the proposed model's ability to capture underlying patterns and offer insights for improving missing value imputation in data analyses.
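To make the baseline comparison concrete, the following is a minimal sketch of how zero and mean imputation can be scored with R² on artificially masked values. It is not the authors' pipeline: the synthetic signal, the 20% masking rate, the helper names (`r2_score`, `impute_zero`, `impute_mean`), and the choice to score only the masked positions are all assumptions made for illustration. The transformer, KNN, and LSTM validation steps from the paper are not reproduced here.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def impute_zero(series):
    """Replace every NaN with 0."""
    return np.where(np.isnan(series), 0.0, series)


def impute_mean(series):
    """Replace every NaN with the mean of the observed values."""
    return np.where(np.isnan(series), np.nanmean(series), series)


# Hypothetical hourly-like signal with a daily cycle (illustrative data only).
rng = np.random.default_rng(0)
t = np.arange(500)
truth = 10.0 + 3.0 * np.sin(2 * np.pi * t / 24) + rng.normal(0.0, 0.3, t.size)

# Hide roughly 20% of the values at random to simulate missingness.
mask = rng.random(t.size) < 0.2
observed = truth.copy()
observed[mask] = np.nan

# Score each baseline only on the positions that were hidden.
scores = {
    "zero": r2_score(truth[mask], impute_zero(observed)[mask]),
    "mean": r2_score(truth[mask], impute_mean(observed)[mask]),
}

for name, score in scores.items():
    print(f"{name} imputation R^2: {score:.3f}")
```

Because both baselines fill every gap with a single constant, their R² on the hidden values cannot exceed 0. A model that tracks the sequence structure, as the transformer in the paper is reported to do, is what pushes the score toward 1.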


