A Method of Filling Missing Values in Data using Data Mining
Niyaz Sharifyanov, V. Latypova
2023 IX International Conference on Information Technology and Nanotechnology (ITNT), published 2023-04-17
DOI: 10.1109/ITNT57377.2023.10139280 (https://doi.org/10.1109/ITNT57377.2023.10139280)
Citation count: 0
Abstract
Organizations everywhere now use information systems that handle large amounts of data. The collected data is subjected to statistical analysis and data mining. To improve the quality of input data, intelligent methods are actively used for data recovery, including filling in missing values. However, improving the quality of the obtained data remains an open problem, as do the resource intensity and high time costs of existing solutions for filling in missing values. This article proposes a modified method for filling in missing values, based on the k-nearest neighbors method, together with an implementation that addresses these problems. The method was tested on two data sets: generated data and weather data for the city of Ufa from the OpenWeatherMap service. The proposed method showed better results than existing methods: simple data recovery (based on the mean, median, or mode), k-nearest neighbors-based methods, random forest, predictive mean matching, and regression-model-based imputation.