Experience: Differentiating Between Isolated and Sequence Missing Data

ACM Journal of Data and Information Quality (IF 2.9, JCR Q3, Computer Science, Information Systems). Pub Date: 2023-01-19. DOI: 10.1145/3575809

Amal Tawakuli, Daniel Kaiser, T. Engel


Citations: 3

Abstract

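Missing data is one of the most persistent problems in data and hinders the extraction of information and value. Handling missing data is a preprocessing task that has been studied extensively by the research community and remains an active research topic because of its impact and pervasiveness. Many surveys have evaluated traditional and state-of-the-art techniques; however, they assess the accuracy of missing data imputation techniques without differentiating between isolated and sequence missing instances. In this article, we highlight the presence of both types of missing data, at different percentages, in real-world time-series datasets. We demonstrate that existing imputation techniques have different estimation accuracies for isolated and sequence missing instances. We then propose a hybrid approach that differentiates between the two types of missing data to yield improved overall imputation accuracy.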
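To make the distinction concrete, here is a minimal Python sketch of separating isolated from sequence missing instances in a time series and imputing each type differently. It is an illustration under stated assumptions, not the authors' method: the abstract does not define the two types precisely or name the techniques used, so the sketch assumes that an isolated instance is a single missing value and a sequence is a run of two or more consecutive missing values. The function names (`find_gaps`, `missing_profile`, `hybrid_impute`) and the two imputers (neighbour mean for isolated gaps, series mean for sequence gaps) are hypothetical placeholders.

```python
from typing import Dict, List, Optional, Tuple

Series = List[Optional[float]]  # None marks a missing reading


def find_gaps(values: Series) -> List[Tuple[int, int]]:
    """Return (start_index, length) for every run of consecutive missing values."""
    gaps = []
    i = 0
    while i < len(values):
        if values[i] is None:
            start = i
            while i < len(values) and values[i] is None:
                i += 1
            gaps.append((start, i - start))
        else:
            i += 1
    return gaps


def missing_profile(values: Series) -> Dict[str, int]:
    """Count missing instances by type (assumed definition: length 1 = isolated)."""
    gaps = find_gaps(values)
    return {
        "isolated": sum(length for _, length in gaps if length == 1),
        "sequence": sum(length for _, length in gaps if length > 1),
    }


def hybrid_impute(values: Series) -> List[float]:
    """Impute isolated gaps with the neighbour mean and sequence gaps with the series mean.

    Both imputers are placeholders chosen for brevity; any pair of techniques
    could be plugged in here.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    series_mean = sum(observed) / len(observed)

    result = list(values)
    for start, length in find_gaps(values):
        if length == 1:
            # Use whichever adjacent readings exist (one at the series edges).
            neighbours = [
                values[j]
                for j in (start - 1, start + 1)
                if 0 <= j < len(values) and values[j] is not None
            ]
            result[start] = (
                sum(neighbours) / len(neighbours) if neighbours else series_mean
            )
        else:
            for j in range(start, start + length):
                result[j] = series_mean
    return result


readings = [1.0, None, 3.0, None, None, None, 7.0, 8.0]
profile = missing_profile(readings)
imputed = hybrid_impute(readings)
print(profile)
print(imputed)
```

In this sample series one missing instance is isolated and three belong to a sequence, so a single accuracy figure computed over all four would mix two quite different estimation problems. Reporting error separately per type is what makes the comparison described in the abstract possible.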




Source journal