Experience: Differentiating Between Isolated and Sequence Missing Data
Amal Tawakuli, Daniel Kaiser, T. Engel
ACM Journal of Data and Information Quality, pp. 1-15, published 2023-01-19. DOI: 10.1145/3575809 (https://doi.org/10.1145/3575809)
Missing data is one of the most persistent problems in data and hinders the extraction of information and value. Handling missing data is a preprocessing task that has been extensively studied by the research community and remains an active research topic due to its impact and pervasiveness. Many surveys have evaluated traditional and state-of-the-art techniques; however, they assess the accuracy of missing data imputation techniques without differentiating between isolated and sequence missing instances. In this article, we highlight the presence of both types of missing data, at different percentages, in real-world time-series datasets. We demonstrate that existing imputation techniques have different estimation accuracies for isolated and sequence missing instances. We then propose a hybrid approach that differentiates between the two types of missing data to yield improved overall imputation accuracy.
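To make the distinction concrete, the following is a minimal sketch, not the authors' implementation. It assumes that an isolated missing instance is a gap of length one and that a sequence missing instance belongs to a run of two or more consecutive missing values; the abstract does not state the exact definition. The function names and the two placeholder imputers (`neighbor_mean`, `observed_mean`) are hypothetical and stand in for whichever techniques perform best on each type.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_runs(series):
    """Return (start, length) for each maximal run of consecutive missing values."""
    runs = []
    start = None
    for i, value in enumerate(series):
        if is_missing(value):
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(series) - start))
    return runs


def classify_missing(series):
    """Split missing indices into isolated (run length 1) and sequence (length >= 2)."""
    isolated, sequence = [], []
    for start, length in missing_runs(series):
        target = isolated if length == 1 else sequence
        target.extend(range(start, start + length))
    return isolated, sequence


def hybrid_impute(series, impute_isolated, impute_sequence):
    """Fill each gap with the imputer chosen for its type."""
    filled = list(series)
    for start, length in missing_runs(series):
        imputer = impute_isolated if length == 1 else impute_sequence
        filled[start:start + length] = imputer(series, start, length)
    return filled


def neighbor_mean(series, start, length):
    """Placeholder imputer: mean of the observed values bordering the gap.

    Assumes at least one bordering value exists.
    """
    neighbors = [series[i] for i in (start - 1, start + length)
                 if 0 <= i < len(series)]
    return [sum(neighbors) / len(neighbors)] * length


def observed_mean(series, start, length):
    """Placeholder imputer: mean of all observed values in the series."""
    observed = [v for v in series if not is_missing(v)]
    return [sum(observed) / len(observed)] * length


series = [1.0, None, 3.0, None, None, None, 7.0, 8.0]
isolated, sequence = classify_missing(series)
print("isolated:", isolated, "sequence:", sequence)
print(hybrid_impute(series, neighbor_mean, observed_mean))
```

Counting `len(isolated)` and `len(sequence)` against the series length gives the percentage of each type in a dataset, and scoring an imputer separately on the two index sets shows whether its accuracy differs between them.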