Vinicius M. A. Souza, T. P. D. Silva, Gustavo E. A. P. A. Batista
2018 7th Brazilian Conference on Intelligent Systems (BRACIS), October 2018. DOI: 10.1109/BRACIS.2018.00077
Evaluating Stream Classifiers with Delayed Labels Information
In general, data stream classifiers assume that the true label of every unlabeled instance becomes available immediately after the classifier predicts it. Immediate access to class labels allows supervised monitoring of the data distribution and the error rate to check whether the current classifier is outdated. Furthermore, if a change is detected, the classifier has access to all recent labeled data to update the model. However, this assumption is overly optimistic for most (if not all) applications. Given the cost and labor involved in obtaining labels, failures in data acquisition, or restrictions of the classification problem, a more reasonable assumption is that class labels become available with a delay. In this paper, we experimentally analyze the impact of latency on the performance of stream classifiers and draw the community's attention to the need to consider this critical variable in the evaluation process. We also make suggestions to avoid biased conclusions caused by ignoring the delayed nature of stream problems. These contributions are relevant because few studies consider this variable when proposing new algorithms. Finally, we propose a new evaluation measure, Kappa-Latency, that takes into account the arrival delay of true labels to evaluate and compare a set of classifiers.
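The abstract does not spell out the evaluation protocol or the Kappa-Latency formula. As an illustration only, the sketch below shows one common way to simulate label latency in prequential (test-then-train) evaluation: each instance is classified on arrival, and its true label is released to the learner only after `delay` further instances have been classified. The classifier `WindowedCentroid`, the synthetic stream `drifting_stream`, and the loop `prequential_with_delay` are hypothetical toy components, not the authors' experimental setup, and the sketch reports plain accuracy rather than Kappa-Latency.

```python
import random
from collections import deque


class WindowedCentroid:
    """Hypothetical toy learner: nearest class mean over the last `window` labeled examples."""

    def __init__(self, window=100):
        self.buffer = deque(maxlen=window)  # oldest labeled examples are forgotten

    def predict(self, x):
        sums, counts = {}, {}
        for xi, yi in self.buffer:
            sums[yi] = sums.get(yi, 0.0) + xi
            counts[yi] = counts.get(yi, 0) + 1
        if not counts:
            return 0  # no labels seen yet: fall back to class 0
        # pick the class whose mean is closest to x
        return min(counts, key=lambda c: abs(x - sums[c] / counts[c]))

    def learn(self, x, y):
        self.buffer.append((x, y))


def drifting_stream(n, drift_at, seed=0):
    """Hypothetical 1-D binary stream with one abrupt concept drift at `drift_at`."""
    rng = random.Random(seed)
    for t in range(n):
        y = rng.randint(0, 1)
        center = 1.0 if y == 1 else -1.0
        if t >= drift_at:
            center = -center  # after the drift, the class-conditional means swap
        yield rng.gauss(center, 0.5), y


def prequential_with_delay(stream, model, delay):
    """Test-then-train loop where the label of instance t can first be used
    for training just before instance t + 1 + delay is predicted."""
    pending = deque()  # (arrival_step, x, y), ordered by arrival step
    correct = total = 0
    for t, (x, y) in enumerate(stream):
        # release every label whose arrival step has been reached
        while pending and pending[0][0] <= t:
            _, px, py = pending.popleft()
            model.learn(px, py)
        correct += int(model.predict(x) == y)  # test first ...
        total += 1
        pending.append((t + 1 + delay, x, y))  # ... the label arrives later
    return correct / total


if __name__ == "__main__":
    for delay in (0, 500):
        acc = prequential_with_delay(drifting_stream(2000, 1000), WindowedCentroid(100), delay)
        print(f"delay={delay:>3}: accuracy={acc:.3f}")
```

With `delay=0` this reduces to the usual immediate-label assumption. With a large delay, the learner keeps predicting with the outdated concept for hundreds of instances after the drift (and also warms up more slowly), so one would expect a noticeably lower score on the same stream, the kind of effect the abstract argues an evaluation with immediate labels would hide.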