U. Rebbapragada, L. Mandrake, K. Wagstaff, D. Gleeson, R. Castaño, Steve Ankuo Chien, C. Brodley
2009 IEEE Aerospace Conference, published 2009-03-07. DOI: https://doi.org/10.1109/AERO.2009.4839580
Improving onboard analysis of Hyperion images by filtering mislabeled training data examples
This paper presents PWEM, a technique for detecting class label noise in training data. PWEM detects mislabeled examples by assigning to each training example a probability that its label is correct. It calculates this probability by clustering examples from pairs of classes together and analyzing the distribution of labels within each cluster. We discuss how the probabilities output by PWEM can be used to filter, mitigate, or correct mislabeled training examples. We then provide an in-depth discussion of how we applied PWEM to a sulfur detector that labels pixels from Hyperion images of the Borup-Fiord pass in Northern Canada. PWEM assigned low probabilities to a large number of the sulfur training examples, indicating severe mislabeling within the sulfur class. Filtering those low-confidence examples resulted in a cleaner training set and improved the median false positive rate of the classifier by at least 29%.
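The abstract describes PWEM only in outline, so the following is a minimal sketch of the general idea rather than the authors' implementation. It makes several assumptions that are not in the source: a hard k-means clusterer stands in for whatever clustering PWEM actually uses, an example's confidence is taken to be the fraction of its cluster that shares its label, and scores are averaged over all class pairs the example takes part in. The function names, the toy data, and the 0.5 threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def sq_dist(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Hard k-means with farthest-first initialization (a stand-in clusterer)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def label_confidence(points, labels, k=2):
    """Assumed scoring rule: share of same-label examples in the example's
    cluster, averaged over every pair of classes the example belongs to."""
    sums = [0.0] * len(points)
    counts = [0] * len(points)
    for a, b in combinations(sorted(set(labels)), 2):
        # Cluster only the examples labeled with one of the two classes.
        idx = [i for i, y in enumerate(labels) if y in (a, b)]
        assign = kmeans([points[i] for i in idx], k)
        per_cluster = {}
        for i, c in zip(idx, assign):
            per_cluster.setdefault(c, Counter())[labels[i]] += 1
        for i, c in zip(idx, assign):
            dist = per_cluster[c]
            sums[i] += dist[labels[i]] / sum(dist.values())
            counts[i] += 1
    # With a single class there are no pairs; default to full confidence.
    return [s / n if n else 1.0 for s, n in zip(sums, counts)]


# Toy data (invented): the fourth point lies among the "sulfur" points
# but carries the label "ice".
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.2), (5.2, 5.1), (5.1, 5.0)]
labels = ["sulfur", "sulfur", "sulfur", "ice", "ice", "ice", "ice", "ice"]

confidence = label_confidence(points, labels)
kept = [i for i, c in enumerate(confidence) if c >= 0.5]  # filtering step

print(confidence)  # [0.75, 0.75, 0.75, 0.25, 1.0, 1.0, 1.0, 1.0]
print(kept)        # [0, 1, 2, 4, 5, 6, 7]
```

Here the suspicious example receives the lowest score and is dropped by the threshold. The same scores could instead be used as per-example weights (mitigation) or as a trigger for relabeling (correction), the other two options the abstract mentions.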