A Learning Approach with Under- and Over-Sampling for Imbalanced Data Sets
Chun-Wu Yeh, Der-Chiang Li, Liang-Sian Lin, Tung-I Tsai
2016 5th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI), published 2016-07-10
DOI: 10.1109/IIAI-AAI.2016.20
It is difficult for learning models to achieve high classification performance on imbalanced data sets. To address this problem, this study presents a strategy that reduces the size of the majority-class data set and generates synthetic samples for the minority-class data set. A Parkinson's disease data set is used to evaluate and compare the performance of the classification methods, and paired t-tests are used to show the effectiveness of the proposed method compared with that of the other methods.
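The abstract does not say how the majority set is reduced or how the synthetic minority samples are generated, so the following is only a minimal sketch under assumed choices. It uses random under-sampling without replacement for the majority class, SMOTE-style interpolation between a minority sample and one of its k nearest minority neighbours for the minority class, and a single common target size for both classes. All function names (`random_undersample`, `smote_like_oversample`, `balance`, `paired_t_statistic`) and parameters (`target`, `k`, `seed`) are hypothetical and are not taken from the paper. The last function computes the paired t statistic that a paired t-test would use to compare two methods' scores over the same evaluation runs.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def random_undersample(majority: Sequence[Point], target: int,
                       rng: random.Random) -> List[Point]:
    """Keep a random subset of `target` majority samples (assumed scheme)."""
    if target > len(majority):
        raise ValueError("target exceeds the majority-class size")
    return rng.sample(list(majority), target)


def smote_like_oversample(minority: Sequence[Point], n_new: int, k: int,
                          rng: random.Random) -> List[Point]:
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style,
    an assumption; the paper's generator may differ)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Sort the other minority samples by Euclidean distance to x.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(x, p))
        neighbour = rng.choice(others[:k])
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


def balance(majority: Sequence[Point], minority: Sequence[Point],
            target: int, k: int = 5,
            seed: int = 0) -> Tuple[List[Point], List[Point]]:
    """Shrink the majority class and grow the minority class to `target`."""
    if not len(minority) <= target <= len(majority):
        raise ValueError("target must lie between the two class sizes")
    rng = random.Random(seed)
    reduced = random_undersample(majority, target, rng)
    grown = list(minority) + smote_like_oversample(
        minority, target - len(minority), k, rng)
    return reduced, grown


def paired_t_statistic(scores_a: Sequence[float],
                       scores_b: Sequence[float]) -> float:
    """t = mean(d) / (sd(d) / sqrt(n)) for paired differences d = a - b."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    d = [a - b for a, b in zip(scores_a, scores_b)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


if __name__ == "__main__":
    # Toy data for illustration only, not the Parkinson's disease data set.
    majority = [(float(i), float(i % 7)) for i in range(100)]
    minority = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5), (1.5, 2.0)]
    maj, mino = balance(majority, minority, target=30, k=2)
    print(len(maj), len(mino))  # 30 30
```

Meeting at a target size between the two class counts means neither class is discarded or replicated as aggressively as it would be with pure under- or pure over-sampling. The actual ratio used in the study is not stated in the abstract. To turn the t statistic into a p-value, compare it against a t distribution with n - 1 degrees of freedom.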