Effect of mixing probabilities on the bias of cross-validation under separate sampling
A. Zollanvari, U. Braga-Neto, E. Dougherty
2013 IEEE International Workshop on Genomic Signal Processing and Statistics, 2013-11-01
DOI: https://doi.org/10.1109/GENSIPS.2013.6735947
Cross-validation is commonly used to estimate the overall error rate of a designed classifier in a small-sample expression study. The true error of the classifier is a function of the prior probabilities of the classes. With random sampling these can be estimated consistently in terms of the class sample sizes, but when sampling is separate, meaning these sample sizes are determined prior to sampling, there are no reasonable estimates from the data and the prior probabilities must be “estimated” outside the experiment. We have conducted a set of simulations to study the bias of cross-validation as a function of these “estimates”. The results show that a poor choice for estimating these probabilities can significantly increase the bias of cross-validation as an estimator of the true error.
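The dependence of the overall error on the class priors can be made concrete with a small sketch. It is not the authors' simulation. It uses only the standard decomposition of the overall error as a prior-weighted mixture of the two class-conditional errors, and it assumes those class-conditional errors are known exactly, so that the only source of discrepancy is the prior plugged in from outside the experiment. The function names and all numeric values are hypothetical, chosen for illustration.

```python
def overall_error(c0, err0, err1):
    """Overall error as a mixture of class-conditional errors.

    c0   : prior probability of class 0 (class 1 has prior 1 - c0)
    err0 : probability of misclassifying a class-0 point
    err1 : probability of misclassifying a class-1 point
    """
    if not 0.0 <= c0 <= 1.0:
        raise ValueError("c0 must lie in [0, 1]")
    return c0 * err0 + (1.0 - c0) * err1


def prior_mismatch_shift(c0_true, c0_assumed, err0, err1):
    """Error computed with the assumed prior minus error computed with the true prior.

    Assumption: err0 and err1 are exact, so this isolates the effect of the
    prior guess and ignores any bias in the class-conditional estimates.
    """
    return overall_error(c0_assumed, err0, err1) - overall_error(c0_true, err0, err1)


if __name__ == "__main__":
    err0, err1 = 0.10, 0.30  # hypothetical class-conditional error rates
    c0_true = 0.80           # hypothetical true prior of class 0

    # Under separate sampling the class sample sizes are fixed in advance,
    # so a ratio such as n0 / n (0.50 for a balanced design) says nothing
    # about c0_true; it is just one possible outside "estimate".
    for c0_assumed in (0.80, 0.65, 0.50):
        shift = prior_mismatch_shift(c0_true, c0_assumed, err0, err1)
        print(f"assumed c0 = {c0_assumed:.2f}  shift in overall error = {shift:+.3f}")
```

Algebraically the shift equals (c0_assumed - c0_true) * (err0 - err1), so it vanishes when the guess is correct or when the two class-conditional errors coincide, and it grows with both the prior mismatch and the gap between the class-conditional errors. In practice the class-conditional errors would themselves be cross-validation estimates with their own bias, which this sketch deliberately leaves out.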