Robust Model Selection Using Cross Validation: A Simple Iterative Technique for Developing Robust Gene Signatures in Biomedical Genomics Applications
R. Venkatesh, C. Rowland, Hongjin Huang, Olivia T. Abar, J. Sninsky
2006 5th International Conference on Machine Learning and Applications (ICMLA'06)
Publication date: 2006-12-14
DOI: 10.1109/ICMLA.2006.45 (https://doi.org/10.1109/ICMLA.2006.45)
Citations: 6
Abstract
The iterative technique proposed in this paper provides an effective way to select a robust model in wide-data settings, such as genomics and gene expression studies, where the number of markers greatly exceeds the number of samples. The technique is particularly useful when an independent test set is not available and cross-validation serves as the validation step. It removes many of the ambiguities surrounding final model selection, giving a computationally simple and transparent way to choose a robust model. Robust model selection is accomplished mainly by making direct use of the fold frequencies of markers, that is, how often each marker is selected across repeated cross-validation experiments. The technique is not specific to any feature selection or classification method and can therefore be used with different combinations of the two. Its usefulness extends to situations where an independent test set is available: it allows one to squeeze extra performance out of the feature selection procedure and increases the odds of replication in the independent test set. Frequently only one test set is available, and in that case the technique can help avoid repeated use of the test set. Techniques such as the one described in this study can be of great practical value in developing biomedical genomic applications, e.g., molecular diagnostic tests. The technique was successfully applied to a complex real-world data set, and significant improvements were demonstrated in the compactness, accuracy, and generalizability of the model.
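The fold-frequency idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract does not specify the feature selection method, the frequency threshold, or the iterative refinement step, so everything below is an assumption made for illustration. The filter (absolute difference of class means), the function names `fold_frequencies` and `stable_markers`, the parameters `top_k` and `min_freq`, and the toy data are all hypothetical. The sketch only counts how often each marker is selected across repeated cross-validation folds and keeps the markers that are selected consistently.

```python
import random
from collections import Counter


def score_feature(rows, labels, j):
    """Absolute difference of class means for feature j.

    Assumption: a stand-in filter; any feature selection method could be used.
    """
    pos = [r[j] for r, t in zip(rows, labels) if t == 1]
    neg = [r[j] for r, t in zip(rows, labels) if t == 0]
    if not pos or not neg:
        return 0.0
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def fold_frequencies(X, y, n_folds=5, n_repeats=10, top_k=2, seed=0):
    """Fraction of training folds in which each feature ranks in the top_k."""
    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])
    counts = Counter()
    for _ in range(n_repeats):
        # Fresh random partition of the samples for each repeat.
        order = list(range(n_samples))
        rng.shuffle(order)
        for f in range(n_folds):
            held_out = set(order[f::n_folds])
            train = [i for i in order if i not in held_out]
            rows = [X[i] for i in train]
            labels = [y[i] for i in train]
            # Select features on the training part of the fold only.
            ranked = sorted(
                range(n_features),
                key=lambda j: score_feature(rows, labels, j),
                reverse=True,
            )
            counts.update(ranked[:top_k])
    total = n_folds * n_repeats
    return {j: counts[j] / total for j in range(n_features)}


def stable_markers(freq, min_freq=0.8):
    """Markers selected in at least min_freq of all folds (hypothetical cutoff)."""
    return sorted(j for j, v in freq.items() if v >= min_freq)


def make_toy_data(n_samples=20, n_features=6, seed=1):
    """Synthetic data: features 0 and 1 carry the class signal, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 8.0 * label
        row[1] -= 8.0 * label
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
freq = fold_frequencies(X, y)
print(freq)
print(stable_markers(freq))
```

Selecting features inside each training fold, rather than once on the full data, keeps the held-out samples out of the selection step. A final model would then be fit on the stable markers only; how the paper iterates from there is not described in the abstract and is not attempted here.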