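Robust Model Selection Using Cross Validation: A Simple Iterative Technique for Developing Robust Gene Signatures in Biomedical Genomics Applications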

2006 5th International Conference on Machine Learning and Applications (ICMLA'06). Pub Date: 2006-12-14. DOI: 10.1109/ICMLA.2006.45

R. Venkatesh, C. Rowland, Hongjin Huang, Olivia T. Abar, J. Sninsky


Citations: 6

Abstract

The iterative technique proposed in this paper provides an effective way to select a robust model in wide-data settings, such as genomics and gene expression studies, where the number of markers is much greater than the number of samples. The technique can be quite useful when an independent test set is not available and cross-validation serves as the validation step. It removes many of the ambiguities surrounding final model selection and gives a computationally simple, transparent way to choose a robust model. Robust model selection is accomplished mainly by making direct use of the fold frequencies of markers selected in repeated cross-validation experiments. The technique is not specific to any feature selection or classification method and can therefore be used with different combinations of the two. Its usefulness extends to situations where an independent test set is available: it allows one to squeeze extra performance out of the feature selection procedure and increases the odds of replication in the independent test set. Frequently only one test set is available, and in that case the technique can help avoid repeated use of the test set. Techniques such as the one described in this study can be of great practical value in developing biomedical genomic applications, e.g., molecular diagnostic tests. The technique was successfully applied to a complex real-world data set, and significant improvements were demonstrated in the compactness, accuracy, and generalizability of the model.
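The abstract names the core idea (fold frequencies of markers selected in repeated cross-validation) but gives no algorithmic detail, so the following is only a minimal sketch of how such a procedure might look, not the authors' method. Everything specific here is an assumption made for illustration: the t-like ranking statistic, the top-`k` selection rule, the 0.8 frequency threshold, the stopping rule, and all function names. Classifier training and accuracy estimation are left out; the sketch covers only the marker-stability step.

```python
import random
from collections import Counter


def t_score(values, labels):
    """Absolute Welch-style t score for one marker (an assumed ranking statistic)."""
    a = [v for v, lab in zip(values, labels) if lab == 0]
    b = [v for v, lab in zip(values, labels) if lab == 1]
    if len(a) < 2 or len(b) < 2:
        return 0.0
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((v - ma) ** 2 for v in a) / (len(a) - 1)
    vb = sum((v - mb) ** 2 for v in b) / (len(b) - 1)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return abs(ma - mb) / se if se > 0 else 0.0


def select_top_k(X, y, rows, candidates, k):
    """Rank candidate markers on the training rows only and keep the k best."""
    labels = [y[i] for i in rows]
    scored = [(t_score([X[i][j] for i in rows], labels), j) for j in candidates]
    return [j for _, j in sorted(scored, reverse=True)[:k]]


def fold_frequencies(X, y, candidates, k, n_folds=5, n_repeats=10, seed=0):
    """Fraction of training folds, over repeated CV, in which each marker is selected."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_repeats):
        order = list(range(len(y)))
        rng.shuffle(order)
        for f in range(n_folds):
            held_out = set(order[f::n_folds])
            train = [i for i in order if i not in held_out]
            counts.update(select_top_k(X, y, train, candidates, k))
    total = n_folds * n_repeats
    return {j: counts[j] / total for j in candidates}


def iterative_signature(X, y, k=10, threshold=0.8, max_iter=10):
    """Drop markers with low fold frequency, then repeat on the survivors until stable."""
    candidates = list(range(len(X[0])))
    for _ in range(max_iter):
        freq = fold_frequencies(X, y, candidates, min(k, len(candidates)))
        kept = [j for j in candidates if freq[j] >= threshold]
        # Stop when the set no longer changes, or when nothing meets the
        # threshold (in which case the current candidate set is returned).
        if not kept or kept == candidates:
            break
        candidates = kept
    return candidates


def make_toy_data(n=40, p=200, informative=5, shift=3.0, seed=1):
    """Synthetic wide data: p markers, n samples, first `informative` markers shifted."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(0.0, 1.0) + (shift if j < informative and y[i] == 1 else 0.0)
          for j in range(p)] for i in range(n)]
    return X, y


X, y = make_toy_data()
signature = iterative_signature(X, y)
print(sorted(signature))
```

Selection is redone inside every training fold, so the held-out samples never influence which markers are counted. In this toy setup the strongly shifted markers should appear in nearly every fold, while markers that rank highly only by chance tend to fall below the threshold. A real application would pair each fold's selected markers with a classifier and track held-out accuracy alongside the frequencies.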
