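Robust Model Selection Using Cross Validation: A Simple Iterative Technique for Developing Robust Gene Signatures in Biomedical Genomics Applications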

R. Venkatesh, C. Rowland, Hongjin Huang, Olivia T. Abar, J. Sninsky
Published in: 2006 5th International Conference on Machine Learning and Applications (ICMLA'06)
Publication date: 2006-12-14
DOI: 10.1109/ICMLA.2006.45 (https://doi.org/10.1109/ICMLA.2006.45)
Citations: 6

Abstract

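The iterative technique proposed in this paper provides an effective way to select a robust model in wide-data settings, such as genomics and gene expression studies, where the number of markers is much greater than the number of samples. The technique can be quite useful when an independent test set is not available and cross-validation is used as the validation step. It removes many of the ambiguities surrounding the final model selection process and gives a computationally simple, transparent way to choose a robust model. Robust model selection is accomplished mainly by using, in a direct and effective manner, the fold frequencies of the markers selected in repeated cross-validation experiments. The technique is not method specific in terms of either feature selection or classification, and can therefore be used with different combinations of feature selection and classification methods. Its usefulness extends even to situations where an independent test set is available: it allows one to squeeze extra performance out of the feature selection procedure and increases the odds of replication in an independent test set. Frequently only one test set is available, and in this case the technique can help avoid repeated use of that test set. The availability of techniques such as the one described in this study can be of great practical value in developing biomedical genomic applications, e.g., molecular diagnostic tests. The technique was successfully applied to a complex real-world data set, and significant improvements were demonstrated in the compactness, accuracy, and generalizability of the model.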
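The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general fold-frequency idea, not the authors' procedure. It assumes a simple univariate ranking (absolute difference in class means), unstratified repeated k-fold splits, and a fixed frequency cutoff. The names `top_markers`, `fold_frequencies`, `robust_signature`, the parameters `k` and `min_freq`, and the synthetic data are all hypothetical. The iterative refinement mentioned in the abstract is not reproduced here.

```python
import random
from collections import Counter
from statistics import mean


def top_markers(X, y, rows, k):
    """Rank markers on the given rows by |mean(class 1) - mean(class 0)|.

    Assumed stand-in for any feature selection method; returns the
    indices of the k highest-scoring markers.
    """
    pos = [i for i in rows if y[i] == 1]
    neg = [i for i in rows if y[i] == 0]
    if not pos or not neg:
        raise ValueError("training rows must contain both classes")
    scores = []
    for j in range(len(X[0])):
        diff = abs(mean(X[i][j] for i in pos) - mean(X[i][j] for i in neg))
        scores.append((diff, j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def fold_frequencies(X, y, n_folds=5, n_repeats=10, k=5, seed=0):
    """Fraction of training folds in which each marker was selected."""
    rng = random.Random(seed)
    counts = Counter()
    total = 0
    for _ in range(n_repeats):
        idx = list(range(len(X)))
        rng.shuffle(idx)  # a new random partition for every repeat
        for f in range(n_folds):
            held_out = set(idx[f::n_folds])
            train = [i for i in idx if i not in held_out]
            # Selection sees only the training part of the fold.
            counts.update(top_markers(X, y, train, k))
            total += 1
    return {j: c / total for j, c in counts.items()}


def robust_signature(freqs, min_freq=0.8):
    """Keep markers chosen in at least min_freq of all folds (assumed cutoff)."""
    return sorted(j for j, f in freqs.items() if f >= min_freq)


# Synthetic wide data: 40 samples, 200 markers, only markers 0-2 informative.
data_rng = random.Random(42)
y = [0] * 20 + [1] * 20
X = [[data_rng.gauss(0.0, 1.0) + (4.0 if label == 1 and j < 3 else 0.0)
      for j in range(200)] for label in y]

freqs = fold_frequencies(X, y)
signature = robust_signature(freqs)
print("markers kept:", signature)
```

Markers that enter the top `k` only by chance tend to be selected in few folds, so a frequency cutoff filters them out without consulting a test set. In practice the ranking function and classifier would be replaced by whichever methods are being evaluated.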