Bayesian MMSE estimation of classification error and performance on real genomic data

2010 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS), Pub Date: 2010-11-01, DOI: 10.1109/GENSIPS.2010.5719674

Lori A. Dalton, E. Dougherty



Abstract


Small-sample classifier design has become a major issue in the biological and medical communities, owing to the recent development of high-throughput genomic and proteomic technologies. The problem of estimating classifier error, already handicapped by the limited information available, is further compounded by the necessity of reusing training data for error estimation. Because error estimation is so difficult, all currently popular techniques have been heuristically devised rather than rigorously designed on the basis of statistical inference and optimization. However, a recently proposed error estimator places the problem in an optimal mean-square error (MSE) signal estimation framework in the presence of uncertainty. This results in a Bayesian approach to error estimation based on a parameterized family of feature-label distributions. These Bayesian error estimators are optimal when averaged over a given family of distributions, are unbiased when averaged over a given family and all samples, and analytically address a trade-off between robustness (modeling assumptions) and accuracy (minimum mean-square error). Closed-form solutions have been provided for two important examples: the discrete classification problem and linear classification of Gaussian distributions. Here we discuss the Bayesian minimum mean-square error (MMSE) error estimator and demonstrate its performance on real biological data under Gaussian modeling assumptions.
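To make the idea concrete, the following is a minimal sketch of a Bayesian MMSE error estimate for the discrete (binned) classification case mentioned above. It rests on assumptions that the abstract does not spell out: a Beta prior on the class-0 probability, independent Dirichlet priors on the bin probabilities of each class, and random sampling, so that the MMSE estimate is the posterior mean of the true error. The function names `histogram_rule` and `bayesian_mmse_error` and all parameter names are hypothetical, chosen for illustration only, and this is not necessarily the exact form used in the paper.

```python
def histogram_rule(counts0, counts1):
    """Assign each bin to the class with more training points (ties go to class 0)."""
    return [1 if u1 > u0 else 0 for u0, u1 in zip(counts0, counts1)]


def bayesian_mmse_error(counts0, counts1, labels,
                        alpha0=None, alpha1=None, beta=(1.0, 1.0)):
    """Posterior-mean (MMSE) estimate of the true error of a fixed bin classifier.

    counts0, counts1: per-bin training counts for class 0 and class 1.
    labels: class label (0 or 1) that the classifier assigns to each bin.
    alpha0, alpha1: Dirichlet prior parameters per bin (assumed; default uniform).
    beta: Beta prior parameters for the class-0 probability (assumed; default uniform).
    """
    b = len(counts0)
    alpha0 = [1.0] * b if alpha0 is None else list(alpha0)
    alpha1 = [1.0] * b if alpha1 is None else list(alpha1)
    n0, n1 = sum(counts0), sum(counts1)

    # Posterior mean of the class-0 probability under the Beta prior.
    c_hat = (n0 + beta[0]) / (n0 + n1 + beta[0] + beta[1])

    # Posterior means of the bin probabilities under the Dirichlet priors.
    p_hat = [(u + a) / (n0 + sum(alpha0)) for u, a in zip(counts0, alpha0)]
    q_hat = [(u + a) / (n1 + sum(alpha1)) for u, a in zip(counts1, alpha1)]

    # Expected mass of each class that falls in bins labeled as the other class.
    err0 = sum(p for p, y in zip(p_hat, labels) if y == 1)
    err1 = sum(q for q, y in zip(q_hat, labels) if y == 0)

    # The true error is linear in (c, p, q), which are independent a posteriori,
    # so its posterior mean is obtained by plugging in the posterior means.
    return c_hat * err0 + (1.0 - c_hat) * err1


if __name__ == "__main__":
    counts0, counts1 = [3, 1], [1, 3]   # toy sample: 2 bins, 4 points per class
    psi = histogram_rule(counts0, counts1)
    print(psi, bayesian_mmse_error(counts0, counts1, psi))
```

For this toy sample with uniform priors the estimate evaluates to 1/3, whereas resubstitution (the fraction of training points the classifier mislabels) gives 2/8 = 0.25. The gap illustrates how the prior tempers the optimism of reusing training data at small sample sizes.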
