2010 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS): Latest Papers

RMS bounds and sample size considerations for error estimation in linear discriminant analysis
Pub Date : 2010-11-01 DOI: 10.1109/GENSIPS.2010.5719691
A. Zollanvari, U. Braga-Neto, E. Dougherty
The validity of a classifier depends on the precision of the error estimator used to estimate its true error. This paper considers the necessary sample size to achieve a given validity measure, namely RMS, for resubstitution and leave-one-out error estimators in the context of LDA. It provides bounds for the RMS between the true error and both the resubstitution and leave-one-out error estimators in terms of sample size and dimensionality. These bounds can be used to determine the minimum sample size in order to obtain a desired estimation accuracy, relative to RMS. To show how these results can be used in practice, a microarray classification problem is presented.
Citations: 0
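The abstract above concerns the RMS deviation between the true error of an LDA classifier and its resubstitution and leave-one-out estimates. The paper's analytic bounds are not given on this page, so the following is only a Monte Carlo sketch of the quantity being bounded. The Gaussian model with identity covariance, the class separation `delta`, and all function names are assumptions made for illustration.

```python
import math
import numpy as np

def lda_fit(X0, X1):
    """Fit LDA with a pooled covariance; predict class 1 when w @ x + b > 0."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X0) + len(X1) - 2)
    w = np.linalg.solve(S, m1 - m0)
    b = -0.5 * w @ (m0 + m1)
    return w, b

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def true_error(w, b, mu0, mu1):
    """Exact error of a linear rule when both classes are N(mu, I), equal priors."""
    s = np.linalg.norm(w)
    e0 = normal_cdf((w @ mu0 + b) / s)    # class 0 assigned to class 1
    e1 = normal_cdf(-(w @ mu1 + b) / s)   # class 1 assigned to class 0
    return 0.5 * (e0 + e1)

def resub_error(w, b, X0, X1):
    """Fraction of training points misclassified by the rule trained on them."""
    wrong = np.sum(X0 @ w + b > 0) + np.sum(X1 @ w + b <= 0)
    return wrong / (len(X0) + len(X1))

def loo_error(X0, X1):
    """Leave-one-out: refit without each point, then test on that point."""
    wrong = 0
    for i in range(len(X0)):
        w, b = lda_fit(np.delete(X0, i, axis=0), X1)
        wrong += int(X0[i] @ w + b > 0)
    for i in range(len(X1)):
        w, b = lda_fit(X0, np.delete(X1, i, axis=0))
        wrong += int(X1[i] @ w + b <= 0)
    return wrong / (len(X0) + len(X1))

def rms_of_estimators(n_per_class, p, delta, trials=200, seed=0):
    """Monte Carlo RMS of (estimate - true error) for resubstitution and LOO."""
    rng = np.random.default_rng(seed)
    mu0 = np.zeros(p)
    mu1 = np.zeros(p)
    mu1[0] = delta  # assumed separation along the first feature
    sq_resub = sq_loo = 0.0
    for _ in range(trials):
        X0 = rng.normal(size=(n_per_class, p)) + mu0
        X1 = rng.normal(size=(n_per_class, p)) + mu1
        w, b = lda_fit(X0, X1)
        err = true_error(w, b, mu0, mu1)
        sq_resub += (resub_error(w, b, X0, X1) - err) ** 2
        sq_loo += (loo_error(X0, X1) - err) ** 2
    return float(np.sqrt(sq_resub / trials)), float(np.sqrt(sq_loo / trials))
```

Calling `rms_of_estimators` for increasing `n_per_class` gives an empirical view of how sample size affects estimation accuracy; it is a simulation under the stated assumptions, not a substitute for the bounds derived in the paper.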
Effects of partial reporting of classification results
Pub Date : 2010-11-01 DOI: 10.1109/GENSIPS.2010.5719688
Mohammadmahdi R. Yousefi, Jianping Hua, Chao Sima, E. Dougherty
When proposing a new classification scheme, perhaps in the form of a classification rule or feature selection method, modelers in the bioinformatics literature typically report its performance on data sets of interest, such as gene-expression microarrays. These data sets often include thousands of features but a small number of sample points, which increases variability in feature selection and error estimation, resulting in highly imprecise reported performances. This suggests that the reported performance of the proposed scheme would be less correlated with and highly biased from the actual performance if only the best results are demonstrated. This paper confirms this by showing the behavior of the joint distributions of the minimum reported estimated errors and corresponding true errors as functions of the number of samples tested in a large simulation study using both modeled and real data.
Citations: 0
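The selection effect described in the abstract above can be illustrated with a small simulation. This is a sketch under assumed conditions, not the paper's experimental setup: the true error on each data set is drawn uniformly from [0.2, 0.4], each estimate is a binomial error count over `n` points, and only the smallest estimate is reported. The function name and all parameter values are hypothetical.

```python
import numpy as np

def min_report_bias(m, n=30, trials=5000, seed=0):
    """Mean of (reported estimate - corresponding true error) when only the
    minimum estimated error over m data sets is reported."""
    rng = np.random.default_rng(seed)
    true = rng.uniform(0.2, 0.4, size=(trials, m))   # assumed true errors
    est = rng.binomial(n, true) / n                  # noisy error estimates
    best = est.argmin(axis=1)                        # data set that gets reported
    rows = np.arange(trials)
    return float(np.mean(est[rows, best] - true[rows, best]))

if __name__ == "__main__":
    for m in (1, 2, 5, 10, 20):
        print(m, round(min_report_bias(m), 3))
```

With a single data set the estimate is unbiased on average, while reporting the minimum over more data sets drives the average difference negative, that is, increasingly optimistic.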
Model-based sequential base calling for Illumina sequencing
Pub Date : 2010-11-01 DOI: 10.1109/GENSIPS.2010.5719675
Shreepriya Das, H. Vikalo, A. Hassibi
In this paper, we study the efficacy of a model-based base-calling approach for Illumina's sequencing platforms. In particular, we investigate Genome Analyzer I reads and provide a detailed biochemical model of the sequencing process, incorporating various non-idealities evident in such systems. Parameters of the model are estimated via supervised learning based on the particle swarm optimization technique. A computationally efficient sequential decoding method is proposed for base-calling. It is demonstrated that the performance of the proposed approach is comparable to Illumina's base-calling method.
Citations: 1
Control of stochastic master equation models of genetic regulatory networks by approximating their average behavior
Pub Date : 2010-10-22 DOI: 10.1109/GENSIPS.2010.5719681
R. Pal, M. Caglar
Stochastic master equation (SME) models can provide a detailed representation of genetic regulatory systems, but their use is restricted by the large data requirements for parameter inference and the inherent computational complexity of their simulation. In this paper, we approximate the expected value of the output distribution of the SME by the output of a deterministic differential equation (DE) model. The mapping provides a technique to simulate the average behavior of the system in a computationally inexpensive manner and enables us to use existing tools for DE models to control the system. The effectiveness of the mapping and the subsequent intervention policy design was evaluated through a biological example.
Citations: 4
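The idea in the abstract above, replacing the mean of a stochastic master equation by the output of a deterministic differential equation, can be seen on a toy case. The sketch below assumes a single-species birth-death process (constant production rate `k`, per-molecule degradation rate `g`), which is a hypothetical example and not the model used in the paper. For this linear system the DE dx/dt = k - g*x gives the SME mean exactly; with nonlinear propensities the DE output would only approximate it.

```python
import math
import random

def gillespie_birth_death(k, g, t_end, rng):
    """One stochastic trajectory of the birth-death process, started at x = 0.
    Returns the molecule count at time t_end."""
    t, x = 0.0, 0
    while True:
        total_rate = k + g * x              # production + degradation propensity
        t += rng.expovariate(total_rate)    # waiting time to the next reaction
        if t > t_end:
            return x
        if rng.random() < k / total_rate:
            x += 1                          # production event
        else:
            x -= 1                          # degradation event

def ssa_mean(k, g, t_end, runs=2000, seed=0):
    """Average molecule count at t_end over many stochastic runs."""
    rng = random.Random(seed)
    return sum(gillespie_birth_death(k, g, t_end, rng) for _ in range(runs)) / runs

def de_output(k, g, t_end):
    """Closed-form solution of dx/dt = k - g*x with x(0) = 0."""
    return (k / g) * (1.0 - math.exp(-g * t_end))

if __name__ == "__main__":
    print("stochastic mean:", ssa_mean(10.0, 1.0, 2.0))
    print("DE output:     ", de_output(10.0, 1.0, 2.0))
```

The deterministic value costs one function evaluation, whereas the stochastic average requires thousands of simulated trajectories, which is the computational saving the abstract refers to.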
Inference of gene predictor set using Boolean satisfiability
Pub Date : 2010-05-26 DOI: 10.1109/GENSIPS.2010.5719678
P. Lin, S. Khatri
The inference of gene predictors in the gene regulatory network (GRN) has become an important research area in the genomics and medical disciplines. Accurate predictors are necessary for constructing the GRN model and for enabling targeted biological experiments that attempt to validate or control the regulation process. In this paper, we implement a SAT-based algorithm to determine the gene predictor set from steady-state gene expression data (attractor states). Using the attractor states as input, the states are ordered into attractor cycles. For each attractor cycle ordering, all possible predictors are enumerated and a conjunctive normal form (CNF) expression is generated which encodes these predictors and their biological constraints. Each CNF is solved using a SAT solver to find candidate predictor sets. Statistical analysis of the resulting predictor sets selects the most likely predictor set of the GRN, corresponding to the attractor data. We demonstrate our algorithm on attractor state data from a melanoma study [1] and present our predictor set results.
Citations: 6
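The consistency condition behind predictor inference in the abstract above can be sketched on a toy example. The paper's CNF encoding and biological constraints are not given on this page, so the code below swaps in plain brute-force enumeration instead of a SAT solver: a gene set counts as a candidate predictor for a target gene if, along one ordered attractor cycle, identical predictor values never lead to different next values of the target. The two-gene cycle and the function name are hypothetical.

```python
from itertools import combinations

def consistent_predictor_sets(cycle, target, max_size=2):
    """Return gene-index tuples whose current values determine the next
    value of `target` on every transition of the ordered attractor cycle."""
    n_genes = len(cycle[0])
    # Each state maps to its successor; the last state wraps to the first.
    transitions = [(cycle[i], cycle[(i + 1) % len(cycle)]) for i in range(len(cycle))]
    found = []
    for size in range(1, max_size + 1):
        for genes in combinations(range(n_genes), size):
            table = {}          # predictor values -> next value of the target
            ok = True
            for cur, nxt in transitions:
                key = tuple(cur[g] for g in genes)
                if table.setdefault(key, nxt[target]) != nxt[target]:
                    ok = False  # same inputs, different outputs: not a function
                    break
            if ok:
                found.append(genes)
    return found

if __name__ == "__main__":
    # Hypothetical 2-gene cycle: (0,0) -> (0,1) -> (1,1) -> (1,0) -> (0,0)
    cycle = [(0, 0), (0, 1), (1, 1), (1, 0)]
    print(consistent_predictor_sets(cycle, target=0))
    print(consistent_predictor_sets(cycle, target=1))
```

In this cycle the next value of gene 0 equals the current value of gene 1, so gene 1 alone is a consistent predictor for gene 0 while gene 0 alone is not. Exhaustive enumeration scales poorly with the number of genes, which is one reason a SAT formulation is attractive for realistic networks.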