
2013 IEEE International Workshop on Genomic Signal Processing and Statistics: Latest Articles

Boolean model to experimental validation: A preliminary attempt
Pub Date : 2013-11-01 DOI: 10.1109/GENSIPS.2013.6735945
Sriram Sridharan, A. Datta, Jijayanagaram Venkatraj
There are several publications detailing the modeling of biological systems, especially in the post-genomic era. However, there is a real dearth of work testing and validating or invalidating mathematical models with experimental data. This work is one of the first attempts to bridge this expanding gap. Oxidative stress is a consequence of both normal and abnormal cellular metabolism and is linked to cell proliferation, differentiation, and apoptosis through both genetic and epigenetic changes leading to the development of human diseases. Oxidative stress itself is a consequence of the imbalance between pro- and anti-oxidative factors generated by cells in response to internal and external cues. A common mechanism by which chemotherapeutic agents induce cell death is the induced generation of free radicals, leading to an excess of free radicals. Although the exact mechanism of the molecular signaling that this entails is still being worked out, it is clear that it varies with the stage and type of cancer and with the drug and dosage used. Key genes in the oxidative stress response pathways were modeled by us earlier using multivariate Boolean network modeling. Here we studied the response of a well-accepted series of progressive breast cancer cell lines, the MCF10A series, to Adriamycin and Cyclophosphamide, two well-known and commonly used chemotherapeutic drugs. We provide evidence that the strategy of combining Boolean modeling with laboratory testing of the model, although it does not yield a perfect match, is certainly a reasonable one.
Citations: 0
A structure-based approach to predicting in vitro transcription factor-DNA interaction
Pub Date : 2013-11-01 DOI: 10.1109/GENSIPS.2013.6735915
Zhenzhu Gao, Jianhua Ruan
Summary form only given. Understanding the mechanism of transcriptional regulation remains an inspiring area of molecular biology. Among the popular methods for modeling transcription factor binding sites (TFBSs), position-specific weight matrix and k-mer based approaches have had great success. However, both approaches fail to consider the structural properties of a binding site. Recently, a novel TFBS modeling and prediction approach was presented by Bauer et al. (2010), in which the sequence-specific chemical and structural features of DNA are applied. However, the in vivo protein-DNA interactions observed in ChIP-chip assays, which were used in that study, are not necessarily direct, as some TFs tend to interact with DNA extensively through other partners. Therefore, an evaluation on a proper in vitro dataset would be more appropriate to reveal the benefit of such physicochemical features in modeling TF-DNA interactions. Recently, in vitro protein-binding microarray (PBM) experiments have greatly improved the understanding of transcription factor-DNA interaction. A PBM is a high-throughput experiment used to measure the in vitro binding affinity of a given TF to the sequences on the probe array. Because typical confounding factors present in ChIP-based experiments, such as transcription co-factors, are eliminated, PBM data provide an excellent information source for developing structural models of TF-DNA interactions. On the other hand, directly mapping the 3-mer or 4-mer based meta-features to the candidate DNA binding sequences, as in their work, may not reflect the nature of TF-DNA binding, since a TFBS is usually 8 to 12 base pairs long. As a result, conventional machine-learning algorithms, which rely on well-structured feature vector and label pairs, may not work well in modeling PBM data. In this paper we propose a novel approach to predict in vitro transcription factor binding based on the structural properties of DNA using a so-called multiple-instance learning algorithm. Compared to conventional (single-instance based) learning algorithms, our multi-instance learning-based algorithm does not require knowledge of the actual binding site within a candidate probe sequence, yet can still take full advantage of the physicochemical properties in modeling and predicting TF-DNA interactions. Evaluation on in vitro protein-binding microarray data for twenty mouse TFs shows that our new model performs significantly better than several k-mer or structure-based single-instance learning algorithms. This indicates that combining multi-instance learning with the structural properties of DNA has promising potential for studying biological regulatory networks.
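The multiple-instance idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it only shows how a probe can be treated as a "bag" of overlapping windows, with the bag scored by its best window so that the true binding position never has to be known. The names `windows`, `bag_score`, `motif_matches`, and the motif `GTAC` are hypothetical, and the toy scorer counts letter matches rather than using structural or physicochemical features.

```python
from typing import Callable, List


def windows(probe: str, k: int) -> List[str]:
    """Return all length-k windows of a probe: the instances of its bag."""
    return [probe[i:i + k] for i in range(len(probe) - k + 1)]


def bag_score(probe: str, k: int, instance_score: Callable[[str], float]) -> float:
    """Score a probe by its best-scoring window (the usual MIL max rule)."""
    return max(instance_score(w) for w in windows(probe, k))


# Hypothetical instance scorer: number of positions matching a made-up motif.
MOTIF = "GTAC"


def motif_matches(window: str) -> float:
    return float(sum(a == b for a, b in zip(window, MOTIF)))


if __name__ == "__main__":
    print(windows("ACGTAC", 4))
    print(bag_score("ACGTAC", 4, motif_matches))
```

Only the bag-level score would be compared with the measured probe intensity; which window produced the maximum is left latent. A probe shorter than `k` has no windows and would need separate handling.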
Citations: 0
Improved branch-and-bound algorithm for U-curve optimization
Pub Date : 2013-11-01 DOI: 10.1109/GENSIPS.2013.6735948
E. Atashpaz-Gargari, U. Braga-Neto, E. Dougherty
The U-curve branch-and-bound algorithm for optimization was introduced recently by Ris and collaborators. In this paper we introduce an improved algorithm for finding the optimal set of features based on the U-curve assumption. Synthetic experiments are used to assess the performance of the proposed algorithm and to compare it to exhaustive search and the original algorithm. The results show that the modified U-curve branch-and-bound (BB) algorithm makes fewer evaluations and is more robust than the original algorithm.
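The pruning that the U-curve assumption permits can be sketched in a few lines of Python. This is neither the original nor the improved branch-and-bound algorithm. It only walks a single chain of nested feature subsets and stops at the first cost increase, assuming the cost along the chain first decreases and then increases. The function `chain_minimum`, the feature names, and the toy costs are hypothetical.

```python
from typing import Callable, FrozenSet, List, Tuple

Subset = FrozenSet[str]


def chain_minimum(
    chain: List[Subset], cost: Callable[[Subset], float]
) -> Tuple[Subset, float, int]:
    """Walk a chain of nested subsets and stop at the first cost increase.

    Under the U-curve assumption no later subset on the chain can beat the
    current best once the cost has gone up, so the rest is pruned.
    Returns (best subset, its cost, number of cost evaluations).
    """
    best_set = chain[0]
    best_cost = cost(best_set)
    evaluations = 1
    for subset in chain[1:]:
        c = cost(subset)
        evaluations += 1
        if c > best_cost:
            break  # right-hand side of the U: prune the rest of the chain
        best_set, best_cost = subset, c
    return best_set, best_cost, evaluations


# Toy U-shaped cost that depends only on subset size (made-up numbers).
TOY_COST = {0: 5.0, 1: 3.0, 2: 2.0, 3: 4.0, 4: 6.0}
FEATURES = ["a", "b", "c", "d"]
CHAIN = [frozenset(FEATURES[:i]) for i in range(len(FEATURES) + 1)]

if __name__ == "__main__":
    print(chain_minimum(CHAIN, lambda s: TOY_COST[len(s)]))
```

On this toy chain the walk evaluates four of the five subsets. A full search would have to organise many such chains over the Boolean lattice of subsets, which is where the two algorithms compared in the abstract differ.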
Citations: 2
DBComposer: An R package for integrative analysis and management of gene expression microarray data
Pub Date : 2013-11-01 DOI: 10.1109/GENSIPS.2013.6735944
Lingjia Kong, Kaisa-Leena Aho, Kirsi J. Granberg, Christophe Roos, R. Autio
DBComposer is an R package with a graphical user interface (GUI) to analyze and integrate human gene expression microarray data. With DBComposer, the data can be easily annotated, preprocessed and analyzed in several ways. DBComposer can also serve as a personal expression microarray database allowing users to store multiple datasets together for later retrieval or data analysis. It takes advantage of many R packages for statistics and visualizations, and provides a flexible framework to implement custom workflows to extend the data analysis capabilities.
Citations: 0
Optimal Bayesian MMSE estimation of the coefficient of determination for discrete prediction
Pub Date : 2013-11-01 DOI: 10.1109/GENSIPS.2013.6735933
Ting-Ju Chen, U. Braga-Neto
The coefficient of determination (CoD) has significant applications in genomics, for example, in the inference of gene regulatory networks. In previous publications, we have studied several nonparametric CoD estimators, based upon the resubstitution, leave-one-out, cross-validation, and bootstrap error estimators, and one parametric maximum-likelihood (ML) CoD estimator that allows the incorporation of available prior knowledge, from a frequentist perspective. However, none of these CoD estimators is rigorously optimized based on statistical inference across a family of possible distributions. Therefore, following the idea of Bayesian error estimation for classification, we define a Bayesian CoD estimator that minimizes the mean-square error (MSE), based on a parametrized family of joint distributions between predictors and target whose random parameters are characterized by assumed prior distributions. We derive an exact formulation of the sample-based Bayesian MMSE CoD estimator. Numerical experiments using Monte Carlo sampling are carried out to estimate performance metrics of the Bayesian CoD estimator and compare them against those of the resubstitution, leave-one-out, bootstrap, and cross-validation CoD estimators over all the distributions. Results show that the Bayesian CoD estimator has the best performance, displaying zero bias, small variance, and the least root-mean-square (RMS) error.
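For orientation, the simplest of the baseline estimators named above, the resubstitution CoD, can be sketched for a binary target. The sketch assumes the usual definition CoD = (e0 - e) / e0, where e0 is the error of the best constant predictor of the target and e is the error of the best predictor that uses the observed predictor state. The notation and the function name `resubstitution_cod` are not taken from the abstract, and the Bayesian MMSE estimator proposed in the paper is not reproduced here.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


def resubstitution_cod(xs: Sequence[Hashable], ys: Sequence[int]) -> float:
    """Resubstitution CoD estimate for a binary target (values 0 or 1).

    eps0: training error of the best constant guess for the target.
    eps:  training error of the best guess within each predictor state.
    """
    n = len(ys)
    ones = sum(ys)
    eps0 = min(ones, n - ones) / n
    if eps0 == 0:
        # Constant target: the ratio is undefined; no convention is assumed.
        raise ValueError("CoD is undefined when the target is constant")
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for x, y in zip(xs, ys):
        counts[x][y] += 1
    # In each predictor state the minority label count is the error count.
    eps = sum(min(c[0], c[1]) for c in counts.values()) / n
    return (eps0 - eps) / eps0


if __name__ == "__main__":
    print(resubstitution_cod([0, 0, 1, 1], [0, 0, 1, 1]))  # perfectly predictive
    print(resubstitution_cod([0, 0, 1, 1], [0, 1, 0, 1]))  # uninformative
```

Predictor states can be any hashable value, for example a tuple of binarized expression levels of several genes.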
Citations: 3
Prediction of drug-target interactions using popular Collaborative Filtering methods
Pub Date : 2013-11-01 DOI: 10.1109/GENSIPS.2013.6735931
A. Koohi
Computational approaches for predicting drug-protein interactions have gained more attention in recent years. The main reason is that a correct prediction based on screening a database of small molecules against a certain class of proteins can potentially accelerate drug discovery. In this paper a popular prediction method, collaborative filtering (CF) as used in recommender systems, is evaluated for the prediction of drug-protein interactions. The drug-protein interaction matrix and the user-item rating matrix are similar, and in both cases only a small subset of the matrix entries is known. The CF methods are evaluated on four classes of proteins, and the AUC (area under the receiver operating characteristic curve) and AUPR (area under the precision-recall curve) are reported. It is shown that collaborative filtering methods can be effective in the prediction of drug-target interactions based on the known interaction matrix. These results highlight the importance of using the known interaction matrix in order to achieve high accuracy and precision in prediction.
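The analogy between a user-item rating matrix and a drug-protein interaction matrix can be made concrete with a small neighbourhood-based sketch. The abstract does not say which CF methods were evaluated, so this is only one common variant, written under two stated assumptions: unknown interactions are stored as 0, and drugs are compared by the cosine similarity of their interaction rows. The matrix `R` and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two interaction profiles (0.0 if either is empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict(R: List[List[int]], drug: int, target: int) -> float:
    """Similarity-weighted vote of the other drugs on one (drug, target) cell."""
    num = 0.0
    den = 0.0
    for other, row in enumerate(R):
        if other == drug:
            continue
        s = cosine(R[drug], row)
        num += s * row[target]
        den += s
    return num / den if den else 0.0


# Toy matrix: rows are drugs, columns are protein targets (made-up data).
R = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]

if __name__ == "__main__":
    print(predict(R, 0, 2), predict(R, 0, 3))
```

Drug 0 shares two targets with drug 1, so target 2 (hit by drug 1) scores higher for drug 0 than target 3 (hit only by the dissimilar drug 2). Ranking the unknown cells by such scores is what AUC and AUPR would then evaluate.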
Citations: 5
A generic model of transcriptional regulatory networks: Application to plants under abiotic stress
Pub Date : 2013-11-01 DOI: 10.1109/GENSIPS.2013.6735922
A. Tchagang, Sieu Phan, Fazel Famili, Youlian Pan, A. Cutler, Jitao Zou
Understanding the relationships between transcription factors (TFs) and genes in plants during abiotic stress response, tolerance, and adaptation to adverse environments is very important for developing resilient crop varieties. While experimental methods to characterize stress-responsive TFs and their targets are highly accurate, identifying and characterizing the role of a given gene in a given stress response event is often laborious and time-consuming. Computational approaches, on the other hand, offer a platform to identify new knowledge by integrating high-throughput omics data and mathematical methods and models. In this research, we have developed a generic linear model of transcriptional regulatory networks (TRNs) and a companion algorithm to identify and characterize stress-responsive genes and their roles in a given stress response event. The proposed methodology was applied to plants under abiotic stress, using Arabidopsis thaliana as an example. Well-known interactions were inferred, as well as putative novel ones that may play important roles in plants under abiotic stress conditions, as confirmed by statistical and literature evidence.
Citations: 2
Bayesian multivariate Poisson model for RNA-seq classification
Pub Date : 2013-11-01 DOI: 10.1109/GENSIPS.2013.6735946
J. Knight, I. Ivanov, E. Dougherty
High-dimensional data and small samples make genomic/proteomic classifier design and error estimation virtually impossible without the use of prior information [1]. Dalton and Dougherty utilize prior biological knowledge via a Bayesian approach that considers a prior distribution on an uncertainty class of feature-label distributions [2], [3]. While their general framework is very broad, they focus their attention on multinomial and Gaussian models, for which they derive closed-form solutions for the minimum mean-square error (MMSE) error estimate, the MSE of the error estimate, and an optimal Bayesian classifier (OBC) relative to the prior distribution. Sequencing datasets consist of the number of reads found to map to specific regions of a reference genome. As such, they are often modeled with a discrete distribution, such as the Poisson. For this reason, Gaussian and multinomial distributions are not ideal for sequence-based datasets. Thus, we introduce a multivariate Poisson (MP) model and the associated MP OBC for classifying samples using sequencing data. Lacking closed-form solutions, we employ a Markov chain Monte Carlo (MCMC) approach to perform classification. We demonstrate superior classification performance for more complex synthetic datasets and performance comparable to the top classifiers in other, simpler synthetic datasets.
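Why a Poisson likelihood suits read counts can be seen in a deliberately simplified sketch. It is not the MP model or the MP OBC of the paper: it assumes independent Poisson counts per gene, known class rates, and equal class priors, and it assigns a sample to the class with the larger log-likelihood. The rates and the function names are hypothetical.

```python
import math
from typing import Dict, Hashable, Sequence


def poisson_log_pmf(k: int, rate: float) -> float:
    """log P(K = k) for a Poisson with the given rate (rate > 0)."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def classify(counts: Sequence[int], class_rates: Dict[Hashable, Sequence[float]]) -> Hashable:
    """Pick the class whose independent-Poisson log-likelihood is largest.

    Assumes equal class priors and independence between genes; both are
    simplifications that the multivariate model in the abstract avoids.
    """
    scores = {
        label: sum(poisson_log_pmf(k, r) for k, r in zip(counts, rates))
        for label, rates in class_rates.items()
    }
    return max(scores, key=scores.get)


# Made-up expected read counts for two genes in two classes.
RATES = {0: [2.0, 10.0], 1: [10.0, 2.0]}

if __name__ == "__main__":
    print(classify([1, 12], RATES))
    print(classify([9, 3], RATES))
```

In the Bayesian setting of the abstract the rates are not known. They carry a prior, and the classifier averages over the posterior, which is where the MCMC step comes in.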
Citations: 0
Effect of separate sampling on classification and the minimax criterion
Pub Date : 2013-11-01 DOI: 10.1109/GENSIPS.2013.6735935
M. S. Esfahani, E. Dougherty
It is commonplace in bioinformatics (and elsewhere) to build a classifier from sample data in which the sample sizes of the classes are not random; that is, they are selected prior to sampling. The result is that there is no estimate of the prior class probabilities available from the data. In this paper, we find an analytic result for the minimax solution for the class prior probabilities for a general Neyman-Pearson induced classifier. From that we derive Anderson's classical minimax prior probability “estimate.” Using synthetic and real data, we demonstrate the degradation in classifier performance from using inaccurate values for the prior probabilities.
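The degradation described above can be illustrated with a toy case. The sketch below is not the paper's Neyman-Pearson construction; it assumes two univariate Gaussian classes, N(0, 1) and N(mu, 1), and a classifier thresholded using an assumed prior for class 0. The function names and the value of `mu` are hypothetical choices made for illustration.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def true_error(assumed_prior0, true_prior0, mu=2.0):
    """Error of a threshold classifier designed with an assumed class-0 prior.

    Assumption: class 0 is N(0, 1) and class 1 is N(mu, 1). The classifier
    labels x as class 1 when x exceeds the threshold t that would be Bayes
    optimal if the class-0 prior really were `assumed_prior0`.
    """
    t = mu / 2.0 + math.log(assumed_prior0 / (1.0 - assumed_prior0)) / mu
    err_class0 = 1.0 - phi(t)   # class-0 points falling above the threshold
    err_class1 = phi(t - mu)    # class-1 points falling below the threshold
    return true_prior0 * err_class0 + (1.0 - true_prior0) * err_class1


# True class-0 prior is 0.8; compare a correct, a minimax, and a wrong guess.
correct = true_error(0.8, 0.8)
minimax = true_error(0.5, 0.8)  # equalizes the two class-conditional errors
wrong = true_error(0.2, 0.8)
print(correct, minimax, wrong)
```

In this equal-variance setting the assumed prior 0.5 equalizes the two class-conditional errors, so the overall error does not depend on the true prior; that is the sense in which it is a minimax choice here. A badly wrong prior costs considerably more than the minimax choice.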
Citations: 19
Identifying cancer biomarkers through a network regularized Cox model
Pub Date : 2013-11-01 DOI: 10.1109/GENSIPS.2013.6735924
Ying-Wooi Wan, John Nagorski, Genevera I. Allen, Zhaohui Li, Zhandong Liu
A central problem in cancer genomics is to identify interpretable biomarkers for better disease prognosis. Many of the biomarkers identified through Cox Proportional Hazard (PH) models are biologically uninterpretable. We propose the use of a graph Laplacian regularized Cox PH model to integrate biological networks into the feature selection problem in survival analysis. Simulation studies demonstrate that the performance of the proposed algorithm is superior to L1 and L1+L2 regularized Cox PH models. The utility of this algorithm is also validated by its ability to identify key known biomarkers such as p53 and myc in estrogen receptor positive breast cancer patients using genomic aberration data generated by The Cancer Genome Atlas consortium. With the rapid expansion of our knowledge of biological networks, this approach will become increasingly useful for mining high-throughput genomic datasets.
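To make the idea of graph Laplacian regularization concrete, here is a minimal sketch of the quadratic network penalty only. It assumes the unnormalized Laplacian L = D - A and the penalty beta^T L beta; the abstract does not say which Laplacian variant is used or how the penalty is weighted against the Cox partial likelihood, so those details, the function names, and the three-gene network are all hypothetical.

```python
import numpy as np


def graph_laplacian(adjacency):
    """Unnormalized Laplacian L = D - A of an undirected graph."""
    adjacency = np.asarray(adjacency, dtype=float)
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency


def laplacian_penalty(beta, laplacian):
    """Quadratic penalty beta^T L beta.

    For an unweighted graph this equals the sum over edges (i, j)
    of (beta_i - beta_j)^2, so it is small when connected genes
    receive similar coefficients.
    """
    beta = np.asarray(beta, dtype=float)
    return float(beta @ laplacian @ beta)


# Hypothetical chain network: gene0 - gene1 - gene2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
L = graph_laplacian(A)

smooth = laplacian_penalty([1.0, 1.0, 1.0], L)   # neighbors agree
rough = laplacian_penalty([1.0, -1.0, 1.0], L)   # neighbors disagree
print(smooth, rough)  # 0.0 8.0
```

Adding such a term to a survival objective pushes coefficients of genes that are adjacent in the network toward each other, which is one way selected features can end up clustered along known pathways.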
Citations: 7