
2010 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS): Latest Articles

Network propagation models for gene selection
Pub Date : 2010-12-01 DOI: 10.1109/GENSIPS.2010.5719689
Wei Zhang, Baryun Hwang, Baolin Wu, R. Kuang
In this paper, we explore several network propagation methods for gene selection from microarray gene expression datasets. The network propagation methods capture gene co-expression and differential expression with unified machine learning frameworks. Large scale experiments on five breast cancer datasets validated that the network propagation methods are capable of selecting genes that are more biologically interpretable and more consistent across multiple datasets, compared with the existing approaches.
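A generic illustration of the idea, not the authors' formulation: the abstract does not specify the propagation model, so the sketch below assumes a standard label-propagation iteration f ← αSf + (1 − α)y on a symmetrically normalized co-expression graph. The weight matrix `W`, the initial scores `y`, and the function name `propagate` are hypothetical toy choices made for illustration only.

```python
import math

def propagate(W, y, alpha=0.5, iters=200):
    """Iterate f <- alpha * S f + (1 - alpha) * y, with S = D^-1/2 W D^-1/2.

    W: symmetric, nonnegative co-expression weights (assumed toy input).
    y: initial per-gene scores, e.g. differential expression (assumed).
    """
    n = len(W)
    deg = [sum(row) for row in W]
    # Symmetric normalization; zero-weight entries stay zero, so isolated
    # genes (degree 0) never cause a division by zero.
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) if W[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    f = list(y)
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f

# Toy graph: genes 0, 1, 2 are mutually co-expressed; gene 3 is isolated.
W = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 0],
     [0, 0, 0, 0]]
y = [1.0, 0.0, 0.0, 0.4]  # only genes 0 and 3 start with a nonzero score
scores = propagate(W, y)
```

In this toy graph, gene 1 has no differential expression of its own but picks up a positive score from its co-expressed neighbor, while the isolated gene 3 keeps only a damped copy of its initial score.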
Citations: 2
Subtype specific breast cancer event prediction
Pub Date : 2010-11-10 DOI: 10.1109/GENSIPS.2010.5719684
Herman M. J. Sontrop, W. Verhaegh, R. Ham, M. Reinders, P. Moerland
We investigate the potential to enhance breast cancer event predictors by exploiting subtype information. We do this with a two-stage approach that first determines a sample's subtype using a recent module-driven approach and then constructs a subtype-specific predictor to predict a metastasis event within five years. Our methodology is validated on a large compendium of microarray breast cancer datasets, including 43 replicate array pairs for assessing subtyping stability. Note that stratifying by subtype strongly reduces the training set sizes available to construct the individual predictors, which may decrease performance. Besides sample size, other factors such as unequal class distributions and differences in the number of samples per subtype easily obscure a fair comparison between subtype-specific predictors constructed on different subtypes, as well as between subtype-specific and subtype-aspecific predictors. Therefore, we constructed a completely balanced experimental design in which none of the above factors plays a role, and show that subtype-specific event predictors clearly outperform predictors that do not take subtype information into account.
Citations: 1
Inference of gene-regulatory networks using message-passing algorithms
Pub Date : 2010-11-01 DOI: 10.1109/GENSIPS.2010.5719683
Manohar Shamaiah, Sang Hyun Lee, H. Vikalo
We present an application of message-passing techniques to gene regulatory network inference. The network inference is posed as a constrained linear regression problem, and solved by a distributed computationally efficient message-passing algorithm. Performance of the proposed algorithm is tested on gold standard data sets and evaluated using metrics provided by the DREAM2 challenge [1]. Performance of the proposed algorithm is comparable to that of the techniques which yielded the best results in the DREAM2 challenge competition.
Citations: 1
Segregation-based subspace clustering for huge dimensional data
Pub Date : 2010-11-01 DOI: 10.1109/GENSIPS.2010.5719667
Majid I. Alsagabi, A. Tewfik
Clustering algorithms break down when the data points fall in huge-dimensional spaces. To tackle this problem, many subspace clustering methods have been proposed to build up a subspace in which data points cluster efficiently. The bottom-up approach is widely used to select a set of candidate features and then use a portion of this set to build up the hidden subspace step by step. The complexity depends exponentially or cubically on the number of selected features. In this paper, we present SEGCLU, a SEGregation-based subspace CLUstering method that significantly reduces the size of the candidate feature set and has cubic complexity. This algorithm was applied to noise-free DNA copy number data from two groups, autistic and typically developing children, to extract a potential biomarker for autism. 85% of the individuals were classified correctly in a 13-dimensional subspace.
Citations: 0
Finding steady states of large scale regulatory networks through partitioning
Pub Date : 2010-11-01 DOI: 10.1109/GENSIPS.2010.5719669
F. Ay, G. Gülsoy, Tamer Kahveci
Identifying steady states that characterize the long-term outcome of regulatory networks is crucial in understanding important biological processes such as cellular differentiation. Finding all possible steady states of regulatory networks is a computationally intensive task, as it suffers from the state space explosion problem. Here, we propose a method for finding steady states of large-scale Boolean regulatory networks. Our method exploits scale-freeness and weak connectivity of regulatory networks in order to speed up the steady state search through partitioning. In the trivial case where the network has more than one component such that the components are disconnected from each other, steady states of each component are independent of those of the remaining components. When the size of at least one connected component of the network is still prohibitively large, further partitioning is necessary. In this case, we identify weakly dependent components (i.e., two components that have a small number of regulations from one to the other) and calculate the steady states of each such component independently. We then combine these steady states by taking into account the regulations connecting them. We show that this approach is much more efficient than calculating the steady states of the whole network at once when the number of edges connecting them is small. Since regulatory networks often have small in-degrees, this partitioning strategy can be used effectively in order to find their steady states. Our experimental results on real datasets demonstrate that our method extends steady state identification to very large regulatory networks.
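The trivial disconnected case described in the abstract can be sketched with brute-force enumeration. This is a minimal illustration under assumed toy update rules, not the authors' algorithm: the four-gene network, its two components, and the helper `steady_states` are all hypothetical.

```python
from itertools import product

def steady_states(update_fns):
    """Return all fixed points: states s where every gene's rule reproduces s."""
    n = len(update_fns)
    return [s for s in product((0, 1), repeat=n)
            if all(f(s) == s[i] for i, f in enumerate(update_fns))]

# Component A (genes 0, 1): mutual activation.
comp_a = [lambda s: s[1], lambda s: s[0]]
# Component B (genes 0, 1 within the component): mutual repression.
comp_b = [lambda s: 1 - s[1], lambda s: 1 - s[0]]

# The same rules written over the full 4-gene state (A = genes 0-1, B = genes 2-3).
whole = [lambda s: s[1], lambda s: s[0],
         lambda s: 1 - s[3], lambda s: 1 - s[2]]

ss_a = steady_states(comp_a)      # searches 2**2 states
ss_b = steady_states(comp_b)      # searches 2**2 states
combined = [a + b for a, b in product(ss_a, ss_b)]
ss_whole = steady_states(whole)   # searches 2**4 states
```

Because the two components do not regulate each other, concatenating their fixed points recovers exactly the fixed points of the full network, while each per-component search covers 4 states instead of 16. The weakly dependent case in the abstract additionally has to filter the combinations against the connecting regulations, which this sketch does not attempt.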
Citations: 4
Optimal perturbation control of gene regulatory networks
Pub Date : 2010-11-01 DOI: 10.1109/GENSIPS.2010.5719672
N. Bouaynaya, R. Shterenberg, D. Schonfeld
We formulate the control problem in gene regulatory networks as an inverse perturbation problem, which provides the feasible set of perturbations that force the network to transition from an undesirable steady-state distribution to a desirable one. We derive a general characterization of such perturbations in an appropriate basis representation. We subsequently consider the optimal perturbation, which minimizes the overall energy of change between the original and controlled (perturbed) networks. The “energy” of change is characterized by the Euclidean-norm of the perturbation matrix. We cast the optimal control problem as a semi-definite programming (SDP) problem, thus providing a globally optimal solution which can be efficiently computed using standard SDP solvers. We apply the proposed control to the Human melanoma gene regulatory network and show that the steady-state probability mass is shifted from the undesirable high metastatic states to the chosen steady-state probability mass.
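To make the inverse perturbation idea concrete, here is a small sketch that is not the paper's SDP: it assumes the network is summarized by a row-stochastic transition matrix `P`, that the desired steady state is a row vector `pi_d`, and that "energy" is read as the Frobenius norm. Under those assumptions, the smallest-norm matrix `C` satisfying `pi_d (P + C) = pi_d` has a rank-one closed form. The sketch ignores the entrywise nonnegativity that a valid transition matrix needs and only checks it after the fact; all names and numbers are hypothetical.

```python
def min_norm_perturbation(P, pi_d):
    """Smallest Frobenius-norm C with pi_d @ (P + C) == pi_d (nonnegativity not enforced)."""
    n = len(P)
    # Residual r = pi_d - pi_d P: how far pi_d is from being stationary for P.
    r = [pi_d[j] - sum(pi_d[i] * P[i][j] for i in range(n)) for j in range(n)]
    norm_sq = sum(p * p for p in pi_d)
    # Rank-one minimum-norm solution of pi_d C = r. Each row sums to zero
    # because the entries of r sum to zero, so rows of P + C still sum to one.
    return [[pi_d[i] * r[j] / norm_sq for j in range(n)] for i in range(n)]

# Toy two-state chain whose steady state (5/6, 1/6) favors state 0.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi_d = [0.5, 0.5]  # assumed desired steady-state distribution

C = min_norm_perturbation(P, pi_d)
P_ctrl = [[P[i][j] + C[i][j] for j in range(2)] for i in range(2)]
# Verify stationarity of pi_d under the perturbed chain.
pi_check = [sum(pi_d[i] * P_ctrl[i][j] for i in range(2)) for j in range(2)]
```

For this toy chain the perturbed matrix happens to stay nonnegative, so it is a valid transition matrix with the desired steady state. In general that is not guaranteed, which is one reason a constrained formulation solved numerically is needed.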
Citations: 2
A screening method for dimensionality reduction in biochemical reaction system calibration
Pub Date : 2010-11-01 DOI: 10.1109/GENSIPS.2010.5719677
W. G. Jenkinson, J. Goutsias
Estimating the rate constants of a biochemical reaction model of cellular function is an important, albeit computationally intensive, problem in systems biology. In this paper, a variance-based sensitivity analysis approach is proposed, which can be used, as a pre-screening step, to identify parameters in a biochemical reaction system that do not appreciably influence the cost of estimation and, therefore, whose values cannot be precisely determined by parameter estimation. By only estimating the remaining parameters, appreciable qualitative and quantitative improvements can be achieved. A subset of a well-known biochemical reaction model of the EGF/ERK signaling pathway is used to illustrate the benefits achieved by the proposed method.
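As a rough illustration of variance-based screening, the sketch below estimates first-order sensitivity (Sobol) indices with a simple pick-freeze Monte Carlo estimator. It is a generic textbook-style estimator, not the procedure from the paper; the cost function `toy_cost`, its two parameters, and the uniform sampling range are assumptions made for the example.

```python
import random

def first_order_index(model, dim, i, n=20000, seed=0):
    """Pick-freeze estimate of Var(E[Y | X_i]) / Var(Y) for inputs uniform on [0, 1]."""
    rng = random.Random(seed)
    ys, ys_frozen = [], []
    for _ in range(n):
        a = [rng.random() for _ in range(dim)]
        b = [rng.random() for _ in range(dim)]
        b[i] = a[i]  # keep parameter i fixed, resample all the others
        ys.append(model(a))
        ys_frozen.append(model(b))
    mean_y = sum(ys) / n
    mean_f = sum(ys_frozen) / n
    cov = sum(p * q for p, q in zip(ys, ys_frozen)) / n - mean_y * mean_f
    var = sum(p * p for p in ys) / n - mean_y ** 2
    return cov / var

def toy_cost(k):
    # Hypothetical estimation cost: strongly driven by k[0], barely by k[1].
    return k[0] + 0.1 * k[1]

s0 = first_order_index(toy_cost, dim=2, i=0)
s1 = first_order_index(toy_cost, dim=2, i=1)
```

A parameter whose index is close to zero, like the second one here, barely moves the cost and would be a candidate for fixing rather than estimating, which is the screening step the abstract describes.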
Citations: 4
Graphlet alignment in protein interaction networks
Pub Date : 2010-11-01 DOI: 10.1109/GENSIPS.2010.5719676
Mu-Fen Hsieh, S. Sze
With the increased availability of genome-scale data, it becomes possible to study functional relationships of genes across multiple biological networks. While most previous approaches for studying conservation of patterns in networks are through the application of network alignment algorithms or the identification of network motifs, we show that it is possible to exhaustively enumerate all graphlet alignments, which consist of subgraphs from each network that share a common topology and contain homologous proteins at the same position in the topology. We show that our algorithm is able to cover significantly more proteins than previous network alignment algorithms while achieving comparable specificity and higher sensitivity with respect to functional enrichment.
Citations: 0
Bayesian MMSE estimation of classification error and performance on real genomic data
Pub Date : 2010-11-01 DOI: 10.1109/GENSIPS.2010.5719674
Lori A. Dalton, E. Dougherty
Small sample classifier design has become a major issue in the biological and medical communities, owing to the recent development of high-throughput genomic and proteomic technologies. And as the problem of estimating classifier error is already handicapped by limited available information, it is further compounded by the necessity of reusing training-data for error estimation. Due to the difficulty of error estimation, all currently popular techniques have been heuristically devised, rather than rigorously designed based on statistical inference and optimization. However, a recently proposed error estimator has placed the problem into an optimal mean-square error (MSE) signal estimation framework in the presence of uncertainty. This results in a Bayesian approach to error estimation based on a parameterized family of feature-label distributions. These Bayesian error estimators are optimal when averaged over a given family of distributions, unbiased when averaged over a given family and all samples, and analytically address a trade-off between robustness (modeling assumptions) and accuracy (minimum mean-square error). Closed form solutions have been provided for two important examples: the discrete classification problem and linear classification of Gaussian distributions. Here we discuss the Bayesian minimum mean-square error (MMSE) error estimator and demonstrate performance on real biological data under Gaussian modeling assumptions.
Citations: 0
Importance sampling method for efficient estimation of the probability of rare events in biochemical reaction systems
Pub Date : 2010-11-01 DOI: 10.1109/GENSIPS.2010.5719686
Zhouyi Xu, Xiaodong Cai
The weighted stochastic simulation algorithm (wSSA) recently developed by Kuwahara and Mura and the refined wSSA proposed by Gillespie et al. based on the importance sampling technique open the door for efficient estimation of the probability of rare events in biochemical reaction systems. However, both the wSSA and the refined wSSA do not provide a systematic method for selecting the values of importance sampling parameters but require some initial guessing for those values. In this paper, we develop a systematic method for selecting the values of importance sampling parameters for the wSSA. Numerical results demonstrate that our parameter selection method can substantially improve the performance of the wSSA in terms of simulation efficiency and accuracy.
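The role of an importance sampling parameter can be seen in a much simpler setting than the wSSA. The sketch below is not the wSSA or the authors' selection method; it estimates the Gaussian tail probability P(Z > 4) by sampling from a shifted normal and reweighting with the likelihood ratio. The function name `tail_prob_is` and the parameter `shift` are hypothetical stand-ins for a biasing parameter that has to be chosen.

```python
import math
import random

def tail_prob_is(threshold, shift, n=100000, seed=0):
    """Estimate P(Z > threshold), Z ~ N(0, 1), by sampling from N(shift, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(shift, 1.0)
        if x > threshold:
            # Likelihood ratio of N(0, 1) to N(shift, 1) evaluated at x.
            total += math.exp(-shift * x + 0.5 * shift * shift)
    return total / n

# Biasing the sampler toward the rare region makes the event common,
# and the weights correct for the bias.
estimate = tail_prob_is(threshold=4.0, shift=4.0)
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))
```

With `shift=0.0` the same function reduces to plain Monte Carlo, which sees the event only a handful of times in 100000 samples. How well the estimator works depends strongly on `shift`, which is the kind of parameter choice the abstract says has so far been left to initial guessing.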
Citations: 0