EURASIP journal on bioinformatics & systems biology: latest articles

A hybrid technique for the periodicity characterization of genomic sequence data.
Pub Date : 2009-01-01 Epub Date: 2009-04-08 DOI: 10.1155/2009/924601
Julien Epps

Many studies of biological sequence data have examined sequence structure in terms of periodicity, and various methods for measuring periodicity have been suggested for this purpose. This paper compares two such methods, autocorrelation and the Fourier transform, using synthetic periodic sequences, and explains the differences in periodicity estimates produced by each. A hybrid autocorrelation-integer period discrete Fourier transform is proposed that combines the advantages of both techniques. Collectively, this representation and a recently proposed variant on the discrete Fourier transform offer alternatives to the widely used autocorrelation for the periodicity characterization of sequence data. Finally, these methods are compared for various tetramers of interest in C. elegans chromosome I.
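The contrast between the two measures can be sketched on a synthetic periodic sequence. This is a minimal illustration, not the paper's hybrid method: the helper names (`indicator`, `autocorrelation`, `power_at_integer_periods`), the test sequence, and the choice to evaluate the DFT only at integer periods are all assumptions made for the example.

```python
import cmath

def indicator(seq, symbol):
    """Binary indicator sequence: 1.0 where seq has symbol, else 0.0."""
    return [1.0 if ch == symbol else 0.0 for ch in seq]

def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation r[k] = sum_i x[i] * x[i + k], k = 1..max_lag."""
    n = len(x)
    return {k: sum(x[i] * x[i + k] for i in range(n - k))
            for k in range(1, max_lag + 1)}

def power_at_integer_periods(x, max_period):
    """Power of the mean-removed DFT evaluated at frequency 1/p, p = 2..max_period."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    power = {}
    for p in range(2, max_period + 1):
        s = sum(v * cmath.exp(-2j * cmath.pi * i / p)
                for i, v in enumerate(centered))
        power[p] = abs(s) ** 2 / n
    return power

# Hypothetical synthetic sequence: the symbol "A" recurs with period 5.
seq = "ACGTC" * 30
x = indicator(seq, "A")

acf = autocorrelation(x, 12)
power = power_at_integer_periods(x, 12)

best_lag = max(acf, key=acf.get)        # lag with the largest autocorrelation
best_period = max(power, key=power.get)  # integer period with the largest DFT power
```

On this toy input both measures pick out period 5, but the autocorrelation is also large at lag 10 (a multiple of the true period), while the DFT power at period 10 is essentially zero. That is one familiar way in which the two estimates can differ.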

Citations: 18
Reconstructing generalized logical networks of transcriptional regulation in mouse brain from temporal gene expression data.
Pub Date : 2009-01-01 DOI: 10.1155/2009/545176
Mingzhou Joe Song, Chris K Lewis, Eric R Lance, Elissa J Chesler, Roumyana Kirova Yordanova, Michael A Langston, Kerrie H Lodowski, Susan E Bergeson

Gene expression time course data can be used not only to detect differentially expressed genes but also to find temporal associations among genes. The problem of reconstructing generalized logical networks to account for temporal dependencies among genes and environmental stimuli from transcriptomic data is addressed. A network reconstruction algorithm was developed that uses statistical significance as a criterion for network selection to avoid false-positive interactions arising from pure chance. The multinomial hypothesis testing-based network reconstruction allows for explicit specification of the false-positive rate, unique from all extant network inference algorithms. The method is superior to dynamic Bayesian network modeling in a simulation study. Temporal gene expression data from the brains of alcohol-treated mice in an analysis of the molecular response to alcohol are used for modeling. Genes from major neuronal pathways are identified as putative components of the alcohol response mechanism. Nine of these genes have associations with alcohol reported in literature. Several other potentially relevant genes, compatible with independent results from literature mining, may play a role in the response to alcohol. Additional, previously unknown gene interactions were discovered that, subject to biological verification, may offer new clues in the search for the elusive molecular mechanisms of alcoholism.

Citations: 13
Reverse engineering of gene regulatory networks: a comparative study.
Pub Date : 2009-01-01 DOI: 10.1155/2009/617281
Hendrik Hache, Hans Lehrach, Ralf Herwig

Reverse engineering of gene regulatory networks has been an intensively studied topic in bioinformatics since it constitutes an intermediate step from explorative to causative gene expression analysis. Many methods have been proposed through recent years leading to a wide range of mathematical approaches. In practice, different mathematical approaches will generate different resulting network structures, thus, it is very important for users to assess the performance of these algorithms. We have conducted a comparative study with six different reverse engineering methods, including relevance networks, neural networks, and Bayesian networks. Our approach consists of the generation of defined benchmark data, the analysis of these data with the different methods, and the assessment of algorithmic performances by statistical analyses. Performance was judged by network size and noise levels. The results of the comparative study highlight the neural network approach as best performing method among those under study.
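Of the method families named above, relevance networks are the simplest to sketch: connect two genes whenever a pairwise association score exceeds a threshold. The example below assumes Pearson correlation as the score and a cutoff of 0.9; the gene names, expression values, and function names are hypothetical, and the study's own scoring and benchmark data may differ.

```python
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

def relevance_network(expr, threshold=0.9):
    """Undirected edges between genes whose |correlation| meets the threshold."""
    edges = set()
    for g1, g2 in combinations(sorted(expr), 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            edges.add((g1, g2))
    return edges

# Hypothetical expression profiles over five conditions.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],   # tracks geneA closely
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],   # weakly related to both
}
edges = relevance_network(expr, threshold=0.9)
```

Here only the geneA/geneB pair clears the threshold. Because correlation is symmetric, the resulting network is undirected, which is one reason results from this family are hard to compare directly with methods that infer directed regulation.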

Citations: 85
Efficient alignment of RNAs with pseudoknots using sequence alignment constraints.
Pub Date : 2009-01-01 Epub Date: 2009-04-14 DOI: 10.1155/2009/491074
Byung-Jun Yoon

When aligning RNAs, it is important to consider both the secondary structure similarity and primary sequence similarity to find an accurate alignment. However, algorithms that can handle RNA secondary structures typically have high computational complexity that limits their utility. For this reason, there have been a number of attempts to find useful alignment constraints that can reduce the computations without sacrificing the alignment accuracy. In this paper, we propose a new method for finding effective alignment constraints for fast and accurate structural alignment of RNAs, including pseudoknots. In the proposed method, we use a profile-HMM to identify the "seed" regions that can be aligned with high confidence. We also estimate the position range of the aligned bases that are located outside the seed regions. The location of the seed regions and the estimated range of the alignment positions are then used to establish the sequence alignment constraints. We incorporated the proposed constraints into the profile context-sensitive HMM (profile-csHMM) based RNA structural alignment algorithm. Experiments indicate that the proposed method can make the alignment speed up to 11 times faster without degrading the accuracy of the RNA alignment.

Citations: 5
Origins of stochasticity and burstiness in high-dimensional biochemical networks.
Pub Date : 2009-01-01 Epub Date: 2008-10-16 DOI: 10.1155/2009/362309
Simon Rosenfeld

Two major approaches are known in the field of stochastic dynamics of intracellular biochemical networks. The first one places the focus of attention on the fact that many biochemical constituents vitally important for the network functionality may be present only in small quantities within the cell, and therefore the regulatory process is essentially discrete and prone to relatively big fluctuations. The second approach treats the regulatory process as essentially continuous. Complex pseudostochastic behavior in such processes may occur due to multistability and oscillatory motions within limit cycles. In this paper we outline the third scenario of stochasticity in the regulatory process. This scenario is only conceivable in high-dimensional highly nonlinear systems. In particular, we show that burstiness, a well-known phenomenon in the biology of gene expression, is a natural consequence of high dimensionality coupled with high nonlinearity. In mathematical terms, burstiness is associated with heavy-tailed probability distributions of stochastic processes describing the dynamics of the system. We demonstrate how the "shot" noise originates from purely deterministic behavior of the underlying dynamical system. We conclude that the limiting stochastic process may be accurately approximated by the "heavy-tailed" generalized Pareto process which is a direct mathematical expression of burstiness.
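To make "heavy-tailed" concrete, the sketch below compares the tail of a generalized Pareto distribution with that of an exponential distribution. It assumes the standard generalized Pareto form with location 0, scale `sigma`, and shape `xi > 0`; the parameter values and function names are illustrative only and are not taken from the paper, which speaks of a generalized Pareto process rather than a single distribution.

```python
import math
import random

def gpd_survival(x, xi, sigma=1.0):
    """P(X > x) for a generalized Pareto variable (shape xi > 0, scale sigma, location 0)."""
    return (1.0 + xi * x / sigma) ** (-1.0 / xi)

def gpd_quantile(u, xi, sigma=1.0):
    """Inverse CDF: the x with P(X <= x) = u, for 0 <= u < 1."""
    return sigma / xi * ((1.0 - u) ** (-xi) - 1.0)

def exp_survival(x, rate=1.0):
    """P(X > x) for an exponential variable, used here as a light-tailed reference."""
    return math.exp(-rate * x)

def sample_gpd(n, xi, sigma=1.0, seed=0):
    """Draw n values by inverse-transform sampling with a seeded generator."""
    rng = random.Random(seed)
    return [gpd_quantile(rng.random(), xi, sigma) for _ in range(n)]

# Tail probabilities far from the origin (illustrative parameters).
tail_gpd = gpd_survival(20.0, xi=0.5)   # decays like a power law
tail_exp = exp_survival(20.0)           # decays exponentially
bursts = sample_gpd(1000, xi=0.5)
```

With these parameters the generalized Pareto tail at x = 20 is 1/121, many orders of magnitude above the exponential tail at the same point. Rare but very large values of this kind are what a bursty signal looks like in distributional terms.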

Citations: 13
Impact of missing value imputation on classification for DNA microarray gene expression data--a model-based study.
Pub Date : 2009-01-01 Epub Date: 2010-03-02 DOI: 10.1155/2009/504069
Youting Sun, Ulisses Braga-Neto, Edward R Dougherty

Many missing-value (MV) imputation methods have been developed for microarray data, but only a few studies have investigated the relationship between MV imputation and classification accuracy. Furthermore, these studies are problematic in fundamental steps such as MV generation and classifier error estimation. In this work, we carry out a model-based study that addresses some of the issues in previous studies. Six popular imputation algorithms, two feature selection methods, and three classification rules are considered. The results suggest that it is beneficial to apply MV imputation when the noise level is high, variance is small, or gene-cluster correlation is strong, under small to moderate MV rates. In these cases, if data quality metrics are available, then it may be helpful to consider the data point with poor quality as missing and apply one of the most robust imputation algorithms to estimate the true signal based on the available high-quality data points. However, at large MV rates, we conclude that imputation methods are not recommended. Regarding the MV rate, our results indicate the presence of a peaking phenomenon: performance of imputation methods actually improves initially as the MV rate increases, but after an optimum point, performance quickly deteriorates with increasing MV rates.
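The first two steps of such a study, generating missing values at a chosen rate and then imputing them, can be sketched as follows. The example assumes values go missing completely at random and uses row-average imputation as a simple baseline; whether that baseline is among the six algorithms studied is not stated here, and the function names and toy matrix are hypothetical.

```python
import random

def mask_at_random(matrix, rate, seed=0):
    """Replace each entry with None with probability `rate` (missing completely at random)."""
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in row] for row in matrix]

def impute_row_mean(matrix):
    """Fill each missing entry with the mean of the observed values in its row (gene)."""
    imputed = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        # Fall back to 0.0 when a whole row is missing; this is an arbitrary choice.
        fill = sum(observed) / len(observed) if observed else 0.0
        imputed.append([fill if v is None else v for v in row])
    return imputed

# Toy expression matrix: rows are genes, columns are samples.
genes = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 2.0, 2.0, 2.0],
]
masked = mask_at_random(genes, rate=0.25, seed=1)
filled = impute_row_mean(masked)
```

A full comparison would then train a classifier on `filled` and on the original matrix and compare error rates across MV rates. That step is omitted here because it depends on the data model and classification rule chosen.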

Citations: 22
Identifying genes involved in cyclic processes by combining gene expression analysis and prior knowledge.
Pub Date : 2009-01-01 Epub Date: 2009-04-15 DOI: 10.1155/2009/683463
Wentao Zhao, Erchin Serpedin, Edward R Dougherty

Based on time series gene expressions, cyclic genes can be recognized via spectral analysis and statistical periodicity detection tests. These cyclic genes are usually associated with cyclic biological processes, for example, cell cycle and circadian rhythm. The power of a scheme is practically measured by comparing the detected periodically expressed genes with experimentally verified genes participating in a cyclic process. However, in the above mentioned procedure the valuable prior knowledge only serves as an evaluation benchmark, and it is not fully exploited in the implementation of the algorithm. In addition, partial data sets are also disregarded due to their nonstationarity. This paper proposes a novel algorithm to identify cyclic-process-involved genes by integrating the prior knowledge with the gene expression analysis. The proposed algorithm is applied on data sets corresponding to Saccharomyces cerevisiae and Drosophila melanogaster, respectively. Biological evidences are found to validate the roles of the discovered genes in cell cycle and circadian rhythm. Dendrograms are presented to cluster the identified genes and to reveal expression patterns. It is corroborated that the proposed novel identification scheme provides a valuable technique for unveiling pathways related to cyclic processes.
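Spectral detection of a periodically expressed gene can be sketched with a periodogram and Fisher's g-statistic, the ratio of the largest periodogram ordinate to the sum of all ordinates. This is one classical periodicity test, shown under the assumption of uniformly sampled time points; it is not claimed to be the scheme proposed in the paper, and the function names and profiles are hypothetical.

```python
import cmath
import math
import random

def periodogram(x):
    """Periodogram of the mean-removed series at Fourier frequencies k/n, k = 1..(n-1)//2."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    return {
        k: abs(sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                   for t, v in enumerate(centered))) ** 2 / n
        for k in range(1, (n - 1) // 2 + 1)
    }

def fisher_g(x):
    """Return (g, period): g is the peak's share of total power, period is n / peak index."""
    pgram = periodogram(x)
    peak_k = max(pgram, key=pgram.get)
    return pgram[peak_k] / sum(pgram.values()), len(x) / peak_k

# Hypothetical profiles: a clean 8-sample cycle and seeded Gaussian noise.
n_points = 32
cyclic = [math.sin(2 * math.pi * t / 8) for t in range(n_points)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(n_points)]

g_cyclic, period_cyclic = fisher_g(cyclic)
g_noise, _ = fisher_g(noise)
```

A g value close to 1 means almost all power sits at a single frequency, as for the cyclic profile here. Turning g into a p-value requires the null distribution of the statistic, which is left out of this sketch.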

Citations: 9
A Bayesian network view on nested effects models.
Pub Date : 2009-01-01 DOI: 10.1155/2009/195272
Cordula Zeller, Holger Fröhlich, Achim Tresch

Nested effects models (NEMs) are a class of probabilistic models that were designed to reconstruct a hidden signalling structure from a large set of observable effects caused by active interventions into the signalling pathway. We give a more flexible formulation of NEMs in the language of Bayesian networks. Our framework constitutes a natural generalization of the original NEM model, since it explicitly states the assumptions that are tacitly underlying the original version. Our approach gives rise to new learning methods for NEMs, which have been implemented in the R/Bioconductor package nem. We validate these methods in a simulation study and apply them to a synthetic lethality dataset in yeast.

Citations: 21
Applications of signal processing techniques to bioinformatics, genomics, and proteomics.
Pub Date : 2009-01-01 Epub Date: 2009-04-23 DOI: 10.1155/2009/250306
Erchin Serpedin, Javier Garcia-Frias, Yufei Huang, Ulisses Braga-Neto

The recent development of high-throughput molecular genetics technologies has had a major impact on bioinformatics and systems biology. These technologies have made possible the measurement of the expression profiles of genes and proteins in a highly parallel and integrated fashion. The examination of the huge amounts of genomic and proteomic data holds the promise for understanding the complex interactions between genes and proteins, the functional processes of a cell, and the impact of various factors on a cell, and ultimately, for enabling the design of new technologies for intelligent management of diseases.

This special issue focuses on modeling and processing of data arising in bioinformatics, genomics, and proteomics using signal processing methods. Signal processing techniques matter because of their role in extracting, processing, and interpreting the information contained in genomic and proteomic data. It is our hope that signal processing methods will lead to new advances and insights in uncovering the structure, functioning and evolution of biological systems.

The special issue consists of nine papers that span a wide range of problems and applications in bioinformatics, genomics, and proteomics, such as design of compressive sensing microarrays, analysis of missing values in microarray data, effect of imputation techniques on post-genomic inference methods, RNA sequence alignment, detection of periodicity in genomic sequences and gene expression profiles, clustering and classification of gene and protein expression data, and intervention in probabilistic Boolean networks. Next, we will briefly introduce the papers reported in this special issue.

W. Dai et al. analyze how to design a microarray that is fit for compressive sensing and that also captures the biochemistry of probe-target DNA hybridization. Algorithms and design results are reported for determining probe sequences that satisfy the binding requirements and for evaluating the target concentrations.

M. S. B. Sehgal et al. address the general problem of improving post-genomic knowledge discovery procedures, such as the selection of the most significant genes and inference of gene regulatory networks, using missing microarray data imputation techniques. It is shown that instead of neglecting missing data, recycling microarray data via robust imputation techniques can yield substantial performance improvements in the subsequent post-genomic discovery procedures.

B.-J. Yoon developed a novel, efficient, and robust approach for fast and accurate structural alignment of RNAs, including pseudoknots. The proposed method turns out to accelerate the dynamic programming algorithm for family-specific models such as profile-csHMMs and CMs, and to be robust to small parameter changes that are present in the model used to predict the constraint.

The paper by J. Epps explains in detail the origins of ambiguity in period estimation for
Citations: 1
Clustering of gene expression data based on shape similarity.
Pub Date : 2009-01-01 Epub Date: 2009-04-23 DOI: 10.1155/2009/195712
Travis J Hestilow, Yufei Huang

A method for clustering genes from expression profiles using shape information is presented. Conventional clustering approaches such as K-means assume that genes with similar functions have similar expression levels and hence allocate genes with similar expression levels to the same cluster. However, genes with similar function often exhibit similar signal shapes even when their expression magnitudes are far apart. This investigation therefore studies clustering according to signal shape similarity. The shape information is captured in the form of normalized and time-scaled forward first differences, which are then subjected to variational Bayes clustering combined with a non-Bayesian (Silhouette) cluster statistic. The statistic shows an improved ability to identify the correct number of clusters and to assign the components of each cluster. Initial results on both generated test data and Escherichia coli microarray expression data, together with an initial validation of the Escherichia coli results, show that the method has promise for better clustering of time-series microarray data according to shape similarity.
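As a rough illustration of the shape feature mentioned in the abstract, the sketch below computes forward first differences, divides each by its sampling interval, and normalizes the result. The abstract does not specify the normalization, so unit Euclidean length is an assumption made here, and the function name `shape_features` is hypothetical rather than taken from the paper.

```python
import math


def shape_features(values, times):
    """Time-scaled forward first differences, normalized to unit length.

    Assumption: unit Euclidean normalization; the paper may use another scheme.
    """
    if len(values) != len(times) or len(values) < 2:
        raise ValueError("need at least two samples and matching time stamps")
    # Forward first difference divided by the sampling interval (time scaling).
    slopes = [
        (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        for i in range(len(values) - 1)
    ]
    # Dividing by the vector length removes magnitude and keeps only shape.
    norm = math.sqrt(sum(s * s for s in slopes))
    if norm == 0.0:
        return [0.0] * len(slopes)  # flat profile: no shape information
    return [s / norm for s in slopes]


# Two profiles with the same shape but a tenfold difference in magnitude.
low = shape_features([1.0, 2.0, 4.0, 3.0], [0, 1, 2, 4])
high = shape_features([10.0, 20.0, 40.0, 30.0], [0, 1, 2, 4])
print(all(math.isclose(a, b) for a, b in zip(low, high)))
```

Under this assumed normalization, the two example profiles map to the same feature vector up to rounding, which is the property a shape-based clustering would rely on; a magnitude-based method such as K-means on raw values would place them far apart.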

Citations: 0