
EURASIP journal on bioinformatics & systems biology: latest articles

The prediction of virus mutation using neural networks and rough set techniques.
Pub Date : 2016-05-13 eCollection Date: 2016-12-01 DOI: 10.1186/s13637-016-0042-0
Mostafa A Salama, Aboul Ella Hassanien, Ahmad Mostafa

Viral evolution remains a major obstacle to the effectiveness of antiviral treatments. The ability to predict this evolution will help in the early detection of drug-resistant strains and may facilitate the design of more efficient antiviral treatments. Various tools have been used in genome studies to achieve this goal. One of these tools is machine learning, which facilitates the study of structure-activity relationships, the prediction of secondary and tertiary structure evolution, and sequence error correction. This work proposes a novel machine learning technique for predicting the possible point mutations that appear in alignments of primary RNA sequence structure. It predicts the genotype of each nucleotide in the RNA sequence, and proves that a nucleotide in an RNA sequence changes depending on the other nucleotides in the sequence. A neural network technique is used to predict new strains, and a rough set theory based algorithm is then introduced to extract these point mutation patterns. The algorithm is applied to a number of aligned, time-series RNA isolates of the Newcastle virus. Two data sets from two different sources are used to validate these techniques. The results show that the accuracy of this technique in predicting the nucleotides in the new generation is as high as 75%. The mutation rules are visualized to analyze the correlation between different nucleotides in the same RNA sequence.

Citations: 28
Stochastic block coordinate Frank-Wolfe algorithm for large-scale biological network alignment.
Pub Date : 2016-04-08 eCollection Date: 2016-12-01 DOI: 10.1186/s13637-016-0041-1
Yijie Wang, Xiaoning Qian

With increasingly "big" data available in biomedical research, deriving accurate and reproducible biological knowledge from such data poses enormous computational challenges. In this paper, motivated by recently developed stochastic block coordinate algorithms, we propose a highly scalable randomized block coordinate Frank-Wolfe algorithm for convex optimization with general compact convex constraints, which has diverse applications in analyzing biomedical data to better understand cellular and disease mechanisms. We focus on implementing the derived stochastic block coordinate algorithm to align protein-protein interaction networks in order to identify conserved functional pathways based on the IsoRank framework. Our derived stochastic block coordinate Frank-Wolfe (SBCFW) algorithm comes with a convergence guarantee and naturally reduces the computational cost (time and space) of each iteration. Our experiments on querying conserved functional protein complexes in yeast networks confirm the effectiveness of this technique for analyzing large-scale biological networks.
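
The block coordinate idea in the abstract above can be sketched on a toy problem. The following is a minimal illustration only, not the authors' SBCFW implementation and not an IsoRank alignment: it assumes a simple quadratic objective, assumes each block is constrained to a probability simplex, and uses a common step size schedule of the form 2n/(k + 2n) for n blocks. The function names `block_coordinate_fw` and `objective` are hypothetical.

```python
import random

def block_coordinate_fw(targets, iters=2000, seed=0):
    """Toy problem (assumed): minimize 0.5 * sum_i ||x_i - targets[i]||^2
    with every block x_i constrained to a probability simplex."""
    rng = random.Random(seed)
    n = len(targets)
    # Start each block at the first vertex of its simplex.
    x = [[1.0] + [0.0] * (len(t) - 1) for t in targets]
    for k in range(iters):
        i = rng.randrange(n)  # sample one block uniformly at random
        # Gradient of the objective restricted to block i.
        grad = [xi - ti for xi, ti in zip(x[i], targets[i])]
        # Linear minimization over a simplex is attained at a vertex:
        # the coordinate with the smallest gradient entry.
        j = min(range(len(grad)), key=grad.__getitem__)
        gamma = 2.0 * n / (k + 2.0 * n)  # step size in (0, 1]
        # Convex combination of the current block and the chosen vertex.
        x[i] = [(1.0 - gamma) * v for v in x[i]]
        x[i][j] += gamma
    return x

def objective(x, targets):
    return 0.5 * sum((a - b) ** 2
                     for xi, ti in zip(x, targets)
                     for a, b in zip(xi, ti))

targets = [[0.2, 0.3, 0.5], [0.6, 0.4]]
x = block_coordinate_fw(targets)
print(objective(x, targets))  # should be small after 2000 iterations
```

Only one block is touched per iteration, so the per-iteration work depends on the block size rather than on the full problem size, which is the source of the cost reduction the abstract refers to.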

Citations: 6
Using multi-step proposal distribution for improved MCMC convergence in Bayesian network structure learning.
Pub Date : 2015-06-20 eCollection Date: 2015-12-01 DOI: 10.1186/s13637-015-0024-7
Antti Larjo, Harri Lähdesmäki

Bayesian networks have become popular for modeling probabilistic relationships between entities. As their structure can also be given a causal interpretation about the studied system, they can be used to learn, for example, regulatory relationships of genes or proteins in biological networks and pathways. Inference of the Bayesian network structure is complicated by the size of the model structure space, necessitating the use of optimization methods or sampling techniques, such as Markov chain Monte Carlo (MCMC) methods. However, convergence of MCMC chains is in many cases slow and can become an even harder issue as the dataset size grows. We show here how to improve convergence in the Bayesian network structure space by using an adjustable proposal distribution with the possibility to propose a wide range of steps in the structure space, and demonstrate improved network structure inference by analyzing phosphoprotein data from the human primary T cell signaling network.

Citations: 9
A comparison study of optimal and suboptimal intervention policies for gene regulatory networks in the presence of uncertainty.
Pub Date : 2014-04-03 DOI: 10.1186/1687-4153-2014-6
Mohammadmahdi R Yousefi, Edward R Dougherty

Perfect knowledge of the underlying state transition probabilities is necessary for designing an optimal intervention strategy for a given Markovian genetic regulatory network. However, in many practical situations, the complex nature of the network and/or identification costs limit the availability of such perfect knowledge. To address this difficulty, we propose to take a Bayesian approach and represent the system of interest as an uncertainty class of several models, each assigned some probability, which reflects our prior knowledge about the system. We define the objective function to be the expected cost relative to the probability distribution over the uncertainty class and formulate an optimal Bayesian robust intervention policy minimizing this cost function. The resulting policy may not be optimal for a fixed element within the uncertainty class, but it is optimal when averaged across the uncertainty class. Furthermore, starting from a prior probability distribution over the uncertainty class and collecting samples from the process over time, one can update the prior distribution to a posterior and find the corresponding optimal Bayesian robust policy relative to the posterior distribution. Therefore, the optimal intervention policy is essentially nonstationary and adaptive.
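
The averaging argument in the abstract above can be made concrete with a small numerical sketch. This is a deliberately simplified, one-shot decision rather than the paper's dynamic intervention policy: the two-model uncertainty class, the cost table, the transition matrices, and the function names `robust_action` and `posterior` are all assumptions made for illustration.

```python
def robust_action(probs, costs):
    """Pick the action with the lowest cost averaged over the uncertainty class.
    probs[m] is the probability of model m; costs[m][a] is the cost of action a under model m."""
    n_actions = len(costs[0])
    expected = [sum(p * c[a] for p, c in zip(probs, costs)) for a in range(n_actions)]
    best = min(range(n_actions), key=expected.__getitem__)
    return best, expected

def posterior(prior, models, transitions):
    """Bayes update of the model probabilities from observed (state, next_state) pairs.
    models[m][s][t] is the transition probability s -> t under model m."""
    weights = []
    for p, P in zip(prior, models):
        likelihood = 1.0
        for s, s_next in transitions:
            likelihood *= P[s][s_next]
        weights.append(p * likelihood)
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical numbers: action 0 is best under model 0, action 1 under model 1.
costs = [[1.0, 4.0, 2.0],
         [5.0, 1.0, 2.5]]
prior = [0.5, 0.5]
print(robust_action(prior, costs))  # (2, [3.0, 2.5, 2.25])

# Hypothetical two-state transition matrices for the two candidate models.
models = [[[0.9, 0.1], [0.5, 0.5]],
          [[0.2, 0.8], [0.5, 0.5]]]
post = posterior(prior, models, [(0, 0), (0, 0), (0, 0)])
print(robust_action(post, costs)[0])  # 0
```

Under the prior, the selected action (index 2) is optimal for neither model on its own, yet it has the lowest average cost. After three observed transitions that favor model 0, the posterior shifts and the selected action changes, which illustrates why the resulting policy is adaptive.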

Citations: 15
Analysis of gene network robustness based on saturated fixed point attractors.
Pub Date : 2014-03-20 DOI: 10.1186/1687-4153-2014-4
Genyuan Li, Herschel Rabitz

The analysis of gene network robustness to noise and mutation is important for fundamental and practical reasons. Robustness refers to the stability of the equilibrium expression state of a gene network to variations of the initial expression state and network topology. Numerical simulation of these variations is commonly used for the assessment of robustness. Since there exist a great number of possible gene network topologies and initial states, even millions of simulations may still be too few to give reliable results. When the initial and equilibrium expression states are restricted to being saturated (i.e., their elements can only take values 1 or -1, corresponding to maximum activation and maximum repression of genes), an analytical gene network robustness assessment is possible. We present this analytical treatment based on determination of the saturated fixed point attractors for sigmoidal function models. The analysis can determine (a) for a given network, which and how many saturated equilibrium states exist and which and how many saturated initial states converge to each of these saturated equilibrium states and (b) for a given saturated equilibrium state or a given pair of saturated equilibrium and initial states, which and how many gene networks, referred to as viable, share this saturated equilibrium state or the pair of saturated equilibrium and initial states. We also show that the viable networks sharing a given saturated equilibrium state must follow certain patterns. These capabilities of the analytical treatment make it possible to properly define and accurately determine robustness to noise and mutation for gene networks. Previous network research conclusions drawn from performing millions of simulations follow directly from the results of our analytical treatment. Furthermore, the analytical results provide criteria for the identification of model validity and suggest modified models of gene network dynamics. The yeast cell-cycle network is used as an illustration of the practical application of this analytical treatment.
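
Question (a) in the abstract above can be illustrated by brute force on a tiny network. The sketch below is not the paper's analytical treatment: it assumes a hard-threshold update (the steep limit of a sigmoid), assumes that a gene keeps its value when its input is exactly zero, and uses a made-up three-gene interaction matrix `W`. The function names are hypothetical.

```python
from itertools import product

def step(W, s):
    """One synchronous update under an assumed hard-threshold rule."""
    out = []
    for row, si in zip(W, s):
        h = sum(w * x for w, x in zip(row, s))
        # Saturate to +1 or -1; on a zero input keep the previous value (assumption).
        out.append(1 if h > 0 else -1 if h < 0 else si)
    return tuple(out)

def saturated_fixed_points(W):
    """All states in {-1, 1}^n that the update maps to themselves."""
    return [s for s in product((-1, 1), repeat=len(W)) if step(W, s) == s]

def basin_sizes(W, max_steps=50):
    """For each fixed point, count the saturated initial states that reach it."""
    fixed = set(saturated_fixed_points(W))
    counts = {s: 0 for s in fixed}
    for s0 in product((-1, 1), repeat=len(W)):
        s = s0
        for _ in range(max_steps):
            if s in fixed:
                counts[s] += 1
                break
            s = step(W, s)
    return counts

# Hypothetical network: genes 0 and 1 activate each other, both repress gene 2.
W = [[0, 1, 0],
     [1, 0, 0],
     [-1, -1, 0]]
print(saturated_fixed_points(W))  # [(-1, -1, 1), (1, 1, -1)]
print(basin_sizes(W))
```

In this toy network only four of the eight saturated initial states reach a fixed point; the other four fall into a two-state cycle. Enumeration of this kind grows as 2^n, which is the scaling problem that motivates an analytical approach.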

Citations: 5
Tracking of time-varying genomic regulatory networks with a LASSO-Kalman smoother.
Pub Date : 2014-02-12 DOI: 10.1186/1687-4153-2014-3
Jehandad Khan, Nidhal Bouaynaya, Hassan M Fathallah-Shaykh

It is widely accepted that cellular requirements and environmental conditions dictate the architecture of genetic regulatory networks. Nonetheless, the status quo in regulatory network modeling and analysis assumes an invariant network topology over time. In this paper, we refocus on a dynamic perspective of genetic networks, one that can uncover substantial topological changes in network structure during biological processes such as developmental growth. We propose a novel outlook on the inference of time-varying genetic networks, from a limited number of noisy observations, by formulating the network estimation as a target tracking problem. We overcome the limited number of observations (the small n, large p problem) by performing tracking in a compressed domain. Assuming linear dynamics, we derive the LASSO-Kalman smoother, which recursively computes the minimum mean-square sparse estimate of the network connectivity at each time point. The LASSO operator, motivated by the sparsity of genetic regulatory networks, allows simultaneous signal recovery and compression, thereby reducing the number of required observations. The smoothing improves the estimation by incorporating all observations. We track the time-varying networks during the life cycle of Drosophila melanogaster. The recovered networks show that few genes are permanent, whereas most are transient, acting only during specific developmental phases of the organism.

Citations: 13
Effective gene prediction by high resolution frequency estimator based on least-norm solution technique.
Pub Date : 2014-01-04 DOI: 10.1186/1687-4153-2014-2
Manidipa Roy, Soma Barman

The linear algebraic concept of a subspace plays a significant role in recent spectrum estimation techniques. In this article, the authors use the noise subspace concept to find hidden periodicities in DNA sequences. With the vast growth of genomic sequence data, the demand to accurately identify the protein-coding regions in DNA is rising. Several DNA feature extraction techniques that draw on various fields have emerged in the recent past, among which the application of digital signal processing tools is of prime importance. It is known that coding segments have a 3-base periodicity, while non-coding regions do not have this feature. One of the most important spectrum analysis techniques based on the subspace concept is the least-norm method. The least-norm estimator developed in this paper shows sharp period-3 peaks in coding regions while completely eliminating background noise. The proposed method is compared with the existing sliding discrete Fourier transform (SDFT) method, popularly known as the modified periodogram method, on several genes from various organisms, and the results show that the proposed method is a better and more effective approach to gene prediction. Resolution, quality factor, sensitivity, specificity, miss rate, and wrong rate are used to establish the superiority of the least-norm gene prediction method over the existing method.
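
The 3-base periodicity mentioned in the abstract above can be measured with a plain Fourier periodogram. The sketch below shows only that baseline idea, evaluated at frequency 1/3 over a sliding window; it is not the least-norm estimator proposed in the paper. The function names, the indicator-sequence encoding, and the default window and step sizes are assumptions chosen for illustration.

```python
import cmath

def period3_power(seq):
    """Spectral power at frequency 1/3, summed over the four base-indicator
    sequences and normalized by the sequence length."""
    n = len(seq)
    total = 0.0
    for base in "ACGT":
        # DFT coefficient at frequency 1/3 of the 0/1 indicator of `base`.
        coeff = sum(cmath.exp(-2j * cmath.pi * k / 3)
                    for k, b in enumerate(seq) if b == base)
        total += abs(coeff) ** 2
    return total / n

def sliding_period3(seq, window=351, step=3):
    """Period-3 power in each window; the window and step values are arbitrary choices."""
    return [period3_power(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]

periodic = "ATG" * 40   # perfectly period-3 toy sequence
flat = "A" * 120        # no period-3 structure
print(period3_power(periodic))  # approximately 40
print(period3_power(flat))      # approximately 0
```

On real sequences the windowed values are noisy, and coding regions appear as raised stretches rather than clean peaks; that background noise is the limitation the abstract says the least-norm method addresses.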

Citations: 0
A 2D graphical representation of the sequences of DNA based on triplets and its application.
Pub Date : 2014-01-02 DOI: 10.1186/1687-4153-2014-1
Sai Zou, Lei Wang, Junfeng Wang

In this paper, we first present a new concept of 'weight' for the 64 triplets and define a different weight for each kind of triplet. Then, we give a novel 2D graphical representation for DNA sequences, which transforms a DNA sequence into a plot set to facilitate quantitative comparison of DNA sequences. Thereafter, together with a newly designed measure of similarity, we introduce a novel approach to the similarity/dissimilarity analysis of DNA sequences. Finally, an application to the similarity/dissimilarity analysis of the complete coding sequences of the β-globin genes of 11 species illustrates the utility of our newly proposed method.

Citations: 0
A sequential Monte Carlo framework for haplotype inference in CNV/SNP genotype data.
Pub Date : 2014-01-01 Epub Date: 2014-04-24 DOI: 10.1186/1687-4153-2014-7
Alexandros Iliadis, Dimitris Anastassiou, Xiaodong Wang

Copy number variations (CNVs) are abundant in the human genome. They have been associated with complex traits in genome-wide association studies (GWAS) and are expected to continue playing an important role in identifying the etiology of disease phenotypes. Thanks to current high-throughput whole-genome single-nucleotide polymorphism (SNP) arrays, we now have datasets that contain both integer copy numbers in CNV regions and SNP genotypes. Haplotypes have been shown to offer advantages over genotypes in identifying disease traits, but although they are available for SNP genotypes, they are largely unavailable for CNV/SNP data because of insufficient computational tools. We introduce a new framework for inferring haplotypes in CNV/SNP data using a sequential Monte Carlo sampling scheme, 'Tree-Based Deterministic Sampling CNV' (TDSCNV). We compare our method with polyHap(v2.0), the only currently available software able to perform inference in CNV/SNP genotypes, on datasets with varying numbers of markers. Both algorithms show similar accuracy, but TDSCNV is an order of magnitude faster and scales linearly with the number of markers and the number of individuals, and thus could be the method of choice for haplotype inference in such datasets. Our method is implemented in the TDSCNV package, which is available for download at http://www.ee.columbia.edu/~anastas/tdscnv.

Citations: 2
Regularized EM algorithm for sparse parameter estimation in nonlinear dynamic systems with application to gene regulatory network inference.
Pub Date : 2014-01-01 Epub Date: 2014-04-03 DOI: 10.1186/1687-4153-2014-5
Bin Jia, Xiaodong Wang

Parameter estimation in dynamic systems finds applications in various disciplines, including systems biology. The well-known expectation-maximization (EM) algorithm is a popular method that has been widely used to solve system identification and parameter estimation problems. However, the conventional EM algorithm cannot exploit sparsity, whereas in gene regulatory network inference problems the parameters to be estimated often exhibit sparse structure. In this paper, we propose a regularized expectation-maximization (rEM) algorithm for sparse parameter estimation in nonlinear dynamic systems. It is based on maximum a posteriori (MAP) estimation and can incorporate a sparse prior. The expectation step involves forward Gaussian approximation filtering and backward Gaussian approximation smoothing. The maximization step employs a re-weighted iterative thresholding method. The proposed algorithm is then applied to gene regulatory network inference. Results on both synthetic and real data show the effectiveness of the proposed algorithm.
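To give a feel for what re-weighted iterative thresholding does, here is a minimal sketch in the simplest possible setting: denoising a vector `y` under a re-weighted L1 penalty. This is not the paper's maximization step, which operates inside EM for a nonlinear dynamic system; the function names and the parameters `lam`, `n_iter`, and `eps` are assumptions chosen for illustration.

```python
def soft_threshold(value, tau):
    """Shrink value toward zero by tau; anything within [-tau, tau] becomes 0."""
    if value > tau:
        return value - tau
    if value < -tau:
        return value + tau
    return 0.0


def reweighted_threshold(y, lam=0.1, n_iter=20, eps=1e-3):
    """Iteratively solve min_x 0.5*||x - y||^2 + lam * sum_i w_i * |x_i|.

    Weights start at 1 and are reset to 1 / (|x_i| + eps) after each pass,
    so small entries are penalized more heavily and driven to exactly zero,
    while large entries are shrunk less.
    """
    weights = [1.0] * len(y)
    x = list(y)
    for _ in range(n_iter):
        x = [soft_threshold(yi, lam * wi) for yi, wi in zip(y, weights)]
        weights = [1.0 / (abs(xi) + eps) for xi in x]
    return x


if __name__ == "__main__":
    print(reweighted_threshold([3.0, 0.05, -0.02]))
```

In this toy case the two small entries are set to zero and the large one is only slightly shrunk, which is the sparsity-promoting behavior that a sparse prior is meant to encourage.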

Citations: 6