EURASIP journal on bioinformatics & systems biology: latest articles, page 3

Application of discrete Fourier inter-coefficient difference for assessing genetic sequence similarity.

EURASIP journal on bioinformatics & systems biology

Pub Date : 2014-01-01 Epub Date: 2014-05-28 DOI: 10.1186/1687-4153-2014-8

Brian R King, Maurice Aburdene, Alex Thompson, Zach Warres

Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity due to the inherent digital nature of these sequences. DSP methods have demonstrated early success in detecting coding regions in a gene. More recently, these methods have been used to assess similarity between DNA gene sequences. We present the inter-coefficient difference (ICD) transformation, a novel extension of the discrete Fourier transformation, which can be applied to any DNA sequence. The ICD method is a mathematical, alignment-free DNA comparison method that generates a genetic signature for any DNA sequence; these signatures are then used to compute relative measures of similarity among DNA sequences. We demonstrate our method on a set of insulin genes obtained from an evolutionarily wide range of species, and on a set of avian influenza viral sequences, which represents a set of highly similar sequences. We compare phylogenetic trees generated using our technique against trees generated using traditional alignment techniques for similarity and demonstrate that the ICD method produces a highly accurate tree without requiring an alignment prior to establishing sequence similarity.
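The abstract does not spell out how the ICD is computed, so the following is only a rough sketch of one plausible reading: each base is mapped to a binary indicator sequence, the DFT power spectra of the four indicators are summed, and the signature is taken as the difference between successive spectral coefficients. The function names, the indicator mapping, and the Euclidean comparison of equal-length signatures are all assumptions made for illustration, not the authors' published procedure.

```python
import cmath
import math


def indicator_sequences(dna):
    """Map a DNA string to four 0/1 indicator signals (assumed encoding)."""
    dna = dna.upper()
    return {base: [1.0 if ch == base else 0.0 for ch in dna] for base in "ACGT"}


def power_spectrum(signal):
    """Naive O(n^2) DFT power spectrum |X[k]|^2 for k = 0..n-1."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def icd_signature(dna):
    """Hypothetical ICD: differences between successive summed power coefficients."""
    total = [0.0] * len(dna)
    for signal in indicator_sequences(dna).values():
        for k, power in enumerate(power_spectrum(signal)):
            total[k] += power
    return [total[k + 1] - total[k] for k in range(len(total) - 1)]


def signature_distance(sig_a, sig_b):
    """Euclidean distance between two signatures of equal length (assumption)."""
    if len(sig_a) != len(sig_b):
        raise ValueError("this sketch only compares signatures of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))
```

A pairwise matrix of such distances could then be passed to any standard tree-building routine. How sequences of different lengths are reconciled is not stated in the abstract and is left open here.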

Citations: 12

Learning restricted Boolean network model by time-series data.

EURASIP journal on bioinformatics & systems biology

Pub Date : 2014-01-01 Epub Date: 2014-07-15 DOI: 10.1186/s13637-014-0010-5

Hongjia Ouyang, Jie Fang, Liangzhong Shen, Edward R Dougherty, Wenbin Liu

Restricted Boolean networks are simplified Boolean networks in which every regulation between genes is required to be either negative or positive. Higa et al. (BMC Proc 5:S5, 2011) proposed a three-rule algorithm to infer a restricted Boolean network from time-series data. However, the algorithm suffers from a major drawback, namely, it is very sensitive to noise. In this paper, we systematically analyze the regulatory relationships between genes based on the state switch of the target gene and propose an algorithm with which restricted Boolean networks may be inferred from time-series data. We compare the proposed algorithm with the three-rule algorithm and the best-fit algorithm based on both synthetic networks and a well-studied budding yeast cell cycle network. The performance of the algorithms is evaluated by three distance metrics: the normalized-edge Hamming distance [Formula: see text], the normalized Hamming distance of state transition [Formula: see text], and the steady-state distribution distance μ (ssd). Results show that the proposed algorithm outperforms the others according to both [Formula: see text] and [Formula: see text], whereas its performance according to μ (ssd) is intermediate between the best-fit and the three-rule algorithms. Thus, our new algorithm is more appropriate for inferring interactions between genes from time-series data.
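To make the notion of a restricted Boolean network concrete, here is a minimal sketch of a state update in which every regulation is +1 (activation), -1 (inhibition), or 0 (none). The threshold-style rule used here (switch on for positive net input, off for negative, keep the current value on a tie) is a common convention for such models and is an assumption; the abstract itself does not state the update rule, and this is not the inference algorithm the paper proposes.

```python
def next_state(state, regulation):
    """One synchronous update of a restricted Boolean network.

    regulation[i][j] is +1 if gene j activates gene i, -1 if it inhibits
    gene i, and 0 if there is no regulation (assumed encoding).
    """
    new_state = []
    for i, row in enumerate(regulation):
        net_input = sum(a * x for a, x in zip(row, state))
        if net_input > 0:
            new_state.append(1)         # activation dominates
        elif net_input < 0:
            new_state.append(0)         # inhibition dominates
        else:
            new_state.append(state[i])  # tie: keep the current value
    return tuple(new_state)


def trajectory(initial, regulation, steps):
    """Return the list of states visited over `steps` updates."""
    states = [tuple(initial)]
    for _ in range(steps):
        states.append(next_state(states[-1], regulation))
    return states


# Hypothetical two-gene example: gene 1 inhibits gene 0, gene 0 activates gene 1.
example_regulation = [[0, -1],
                      [1, 0]]
example_run = trajectory((1, 0), example_regulation, 3)
```

Time-series data of the kind mentioned in the abstract would correspond to observed runs such as `example_run`, from which the signs in `example_regulation` are to be recovered.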

Citations: 0

Gene regulatory network inference by point-based Gaussian approximation filters incorporating the prior information.

EURASIP journal on bioinformatics & systems biology

Pub Date : 2013-12-17 DOI: 10.1186/1687-4153-2013-16

Bin Jia, Xiaodong Wang

The extended Kalman filter (EKF) has been applied to inferring gene regulatory networks. However, it is well known that the EKF becomes less accurate when the system exhibits high nonlinearity. In addition, certain prior information about the gene regulatory network exists in practice, and no systematic approach has been developed to incorporate such prior information into the Kalman-type filter for inferring the structure of the gene regulatory network. In this paper, an inference framework based on point-based Gaussian approximation filters that can exploit the prior information is developed to solve the gene regulatory network inference problem. Different point-based Gaussian approximation filters, including the unscented Kalman filter (UKF), the third-degree cubature Kalman filter (CKF3), and the fifth-degree cubature Kalman filter (CKF5), are employed. Several types of network prior information, including the existing network structure information, sparsity assumption, and the range constraint of parameters, are considered, and the corresponding filters incorporating the prior information are developed. Experiments on a synthetic network of eight genes and the yeast protein synthesis network of five genes are carried out to demonstrate the performance of the proposed framework. The results show that the proposed methods provide more accurate inference results than existing methods, such as the EKF and the traditional UKF.
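As an illustration of what "point-based Gaussian approximation" means, the sketch below implements the standard third-degree spherical-radial cubature rule: a Gaussian with mean m and covariance P in n dimensions is represented by 2n equally weighted points at m plus or minus sqrt(n) times each column of a square root of P, and the mean of a nonlinear function is approximated by averaging the function over those points. This covers only the point-propagation step, not a full filter, and none of the prior-information handling described in the abstract; the function names are chosen for this example.

```python
import math


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def cubature_points(mean, cov):
    """2n third-degree cubature points, each carrying weight 1 / (2n)."""
    n = len(mean)
    root = cholesky(cov)
    scale = math.sqrt(n)
    points = []
    for j in range(n):
        offset = [scale * root[i][j] for i in range(n)]
        points.append([m + o for m, o in zip(mean, offset)])
        points.append([m - o for m, o in zip(mean, offset)])
    return points


def cubature_mean(func, mean, cov):
    """Approximate E[func(x)] for x ~ N(mean, cov); func returns a list."""
    values = [func(point) for point in cubature_points(mean, cov)]
    dim = len(values[0])
    return [sum(v[d] for v in values) / len(values) for d in range(dim)]
```

For a quadratic function the rule is exact: with a scalar mean of 1 and variance of 4, the approximated mean of x squared is 5, which matches the closed form m squared plus P. The EKF, by contrast, linearizes the function at the mean and would miss the variance term.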

Citations: 4

On the impoverishment of scientific education.

EURASIP journal on bioinformatics & systems biology

Pub Date : 2013-11-11 DOI: 10.1186/1687-4153-2013-15

Edward R Dougherty

Hannah Arendt, one of the foremost political philosophers of the twentieth century, has argued that it is the responsibility of educators not to leave children in their own world but instead to bring them into the adult world so that, as adults, they can carry civilization forward to whatever challenges it will face by bringing to bear the learning of the past. In the same collection of essays, she discusses the recognition by modern science that Nature is inconceivable in terms of ordinary human conceptual categories - as she writes, 'unthinkable in terms of pure reason'. Together, these views on scientific education lead to an educational process that transforms children into adults, with a scientific adult being one who has the ability to conceptualize scientific systems independent of ordinary physical intuition. This article begins with Arendt's basic educational and scientific points and develops from them a critique of current scientific education in conjunction with an appeal to educate young scientists in a manner that allows them to fulfill their potential 'on the shoulders of giants'. While the article takes a general philosophical perspective, its specifics tend to be directed at biomedical education, in particular, how such education pertains to translational science.

Citations: 3

Identification of genomic functional hotspots with copy number alteration in liver cancer.

EURASIP journal on bioinformatics & systems biology

Pub Date : 2013-10-25 DOI: 10.1186/1687-4153-2013-14

Tzu-Hung Hsiao, Hung-I Harry Chen, Stephanie Roessler, Xin Wei Wang, Yidong Chen

Copy number alterations (CNAs) can be observed in most cancer patients. Several oncogenes and tumor suppressor genes with CNAs have been identified in different kinds of tumors. However, a systematic survey of CNA-affected functions is still lacking. By employing systems biology approaches, instead of examining individual genes, we directly identified functional hotspots on the human genome. A total of 838 hotspots on the human genome with 540 enriched Gene Ontology functions were identified. Seventy-six aCGH array datasets of hepatocellular carcinoma (HCC) tumors were employed in this study. A total of 150 regions putatively affected by CNAs, together with their encoded functions, were identified. Our results indicate that two immune-related hotspots had copy number alterations in most patients. In addition, our data imply that these immune-related regions might be involved in HCC oncogenesis. We also identified 39 hotspots whose copy number status was associated with patient survival. Our data imply that copy number alterations of these regions may contribute to the dysregulation of the encoded functions. These results further demonstrate that our method enables researchers to survey the biological functions of CNAs and to construct regulation hypotheses at the pathway and functional levels.

Citations: 11

Bayesian methods for expression-based integration of various types of genomics data.

EURASIP journal on bioinformatics & systems biology

Pub Date : 2013-09-21 DOI: 10.1186/1687-4153-2013-13

Elizabeth M Jennings, Jeffrey S Morris, Raymond J Carroll, Ganiraju C Manyam, Veerabhadran Baladandayuthapani

We propose methods to integrate data across several genomic platforms using a hierarchical Bayesian analysis framework that incorporates the biological relationships among the platforms to identify genes whose expression is related to clinical outcomes in cancer. This integrated approach combines information across all platforms, leading to increased statistical power in finding these predictive genes, and further provides mechanistic information about the manner in which the gene affects the outcome. We demonstrate the advantages of the shrinkage estimation used by this approach through a simulation, and finally, we apply our method to a Glioblastoma Multiforme dataset and identify several genes potentially associated with the patients' survival. We find 12 positive prognostic markers associated with nine genes and 13 negative prognostic markers associated with nine genes.

Citations: 26

Feature ranking based on synergy networks to identify prognostic markers in DPT-1.

EURASIP journal on bioinformatics & systems biology

Pub Date : 2013-09-19 DOI: 10.1186/1687-4153-2013-12

Amin Ahmadi Adl, Xiaoning Qian, Ping Xu, Kendra Vehik, Jeffrey P Krischer

Interaction among different risk factors plays an important role in the development and progress of complex diseases such as diabetes. However, traditional epidemiological methods often focus on analyzing individual or a few 'essential' risk factors in the hope of gaining some insight into the etiology of complex disease. In this paper, we propose a systematic framework for risk factor analysis based on a synergy network, which enables better identification of potential risk factors that may serve as prognostic markers for complex disease. A spectral approximate algorithm is derived to solve this network optimization problem, which leads to a new network-based feature ranking method that improves the traditional feature ranking by taking into account the pairwise synergistic interactions among risk factors in addition to their individual predictive power. We first evaluate the performance of our method based on simulated datasets, and then, we use our method to study immunologic and metabolic indices based on the Diabetes Prevention Trial-Type 1 (DPT-1) study that may provide prognostic and diagnostic information regarding the development of type 1 diabetes. The performance comparison based on both simulated and DPT-1 datasets demonstrates that our network-based ranking method provides prognostic markers with higher predictive power than traditional analysis based on individual factors.
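The abstract does not define how pairwise synergy is measured. One widely used information-theoretic definition, assumed here purely for illustration, is the information two factors carry jointly about the outcome minus what each carries alone: Syn(X1, X2; Y) = I(X1, X2; Y) - I(X1; Y) - I(X2; Y). The sketch below estimates this from discrete samples with plug-in frequencies. It does not implement the spectral ranking algorithm itself, and the function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    return sum((c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
               for (x, y), c in joint.items())


def pairwise_synergy(x1, x2, y):
    """Assumed synergy score: joint information minus individual information."""
    joint_factor = list(zip(x1, x2))
    return (mutual_information(joint_factor, y)
            - mutual_information(x1, y)
            - mutual_information(x2, y))


# Toy data: the outcome is the XOR of two binary risk factors, so neither
# factor is informative alone but the pair determines the outcome.
factor_a = [0, 0, 1, 1]
factor_b = [0, 1, 0, 1]
outcome = [0, 1, 1, 0]
xor_synergy = pairwise_synergy(factor_a, factor_b, outcome)
```

In a network view, scores like `xor_synergy` would weight the edges between factors and the individual mutual information values would weight the nodes. Ranking by individual predictive power alone would place both toy factors last even though together they predict the outcome perfectly.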

Citations: 7

Inferring Boolean network states from partial information.

EURASIP journal on bioinformatics & systems biology

Pub Date : 2013-09-05 DOI: 10.1186/1687-4153-2013-11

Guy Karlebach

Networks of molecular interactions regulate key processes in living cells. Therefore, understanding their functionality is a high priority in advancing biological knowledge. Boolean networks are often used to describe cellular networks mathematically and are fitted to experimental datasets. The fitting often results in ambiguities since the interpretation of the measurements is not straightforward and since the data contain noise. In order to facilitate a more reliable mapping between datasets and Boolean networks, we develop an algorithm that infers network trajectories from a dataset distorted by noise. We analyze our algorithm theoretically and demonstrate its accuracy using simulation and microarray expression data.
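The problem described above can be illustrated with a small brute-force sketch: given known Boolean update rules and a noisy observed time series, enumerate every initial state, simulate the network, and keep the trajectory with the fewest bit disagreements. This exhaustive search is an assumption made for illustration and only scales to a handful of genes; it is not the algorithm developed in the paper, whose details the abstract does not give.

```python
from itertools import product


def simulate(initial, rules, length):
    """Run the network synchronously and return `length` consecutive states."""
    states = [tuple(initial)]
    for _ in range(length - 1):
        current = states[-1]
        states.append(tuple(rule(current) for rule in rules))
    return states


def total_mismatches(trajectory_a, trajectory_b):
    """Number of differing bits between two equally long trajectories."""
    return sum(bit_a != bit_b
               for state_a, state_b in zip(trajectory_a, trajectory_b)
               for bit_a, bit_b in zip(state_a, state_b))


def closest_trajectory(observed, rules):
    """Return (mismatches, trajectory) for the best-fitting network trajectory."""
    best = None
    for initial in product((0, 1), repeat=len(rules)):
        candidate = simulate(initial, rules, len(observed))
        mismatches = total_mismatches(candidate, observed)
        if best is None or mismatches < best[0]:
            best = (mismatches, candidate)
    return best


# Hypothetical two-gene network in which each gene copies the other.
swap_rules = [lambda s: s[1], lambda s: s[0]]
# Observation of the run (1,0) -> (0,1) -> (1,0) with one bit flipped by noise.
noisy_observation = [(1, 0), (0, 0), (1, 0)]
best_fit = closest_trajectory(noisy_observation, swap_rules)
```

Here the middle observation (0, 0) cannot occur on any trajectory that starts at (1, 0), so the search attributes it to noise and recovers the consistent run.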

Citations: 0

Scientific knowledge is possible with small-sample classification.

EURASIP journal on bioinformatics & systems biology

Pub Date : 2013-08-20 DOI: 10.1186/1687-4153-2013-10

Edward R Dougherty, Lori A Dalton

A typical small-sample biomarker classification paper discriminates between types of pathology based on, say, 30,000 genes and a small labeled sample of fewer than 100 points. Some classification rule is used to design the classifier from these data, but we are given no good reason or conditions under which this algorithm should perform well. An error estimation rule is used to estimate the classification error on the population using the same data, but once again we are given no good reason or conditions under which this error estimator should produce a good estimate, and thus we do not know how well the classifier should be expected to perform. In fact, in virtually all such papers the error estimate is expected to be highly inaccurate. In short, we are given no justification for any claims. Given the ubiquity of vacuous small-sample classification papers in the literature, one could easily conclude that scientific knowledge is impossible in small-sample settings. It is not that thousands of papers overtly claim that scientific knowledge is impossible in regard to their content; rather, it is that they utilize methods that preclude scientific knowledge. In this paper, we argue to the contrary that scientific knowledge in small-sample classification is possible provided there is sufficient prior knowledge. A natural way to proceed, discussed herein, is via a paradigm for pattern recognition in which we incorporate prior knowledge in the whole classification procedure (classifier design and error estimation), optimize each step of the procedure given available information, and obtain theoretical measures of performance for both classifiers and error estimators, the latter being the critical epistemological issue. In sum, we can achieve scientific validation for a proposed small-sample classifier and its error estimate.

Citations: 14

Integrating multi-platform genomic data using hierarchical Bayesian relevance vector machines.

EURASIP journal on bioinformatics & systems biology

Pub Date : 2013-06-28 DOI: 10.1186/1687-4153-2013-9

Sanvesh Srivastava, Wenyi Wang, Ganiraju Manyam, Carlos Ordonez, Veerabhadran Baladandayuthapani

Background: Recent advances in genome technologies and the subsequent collection of genomic information at various molecular resolutions promise to accelerate the discovery of new therapeutic targets. A critical step in achieving these goals is to develop efficient clinical prediction models that integrate these diverse sources of high-throughput data. This step is challenging because the data are high-dimensional and contain complex interactions. For predicting relevant clinical outcomes, we propose a flexible statistical machine learning approach that acknowledges and models the interaction between platform-specific measurements through nonlinear kernel machines and borrows information within and between platforms through a hierarchical Bayesian framework. Our model has parameters with direct interpretations in terms of the effects of platforms and data interactions within and across platforms. The parameter estimation algorithm in our model uses a computationally efficient variational Bayes approach that scales well to large high-throughput datasets.

Results: We apply our methods, which integrate gene/mRNA expression and microRNA profiles to predict patient survival times, to the glioblastoma multiforme (GBM) dataset from The Cancer Genome Atlas (TCGA). In terms of prediction accuracy, we show that our non-linear and interaction-based integrative methods perform better than linear alternatives and non-integrative methods that do not account for interactions between the platforms. We also find several prognostic mRNAs and microRNAs that are related to tumor invasion and are known to drive tumor metastasis and severe inflammatory response in GBM. In addition, our analysis reveals several interesting mRNA and microRNA interactions that have known implications in the etiology of GBM.

Conclusions: Our approach gains its flexibility and power by modeling the non-linear interaction structures between and within the platforms. Our framework is a useful tool for biomedical researchers, since clinical prediction using multi-platform genomic information is an important step towards personalized treatment of many cancers. Our software is freely available at: http://odin.mdacc.tmc.edu/~vbaladan.
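To make the idea of modeling platform interactions through nonlinear kernels more concrete, here is a minimal sketch of one common construction: a Gaussian (RBF) kernel per platform, combined as a weighted sum plus an elementwise-product interaction term. This is an illustrative assumption, not the authors' hierarchical Bayesian relevance vector machine; the function names, weights, bandwidths, and toy measurements are all hypothetical, and in the paper's setting such quantities would be estimated rather than fixed by hand.

```python
import math

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix for the rows of X."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-gamma * sq)
    return K

def integrate_platforms(K_a, K_b, w_a, w_b, w_ab):
    """Weighted sum of two platform kernels plus a product (interaction) kernel."""
    n = len(K_a)
    return [
        [w_a * K_a[i][j] + w_b * K_b[i][j] + w_ab * K_a[i][j] * K_b[i][j]
         for j in range(n)]
        for i in range(n)
    ]

# Toy data: 3 patients with hypothetical mRNA and microRNA measurements.
mrna = [[0.1, 1.2], [0.0, 1.0], [2.0, -1.0]]
mirna = [[0.5], [0.4], [-1.5]]

K_mrna = rbf_kernel(mrna, gamma=0.5)
K_mirna = rbf_kernel(mirna, gamma=1.0)
K = integrate_platforms(K_mrna, K_mirna, w_a=0.5, w_b=0.3, w_ab=0.2)

for row in K:
    print([round(v, 3) for v in row])
```

The weights here play the role of interpretable platform and interaction effects: setting `w_ab` to zero drops the cross-platform interaction, and setting `w_b` and `w_ab` to zero reduces the model to a single-platform kernel.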



Citations: 9