EURASIP journal on bioinformatics & systems biology: latest articles

BCC-NER: bidirectional, contextual clues named entity tagger for gene/protein mention recognition.

EURASIP journal on bioinformatics & systems biology

Pub Date : 2017-12-01 Epub Date: 2017-05-05 DOI: 10.1186/s13637-017-0060-6

Gurusamy Murugesan, Sabenabanu Abdulkadhar, Balu Bhasuran, Jeyakumar Natarajan

Tagging biomedical entities such as genes, proteins, cells, and cell lines is the first step and an important prerequisite in biomedical literature mining. In this paper, we describe our hybrid named entity tagging approach, BCC-NER (bidirectional, contextual clues named entity tagger for gene/protein mention recognition). BCC-NER consists of three modules. The first module handles text processing, which includes basic NLP pre-processing, feature extraction, and feature selection. The second module handles training and model building: bidirectional conditional random fields (CRF) parse the text in both directions (forward and backward), and the forward- and backward-trained models are integrated using the margin-infused relaxed algorithm (MIRA). The third and final module handles post-processing to improve performance, and includes surrounding text features, parenthesis mismatching, and a two-tier abbreviation algorithm. On the BioCreative II GM test corpus, BCC-NER achieves a precision of 89.95, a recall of 84.15, and an overall F-score of 86.95, which is higher than that of the other currently available open-source taggers.
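
The reported F-score can be checked against the reported precision and recall. The sketch below assumes the "overall F-score" is the balanced F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_score` is illustrative only.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision <= 0 and recall <= 0:
        return 0.0  # avoid division by zero when both inputs are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract (in percent).
print(round(f_score(89.95, 84.15), 2))  # 86.95
```

Under that assumption the three reported figures are mutually consistent.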

Citations: 18

Bayesian inference for biomarker discovery in proteomics: an analytic solution.

EURASIP journal on bioinformatics & systems biology

Pub Date : 2017-12-01 Epub Date: 2017-07-14 DOI: 10.1186/s13637-017-0062-4

Noura Dridi, Audrey Giremus, Jean-Francois Giovannelli, Caroline Truntzer, Melita Hadzagic, Jean-Philippe Charrier, Laurent Gerfault, Patrick Ducoroy, Bruno Lacroix, Pierre Grangeat, Pascal Roy

This paper addresses the question of biomarker discovery in proteomics. Given clinical data regarding a list of proteins for a set of individuals, the problem is to extract a short subset of proteins whose concentrations are an indicator of the biological status (healthy or pathological). In this paper, it is formulated as a specific instance of variable selection. The originality is that the proteins are not investigated one after the other; instead, the best partition between discriminant and non-discriminant proteins is sought directly. In this way, correlations between the proteins are intrinsically taken into account in the decision. The strategy is derived in a Bayesian setting, and the decision is optimal in the sense that it minimizes a global mean error. It is ultimately based on the posterior probabilities of the partitions. The main difficulty is to calculate these probabilities, since they are based on the so-called evidence, which requires marginalization of all the unknown model parameters. Two models are presented that relate the status to the protein concentrations, depending on whether the latter are biomarkers or not. The first model accounts for biological variability by assuming that the concentrations are Gaussian distributed with a mean and a covariance matrix that depend on the status only for the biomarkers. The second is an extension that also takes into account the technical variability that may significantly impact the observed concentrations. The main contributions of the paper are: (1) a new Bayesian formulation of the biomarker selection problem, (2) the closed-form expression of the posterior probabilities in the noiseless case, and (3) a suitable approximate solution in the noisy case. The methods are numerically assessed and compared to state-of-the-art methods (t test, LASSO, Bhattacharyya distance, FOHSIC) on synthetic data and on real data from proteins quantified in human serum by mass spectrometry in selected reaction monitoring mode.

Citations: 7

Review of stochastic hybrid systems with applications in biological systems modeling and analysis.

EURASIP journal on bioinformatics & systems biology

Pub Date : 2017-12-01 Epub Date: 2017-06-30 DOI: 10.1186/s13637-017-0061-5

Xiangfang Li, Oluwaseyi Omotere, Lijun Qian, Edward R Dougherty

Stochastic hybrid systems (SHS) have attracted considerable research interest in recent years. In this paper, we review some of the recent applications of SHS to biological systems modeling and analysis. Due to the nature of molecular interactions, many biological processes can be conveniently described as a mixture of continuous and discrete phenomena using SHS models. With the advancement of SHS theory, it is expected that insights can be obtained about biological processes such as drug effects on gene regulation. Furthermore, combined with advanced experimental methods, in silico simulations using SHS modeling techniques can be carried out for massive and rapid verification or falsification of biological hypotheses. The hope is to replace costly and time-consuming in vitro or in vivo experiments, or to provide guidance for those experiments and generate better hypotheses.
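
As an illustration of the "mixture of continuous and discrete phenomena", the sketch below simulates a two-state promoter that switches on and off at random times (the discrete part) while the protein level follows a linear ODE between switches (the continuous part). This is a textbook piecewise-deterministic example chosen here for illustration, not a model taken from the review; the function name `simulate_telegraph` and all rate values are hypothetical.

```python
import math
import random


def simulate_telegraph(t_end, k_on=0.5, k_off=0.5, k_prod=10.0, gamma=1.0, seed=0):
    """Simulate promoter switching (discrete) with dx/dt = k_prod*state - gamma*x (continuous).

    Returns a list of (time, promoter_state, protein_level) recorded at each switch
    and at t_end. All rates are illustrative, not fitted to any data.
    """
    rng = random.Random(seed)
    t, state, x = 0.0, 0, 0.0
    trajectory = [(t, state, x)]
    while t < t_end:
        rate = k_on if state == 0 else k_off
        dwell = rng.expovariate(rate)      # exponential waiting time until the next switch
        jumps = t + dwell < t_end
        if not jumps:
            dwell = t_end - t              # truncate the final interval at t_end
        x_inf = k_prod * state / gamma     # fixed point of the ODE in the current state
        x = x_inf + (x - x_inf) * math.exp(-gamma * dwell)  # exact ODE solution
        t = t + dwell if jumps else t_end
        if jumps:
            state = 1 - state              # discrete transition
        trajectory.append((t, state, x))
    return trajectory


for time, state, level in simulate_telegraph(5.0)[:5]:
    print(f"t={time:.2f} state={state} x={level:.3f}")
```

Because the ODE is linear, the flow between switches is solved in closed form, so no numerical integrator is needed for this particular toy model.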

Citations: 23

From protein-protein interactions to protein co-expression networks: a new perspective to evaluate large-scale proteomic data.

EURASIP journal on bioinformatics & systems biology

Pub Date : 2017-12-01 Epub Date: 2017-03-20 DOI: 10.1186/s13637-017-0059-z

Danila Vella, Italo Zoppis, Giancarlo Mauri, Pierluigi Mauri, Dario Di Silvestre

The reductionist approach of dissecting biological systems into their constituents was successful in the first stage of molecular biology in elucidating the chemical basis of several biological processes. This knowledge helped biologists to understand the complexity of biological systems, showing that most biological functions do not arise from individual molecules; thus, the emergent properties of biological systems cannot be explained or predicted by investigating individual molecules without taking their relations into consideration. Thanks to improvements in current -omics technologies and the increasing understanding of molecular relationships, ever more studies are evaluating biological systems through approaches based on graph theory. Genomic and proteomic data are often combined with protein-protein interaction (PPI) networks, whose structure is routinely analyzed by algorithms and tools to characterize hubs/bottlenecks and topological, functional, and disease modules. On the other hand, co-expression networks represent a complementary procedure that gives the opportunity to evaluate data at the system level, including for organisms that lack information on PPIs. Based on these premises, we introduce the reader to PPI and co-expression networks, including aspects of reconstruction and analysis. In particular, the new idea of evaluating large-scale proteomic data by means of co-expression networks is discussed, with some examples of application. Their use to infer biological knowledge is shown, and special attention is devoted to topological and module analysis.
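
A minimal sketch of how a co-expression network might be reconstructed: pairwise Pearson correlation between protein abundance profiles, followed by a hard threshold to keep edges. The correlation measure, the threshold value, and the toy profiles are all assumptions made here for illustration; the abstract does not specify which reconstruction method the authors use.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def coexpression_network(profiles, threshold=0.8):
    """Keep an edge between two proteins when |correlation| >= threshold."""
    edges = {}
    for p, q in combinations(sorted(profiles), 2):
        r = pearson(profiles[p], profiles[q])
        if abs(r) >= threshold:
            edges[(p, q)] = r
    return edges


def degrees(edges):
    """Node degree, a simple topological measure used to spot candidate hubs."""
    deg = {}
    for p, q in edges:
        deg[p] = deg.get(p, 0) + 1
        deg[q] = deg.get(q, 0) + 1
    return deg


# Made-up abundance profiles across five samples (hypothetical protein names).
profiles = {
    "P1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "P2": [2.1, 3.9, 6.2, 8.0, 9.9],
    "P3": [5.0, 4.1, 2.9, 2.0, 1.1],
    "P4": [3.0, 1.0, 4.0, 1.0, 5.0],
}
network = coexpression_network(profiles)
print(sorted(network))   # P1, P2 and P3 are mutually connected; P4 is isolated
print(degrees(network))
```

Note that this approach needs only the quantitative data themselves, which is why it remains applicable to organisms without curated PPI information.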

Citations: 77

On biometric systems: electrocardiogram Gaussianity and data synthesis.

EURASIP journal on bioinformatics & systems biology

Pub Date : 2017-12-01 Epub Date: 2017-02-21 DOI: 10.1186/s13637-017-0056-2

Wael Louis, Shahad Abdulnour, Sahar Javaher Haghighi, Dimitrios Hatzinakos

The electrocardiogram (ECG) is a slow signal to acquire, and it is prone to noise. It can be inconvenient to collect a large number of ECG heartbeats in order to train a reliable biometric system; hence, this issue might result in a small sample size phenomenon, which occurs when the number of samples is much smaller than the number of observations to model. In this paper, we study ECG heartbeat Gaussianity and we generate synthesized data to increase the number of observations. Data synthesis, in this paper, is based on our hypothesis, which we support, that ECG heartbeats exhibit a multivariate normal distribution; therefore, one can generate ECG heartbeats from such a distribution. This distribution deviates from Gaussianity due to internal and external factors that change ECG morphology, such as noise, diet, and physical and psychological changes, but we attempt to capture the underlying Gaussianity of the heartbeats. When this method was implemented for a biometric system and examined on the University of Toronto database of 1012 subjects, an equal error rate (EER) of 6.71% was achieved, compared to 9.35% for the same system without data synthesis. Dimensionality reduction is widely examined in the small sample size problem; however, our results suggest that the proposed data synthesis outperformed several dimensionality reduction techniques by at least 3.21% in EER. With a small sample size, classifier instability becomes a bigger issue, and we used a parallel classifier scheme to reduce it. Each classifier in the parallel classifier is trained with the same genuine dataset but different imposter datasets. The parallel classifier reduced the instability of the predictors' true acceptance rate from a standard deviation of 6.52% to a standard deviation of 1.94%.
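
The synthesis idea can be sketched as follows: estimate a mean vector and covariance matrix from the available heartbeats, then draw new heartbeats from the fitted multivariate normal distribution via a Cholesky factor. This is only a sketch under stated assumptions: the abstract does not describe the authors' preprocessing, per-subject handling, or covariance estimator, and the function names, the diagonal `jitter` term, and the toy data are hypothetical. With very few beats the sample covariance is singular, so a real system would likely need a regularized estimate.

```python
import math
import random


def mean_and_cov(beats):
    """Sample mean vector and unbiased sample covariance matrix of aligned heartbeats."""
    n, d = len(beats), len(beats[0])
    mu = [sum(b[j] for b in beats) / n for j in range(d)]
    cov = [[sum((b[i] - mu[i]) * (b[j] - mu[j]) for b in beats) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mu, cov


def cholesky(a, jitter=1e-9):
    """Lower-triangular L with L @ L.T ~= a; jitter on the diagonal guards singular input."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(max(a[i][i] + jitter - s, 0.0))
            else:
                low[i][j] = (a[i][j] - s) / low[j][j] if low[j][j] > 0 else 0.0
    return low


def synthesize(beats, n_new, seed=0):
    """Draw n_new synthetic heartbeats from the Gaussian fitted to `beats`."""
    rng = random.Random(seed)
    mu, cov = mean_and_cov(beats)
    low = cholesky(cov)
    d = len(mu)
    out = []
    for _ in range(n_new):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]  # independent standard normals
        out.append([mu[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(d)])
    return out


# Toy "heartbeats": four beats of three samples each (made-up numbers).
real_beats = [[0.1, 1.0, 0.2], [0.0, 1.1, 0.3], [0.2, 0.9, 0.1], [0.1, 1.2, 0.2]]
synthetic = synthesize(real_beats, n_new=5)
print(len(synthetic), len(synthetic[0]))  # 5 3
```

The synthetic beats would then be added to the genuine training set to enlarge the number of observations before training the classifier.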

Citations: 2

Learning directed acyclic graphs from large-scale genomics data.

EURASIP journal on bioinformatics & systems biology

Pub Date : 2017-09-20 DOI: 10.1186/s13637-017-0063-3

Fabio Nikolay, Marius Pesavento, George Kritikos, Nassos Typas

In this paper, we consider the problem of learning the genetic interaction map, i.e., the topology of a directed acyclic graph (DAG) of genetic interactions from noisy double-knockout (DK) data. Based on a set of well-established biological interaction models, we detect and classify the interactions between genes. We propose a novel linear integer optimization program called the Genetic-Interactions-Detector (GENIE) to identify the complex biological dependencies among genes and to compute the DAG topology that matches the DK measurements best. Furthermore, we extend the GENIE program by incorporating genetic interaction profile (GI-profile) data to further enhance the detection performance. In addition, we propose a sequential scalability technique for large sets of genes under study, in order to provide statistically significant results for real measurement data. Finally, we show via numeric simulations that the GENIE program and the GI-profile data extended GENIE (GI-GENIE) program clearly outperform the conventional techniques and present real data results for our proposed sequential scalability technique.
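
Whatever optimization method produces it, a candidate interaction topology must be acyclic to qualify as a DAG. The sketch below is not the GENIE program; it only illustrates that constraint by checking a set of directed gene-to-gene edges with Kahn's algorithm. The function name and gene labels are hypothetical.

```python
from collections import deque


def topological_order(nodes, edges):
    """Return a topological order of the directed graph, or None if it contains a cycle."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for u, v in edges:
        children[u].append(v)
        indegree[v] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in children[u]:
            indegree[v] -= 1
            if indegree[v] == 0:   # all parents of v have been placed
                queue.append(v)
    # Nodes on a cycle never reach indegree 0, so they are missing from the order.
    return order if len(order) == len(nodes) else None


genes = ["g1", "g2", "g3"]
print(topological_order(genes, [("g1", "g2"), ("g2", "g3")]))                # ['g1', 'g2', 'g3']
print(topological_order(genes, [("g1", "g2"), ("g2", "g3"), ("g3", "g1")]))  # None
```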

Citations: 0

Biomedical informatics with optimization and machine learning.

EURASIP journal on bioinformatics & systems biology

Pub Date : 2017-02-08 eCollection Date: 2017-12-01 DOI: 10.1186/s13637-017-0058-0

Shuai Huang, Jiayu Zhou, Zhangyang Wang, Qing Ling, Yang Shen

Citations: 7

Heterogeneous multimodal biomarkers analysis for Alzheimer's disease via Bayesian network.

EURASIP journal on bioinformatics & systems biology

Pub Date : 2016-08-19 eCollection Date: 2016-12-01 DOI: 10.1186/s13637-016-0046-9

Yan Jin, Yi Su, Xiao-Hua Zhou, Shuai Huang

It is estimated that by 2050 the number of Alzheimer's disease (AD) patients worldwide will quadruple from the current 36 million, while no proven disease-modifying treatments are available. At present, the underlying disease mechanisms remain under investigation, and recent studies suggest that the disease involves multiple etiological pathways. To better understand the disease and develop treatment strategies, a number of ongoing studies, including the Alzheimer's Disease Neuroimaging Initiative (ADNI), enroll many study participants and acquire a large number of biomarkers from various modalities, including demographics, genotyping, fluid biomarkers, neuroimaging, neuropsychometric tests, and clinical assessments. However, a systematic approach that can integrate all the collected data is lacking. The overarching goal of our study is to use machine learning techniques to understand the relationships among different biomarkers and to establish a system-level model that can better describe the interactions among biomarkers and provide superior diagnostic and prognostic information. In this pilot study, we use a Bayesian network (BN) to analyze multimodal data from ADNI, including demographics, volumetric MRI, PET, genotypes, and neuropsychometric measurements, and demonstrate that our approach has superior prediction accuracy.
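
To show how a Bayesian network combines evidence from different modalities, the sketch below performs exact inference by enumeration on a three-node chain (genotype -> disease -> MRI finding). The structure, the variable names, and every probability are made up for illustration; they are not taken from the study or from ADNI.

```python
from itertools import product

# Hypothetical conditional probability tables for binary variables.
P_G = {1: 0.3, 0: 0.7}            # P(G = g): risk genotype present / absent
P_D_GIVEN_G = {1: 0.4, 0: 0.1}    # P(D = 1 | G = g): disease given genotype
P_M_GIVEN_D = {1: 0.8, 0: 0.2}    # P(M = 1 | D = d): abnormal MRI given disease


def bernoulli(p_one, value):
    """Probability of `value` for a binary variable with P(value = 1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def joint(g, d, m):
    """Joint probability factorized along the network G -> D -> M."""
    return P_G[g] * bernoulli(P_D_GIVEN_G[g], d) * bernoulli(P_M_GIVEN_D[d], m)


def posterior_disease(evidence):
    """P(D = 1 | evidence), where evidence maps a subset of {'g', 'm'} to 0 or 1."""
    num = den = 0.0
    for g, d, m in product((0, 1), repeat=3):
        assignment = {"g": g, "d": d, "m": m}
        if any(assignment[k] != v for k, v in evidence.items()):
            continue  # skip assignments inconsistent with the evidence
        p = joint(g, d, m)
        den += p
        if d == 1:
            num += p
    return num / den


print(round(posterior_disease({}), 3))                # prior probability of disease
print(round(posterior_disease({"m": 1}), 3))          # after an abnormal MRI finding
print(round(posterior_disease({"m": 1, "g": 1}), 3))  # MRI plus genotype evidence
```

Each additional piece of evidence updates the disease probability through the same factorized joint distribution, which is what makes the network a system-level model rather than a set of separate per-biomarker tests. Enumeration is exponential in the number of variables and is used here only because the toy network is tiny.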

Citations: 20

A pedagogical walkthrough of computational modeling and simulation of Wnt signaling pathway using static causal models in MATLAB.

EURASIP journal on bioinformatics & systems biology

Pub Date : 2016-08-08 eCollection Date: 2016-12-01 DOI: 10.1186/s13637-016-0044-y

Shriprakash Sinha

Simulation studies in systems biology involving computational experiments on Wnt signaling pathways abound in the literature, but they often lack a pedagogical perspective that might ease the understanding of beginner students and researchers in transition who intend to work on modeling the pathway. This paucity might be due to restrictive business policies that enforce an unwanted embargo on the sharing of important scientific knowledge. A tutorial introduction to computational modeling of the Wnt signaling pathway in a human colorectal cancer dataset using static Bayesian network models is provided. The walkthrough might aid biologists/informaticians in understanding the design of computational experiments, which is interleaved with an exposition of the Matlab code and causal models from the Bayesian network toolbox. The manuscript elucidates the coding contents of the advance article by Sinha (Integr. Biol. 6:1034-1048, 2014) and takes the reader step by step through how (a) the available biological information from the literature is collected and transformed, (b) the heterogeneous data and prior biological knowledge are integrated in the network, (c) the simulation study is designed, (d) the hypothesis regarding a biological phenomenon is transformed into a computational framework, and (e) the results and inferences drawn using d-connectivity/separability are reported. The manuscript ends with a programming assignment to help readers get hands-on experience of a perturbation project. A description of the Matlab files is made available under the GNU GPL v3 license at the Google Code project at https://code.google.com/p/static-bn-for-wnt-signaling-pathway and at https://sites.google.com/site/shriprakashsinha/shriprakashsinha/projects/static-bn-for-wnt-signaling-pathway. The latest updates can be found on the latter website.



Citations: 0

A new method to detect event-related potentials based on Pearson's correlation.

EURASIP journal on bioinformatics & systems biology

Pub Date : 2016-06-07 eCollection Date: 2016-12-01 DOI: 10.1186/s13637-016-0043-z

William Giroldini, Luciano Pederzoli, Marco Bilucaglia, Simone Melloni, Patrizio Tressoldi

Event-related potentials (ERPs) are widely used in brain-computer interface applications and in neuroscience. Normal EEG activity is rich in background noise, so detecting ERPs usually requires averaging over multiple trials to reduce the effect of this noise. The noise produced by EEG activity itself is not correlated with the ERP waveform, so averaging attenuates the noise in proportion to 1/√N, where N is the number of averaged epochs. This is the simplest strategy currently used to detect ERPs: it averages all the ERP waveforms, which are time- and phase-locked. In this paper, a new method called GW6 is proposed, which calculates the ERP using a mathematical method based only on Pearson's correlation. The result is a graph with the same time resolution as the classical ERP that shows only positive peaks, representing the increase, in consonance with the stimuli, in EEG signal correlation over all channels. The new method is also useful for selectively identifying and highlighting components of the ERP response that are not phase-locked and that are usually hidden by the standard method based on averaging all the epochs. These hidden components seem to be caused by variations, between successive stimuli, of the ERP's inherent phase latency period (jitter), although the same stimulus produces a reasonably constant phase across all EEG channels. For this reason, the new method could be very helpful for investigating these hidden components of the ERP response and for developing applications for scientific and medical purposes. Moreover, the new method is more resistant to EEG artifacts than standard averaging and could be very useful in research and neurology. The proposed method can be used directly as a procedure written in the well-known Matlab programming language and can be rewritten easily and quickly in any other programming language.
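The 1/√N behaviour of classical epoch averaging described above can be checked with a small simulation. The Python sketch below is illustrative only and does not implement GW6. The ERP template, the noise level, and the function names are all assumptions, and the noise is modelled as independent Gaussian samples, which is a simplification of real EEG background activity.

```python
import math
import random
import statistics


def simulate_epochs(n_epochs, n_samples=200, noise_sd=5.0, seed=0):
    """Return (template, epochs): a fixed ERP-like waveform plus independent Gaussian noise."""
    rng = random.Random(seed)
    # Hypothetical ERP template: a single Gaussian-shaped bump (arbitrary units).
    template = [math.exp(-((t - 100) / 15.0) ** 2) for t in range(n_samples)]
    epochs = [[s + rng.gauss(0.0, noise_sd) for s in template] for _ in range(n_epochs)]
    return template, epochs


def average_epochs(epochs):
    """Classical ERP estimate: sample-wise mean across time-locked epochs."""
    n = len(epochs)
    return [sum(column) / n for column in zip(*epochs)]


def residual_noise_sd(template, estimate):
    """Standard deviation of what remains after subtracting the true template."""
    return statistics.pstdev([e - s for e, s in zip(estimate, template)])


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        template, epochs = simulate_epochs(n)
        sd = residual_noise_sd(template, average_epochs(epochs))
        expected = 5.0 / math.sqrt(n)
        print(f"N={n:3d}  residual noise SD = {sd:.2f}  (1/sqrt(N) prediction = {expected:.2f})")
```

Because the template is identical in every simulated epoch, this sketch models the time- and phase-locked case only. Latency jitter between epochs, which the abstract identifies as the source of the hidden components, is deliberately left out.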



Citations: 13