Tagging biomedical entities such as gene, protein, cell, and cell-line is the first step and an important pre-requisite in biomedical literature mining. In this paper, we describe our hybrid named entity tagging approach namely BCC-NER (bidirectional, contextual clues named entity tagger for gene/protein mention recognition). BCC-NER is deployed with three modules. The first module is for text processing which includes basic NLP pre-processing, feature extraction, and feature selection. The second module is for training and model building with bidirectional conditional random fields (CRF) to parse the text in both directions (forward and backward) and integrate the backward and forward trained models using margin-infused relaxed algorithm (MIRA). The third and final module is for post-processing to achieve a better performance, which includes surrounding text features, parenthesis mismatching, and two-tier abbreviation algorithm. The evaluation results on BioCreative II GM test corpus of BCC-NER achieve a precision of 89.95, recall of 84.15 and overall F-score of 86.95, which is higher than the other currently available open source taggers.
标记生物医学实体(如基因、蛋白质、细胞、细胞系)是生物医学文献挖掘的第一步,也是重要的先决条件。在本文中,我们描述了我们的混合命名实体标记方法,即BCC-NER(双向,上下文线索命名实体标记器用于基因/蛋白质提及识别)。BCC-NER部署了三个模块。第一个模块用于文本处理,包括基本的自然语言处理预处理、特征提取和特征选择。第二个模块是使用双向条件随机场(CRF)进行训练和模型构建,在两个方向(向前和向后)上解析文本,并使用边缘注入放松算法(MIRA)整合向后和向前训练的模型。第三个也是最后一个模块用于后处理,以获得更好的性能,其中包括围绕文本特征,括号不匹配和两层缩写算法。BCC-NER在BioCreative II GM测试语料库上的评价结果,准确率为89.95,召回率为84.15,总体f分为86.95,高于目前其他开源标注器。
{"title":"BCC-NER: bidirectional, contextual clues named entity tagger for gene/protein mention recognition.","authors":"Gurusamy Murugesan, Sabenabanu Abdulkadhar, Balu Bhasuran, Jeyakumar Natarajan","doi":"10.1186/s13637-017-0060-6","DOIUrl":"https://doi.org/10.1186/s13637-017-0060-6","url":null,"abstract":"<p><p>Tagging biomedical entities such as gene, protein, cell, and cell-line is the first step and an important pre-requisite in biomedical literature mining. In this paper, we describe our hybrid named entity tagging approach namely BCC-NER (bidirectional, contextual clues named entity tagger for gene/protein mention recognition). BCC-NER is deployed with three modules. The first module is for text processing which includes basic NLP pre-processing, feature extraction, and feature selection. The second module is for training and model building with bidirectional conditional random fields (CRF) to parse the text in both directions (forward and backward) and integrate the backward and forward trained models using margin-infused relaxed algorithm (MIRA). The third and final module is for post-processing to achieve a better performance, which includes surrounding text features, parenthesis mismatching, and two-tier abbreviation algorithm. The evaluation results on BioCreative II GM test corpus of BCC-NER achieve a precision of 89.95, recall of 84.15 and overall F-score of 86.95, which is higher than the other currently available open source taggers.</p>","PeriodicalId":72957,"journal":{"name":"EURASIP journal on bioinformatics & systems biology","volume":"2017 1","pages":"7"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13637-017-0060-6","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34972923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-12-01Epub Date: 2017-07-14DOI: 10.1186/s13637-017-0062-4
Noura Dridi, Audrey Giremus, Jean-Francois Giovannelli, Caroline Truntzer, Melita Hadzagic, Jean-Philippe Charrier, Laurent Gerfault, Patrick Ducoroy, Bruno Lacroix, Pierre Grangeat, Pascal Roy
This paper addresses the question of biomarker discovery in proteomics. Given clinical data regarding a list of proteins for a set of individuals, the tackled problem is to extract a short subset of proteins the concentrations of which are an indicator of the biological status (healthy or pathological). In this paper, it is formulated as a specific instance of variable selection. The originality is that the proteins are not investigated one after the other but the best partition between discriminant and non-discriminant proteins is directly sought. In this way, correlations between the proteins are intrinsically taken into account in the decision. The developed strategy is derived in a Bayesian setting, and the decision is optimal in the sense that it minimizes a global mean error. It is finally based on the posterior probabilities of the partitions. The main difficulty is to calculate these probabilities since they are based on the so-called evidence that require marginalization of all the unknown model parameters. Two models are presented that relate the status to the protein concentrations, depending whether the latter are biomarkers or not. The first model accounts for biological variabilities by assuming that the concentrations are Gaussian distributed with a mean and a covariance matrix that depend on the status only for the biomarkers. The second one is an extension that also takes into account the technical variabilities that may significantly impact the observed concentrations. The main contributions of the paper are: (1) a new Bayesian formulation of the biomarker selection problem, (2) the closed-form expression of the posterior probabilities in the noiseless case, and (3) a suitable approximated solution in the noisy case. The methods are numerically assessed and compared to the state-of-the-art methods (t test, LASSO, Battacharyya distance, FOHSIC) on synthetic and real data from proteins quantified in human serum by mass spectrometry in selected reaction monitoring mode.
{"title":"Bayesian inference for biomarker discovery in proteomics: an analytic solution.","authors":"Noura Dridi, Audrey Giremus, Jean-Francois Giovannelli, Caroline Truntzer, Melita Hadzagic, Jean-Philippe Charrier, Laurent Gerfault, Patrick Ducoroy, Bruno Lacroix, Pierre Grangeat, Pascal Roy","doi":"10.1186/s13637-017-0062-4","DOIUrl":"10.1186/s13637-017-0062-4","url":null,"abstract":"<p><p>This paper addresses the question of biomarker discovery in proteomics. Given clinical data regarding a list of proteins for a set of individuals, the tackled problem is to extract a short subset of proteins the concentrations of which are an indicator of the biological status (healthy or pathological). In this paper, it is formulated as a specific instance of variable selection. The originality is that the proteins are not investigated one after the other but the best partition between discriminant and non-discriminant proteins is directly sought. In this way, correlations between the proteins are intrinsically taken into account in the decision. The developed strategy is derived in a Bayesian setting, and the decision is optimal in the sense that it minimizes a global mean error. It is finally based on the posterior probabilities of the partitions. The main difficulty is to calculate these probabilities since they are based on the so-called evidence that require marginalization of all the unknown model parameters. Two models are presented that relate the status to the protein concentrations, depending whether the latter are biomarkers or not. The first model accounts for biological variabilities by assuming that the concentrations are Gaussian distributed with a mean and a covariance matrix that depend on the status only for the biomarkers. The second one is an extension that also takes into account the technical variabilities that may significantly impact the observed concentrations. The main contributions of the paper are: (1) a new Bayesian formulation of the biomarker selection problem, (2) the closed-form expression of the posterior probabilities in the noiseless case, and (3) a suitable approximated solution in the noisy case. The methods are numerically assessed and compared to the state-of-the-art methods (t test, LASSO, Battacharyya distance, FOHSIC) on synthetic and real data from proteins quantified in human serum by mass spectrometry in selected reaction monitoring mode.</p>","PeriodicalId":72957,"journal":{"name":"EURASIP journal on bioinformatics & systems biology","volume":"2017 1","pages":"9"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13637-017-0062-4","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35173527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-12-01Epub Date: 2017-06-30DOI: 10.1186/s13637-017-0061-5
Xiangfang Li, Oluwaseyi Omotere, Lijun Qian, Edward R Dougherty
Stochastic hybrid systems (SHS) have attracted a lot of research interests in recent years. In this paper, we review some of the recent applications of SHS to biological systems modeling and analysis. Due to the nature of molecular interactions, many biological processes can be conveniently described as a mixture of continuous and discrete phenomena employing SHS models. With the advancement of SHS theory, it is expected that insights can be obtained about biological processes such as drug effects on gene regulation. Furthermore, combining with advanced experimental methods, in silico simulations using SHS modeling techniques can be carried out for massive and rapid verification or falsification of biological hypotheses. The hope is to substitute costly and time-consuming in vitro or in vivo experiments or provide guidance for those experiments and generate better hypotheses.
{"title":"Review of stochastic hybrid systems with applications in biological systems modeling and analysis.","authors":"Xiangfang Li, Oluwaseyi Omotere, Lijun Qian, Edward R Dougherty","doi":"10.1186/s13637-017-0061-5","DOIUrl":"https://doi.org/10.1186/s13637-017-0061-5","url":null,"abstract":"<p><p>Stochastic hybrid systems (SHS) have attracted a lot of research interests in recent years. In this paper, we review some of the recent applications of SHS to biological systems modeling and analysis. Due to the nature of molecular interactions, many biological processes can be conveniently described as a mixture of continuous and discrete phenomena employing SHS models. With the advancement of SHS theory, it is expected that insights can be obtained about biological processes such as drug effects on gene regulation. Furthermore, combining with advanced experimental methods, in silico simulations using SHS modeling techniques can be carried out for massive and rapid verification or falsification of biological hypotheses. The hope is to substitute costly and time-consuming in vitro or in vivo experiments or provide guidance for those experiments and generate better hypotheses.</p>","PeriodicalId":72957,"journal":{"name":"EURASIP journal on bioinformatics & systems biology","volume":"2017 1","pages":"8"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13637-017-0061-5","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35134273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The reductionist approach of dissecting biological systems into their constituents has been successful in the first stage of the molecular biology to elucidate the chemical basis of several biological processes. This knowledge helped biologists to understand the complexity of the biological systems evidencing that most biological functions do not arise from individual molecules; thus, realizing that the emergent properties of the biological systems cannot be explained or be predicted by investigating individual molecules without taking into consideration their relations. Thanks to the improvement of the current -omics technologies and the increasing understanding of the molecular relationships, even more studies are evaluating the biological systems through approaches based on graph theory. Genomic and proteomic data are often combined with protein-protein interaction (PPI) networks whose structure is routinely analyzed by algorithms and tools to characterize hubs/bottlenecks and topological, functional, and disease modules. On the other hand, co-expression networks represent a complementary procedure that give the opportunity to evaluate at system level including organisms that lack information on PPIs. Based on these premises, we introduce the reader to the PPI and to the co-expression networks, including aspects of reconstruction and analysis. In particular, the new idea to evaluate large-scale proteomic data by means of co-expression networks will be discussed presenting some examples of application. Their use to infer biological knowledge will be shown, and a special attention will be devoted to the topological and module analysis.
{"title":"From protein-protein interactions to protein co-expression networks: a new perspective to evaluate large-scale proteomic data.","authors":"Danila Vella, Italo Zoppis, Giancarlo Mauri, Pierluigi Mauri, Dario Di Silvestre","doi":"10.1186/s13637-017-0059-z","DOIUrl":"https://doi.org/10.1186/s13637-017-0059-z","url":null,"abstract":"<p><p>The reductionist approach of dissecting biological systems into their constituents has been successful in the first stage of the molecular biology to elucidate the chemical basis of several biological processes. This knowledge helped biologists to understand the complexity of the biological systems evidencing that most biological functions do not arise from individual molecules; thus, realizing that the emergent properties of the biological systems cannot be explained or be predicted by investigating individual molecules without taking into consideration their relations. Thanks to the improvement of the current -omics technologies and the increasing understanding of the molecular relationships, even more studies are evaluating the biological systems through approaches based on graph theory. Genomic and proteomic data are often combined with protein-protein interaction (PPI) networks whose structure is routinely analyzed by algorithms and tools to characterize hubs/bottlenecks and topological, functional, and disease modules. On the other hand, co-expression networks represent a complementary procedure that give the opportunity to evaluate at system level including organisms that lack information on PPIs. Based on these premises, we introduce the reader to the PPI and to the co-expression networks, including aspects of reconstruction and analysis. In particular, the new idea to evaluate large-scale proteomic data by means of co-expression networks will be discussed presenting some examples of application. Their use to infer biological knowledge will be shown, and a special attention will be devoted to the topological and module analysis.</p>","PeriodicalId":72957,"journal":{"name":"EURASIP journal on bioinformatics & systems biology","volume":"2017 1","pages":"6"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13637-017-0059-z","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34971828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Electrocardiogram is a slow signal to acquire, and it is prone to noise. It can be inconvenient to collect large number of ECG heartbeats in order to train a reliable biometric system; hence, this issue might result in a small sample size phenomenon which occurs when the number of samples is much smaller than the number of observations to model. In this paper, we study ECG heartbeat Gaussianity and we generate synthesized data to increase the number of observations. Data synthesis, in this paper, is based on our hypothesis, which we support, that ECG heartbeats exhibit a multivariate normal distribution; therefore, one can generate ECG heartbeats from such distribution. This distribution is deviated from Gaussianity due to internal and external factors that change ECG morphology such as noise, diet, physical and psychological changes, and other factors, but we attempt to capture the underlying Gaussianity of the heartbeats. When this method was implemented for a biometric system and was examined on the University of Toronto database of 1012 subjects, an equal error rate (EER) of 6.71% was achieved in comparison to 9.35% to the same system but without data synthesis. Dimensionality reduction is widely examined in the problem of small sample size; however, our results suggest that using the proposed data synthesis outperformed several dimensionality reduction techniques by at least 3.21% in EER. With small sample size, classifier instability becomes a bigger issue and we used a parallel classifier scheme to reduce it. Each classifier in the parallel classifier is trained with the same genuine dataset but different imposter datasets. The parallel classifier has reduced predictors' true acceptance rate instability from 6.52% standard deviation to 1.94% standard deviation.
{"title":"On biometric systems: electrocardiogram Gaussianity and data synthesis.","authors":"Wael Louis, Shahad Abdulnour, Sahar Javaher Haghighi, Dimitrios Hatzinakos","doi":"10.1186/s13637-017-0056-2","DOIUrl":"https://doi.org/10.1186/s13637-017-0056-2","url":null,"abstract":"<p><p>Electrocardiogram is a slow signal to acquire, and it is prone to noise. It can be inconvenient to collect large number of ECG heartbeats in order to train a reliable biometric system; hence, this issue might result in a small sample size phenomenon which occurs when the number of samples is much smaller than the number of observations to model. In this paper, we study ECG heartbeat Gaussianity and we generate synthesized data to increase the number of observations. Data synthesis, in this paper, is based on our hypothesis, which we support, that ECG heartbeats exhibit a multivariate normal distribution; therefore, one can generate ECG heartbeats from such distribution. This distribution is deviated from Gaussianity due to internal and external factors that change ECG morphology such as noise, diet, physical and psychological changes, and other factors, but we attempt to capture the underlying Gaussianity of the heartbeats. When this method was implemented for a biometric system and was examined on the University of Toronto database of 1012 subjects, an equal error rate (EER) of 6.71% was achieved in comparison to 9.35% to the same system but without data synthesis. Dimensionality reduction is widely examined in the problem of small sample size; however, our results suggest that using the proposed data synthesis outperformed several dimensionality reduction techniques by at least 3.21% in EER. With small sample size, classifier instability becomes a bigger issue and we used a parallel classifier scheme to reduce it. Each classifier in the parallel classifier is trained with the same genuine dataset but different imposter datasets. The parallel classifier has reduced predictors' true acceptance rate instability from 6.52% standard deviation to 1.94% standard deviation.</p>","PeriodicalId":72957,"journal":{"name":"EURASIP journal on bioinformatics & systems biology","volume":"2017 1","pages":"5"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13637-017-0056-2","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34972922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-09-20DOI: 10.1186/s13637-017-0063-3
Fabio Nikolay, Marius Pesavento, George Kritikos, Nassos Typas
In this paper, we consider the problem of learning the genetic interaction map, i.e., the topology of a directed acyclic graph (DAG) of genetic interactions from noisy double-knockout (DK) data. Based on a set of well-established biological interaction models, we detect and classify the interactions between genes. We propose a novel linear integer optimization program called the Genetic-Interactions-Detector (GENIE) to identify the complex biological dependencies among genes and to compute the DAG topology that matches the DK measurements best. Furthermore, we extend the GENIE program by incorporating genetic interaction profile (GI-profile) data to further enhance the detection performance. In addition, we propose a sequential scalability technique for large sets of genes under study, in order to provide statistically significant results for real measurement data. Finally, we show via numeric simulations that the GENIE program and the GI-profile data extended GENIE (GI-GENIE) program clearly outperform the conventional techniques and present real data results for our proposed sequential scalability technique.
{"title":"Learning directed acyclic graphs from large-scale genomics data.","authors":"Fabio Nikolay, Marius Pesavento, George Kritikos, Nassos Typas","doi":"10.1186/s13637-017-0063-3","DOIUrl":"10.1186/s13637-017-0063-3","url":null,"abstract":"<p><p>In this paper, we consider the problem of learning the genetic interaction map, i.e., the topology of a directed acyclic graph (DAG) of genetic interactions from noisy double-knockout (DK) data. Based on a set of well-established biological interaction models, we detect and classify the interactions between genes. We propose a novel linear integer optimization program called the Genetic-Interactions-Detector (GENIE) to identify the complex biological dependencies among genes and to compute the DAG topology that matches the DK measurements best. Furthermore, we extend the GENIE program by incorporating genetic interaction profile (GI-profile) data to further enhance the detection performance. In addition, we propose a sequential scalability technique for large sets of genes under study, in order to provide statistically significant results for real measurement data. Finally, we show via numeric simulations that the GENIE program and the GI-profile data extended GENIE (GI-GENIE) program clearly outperform the conventional techniques and present real data results for our proposed sequential scalability technique.</p>","PeriodicalId":72957,"journal":{"name":"EURASIP journal on bioinformatics & systems biology","volume":"2017 1","pages":"10"},"PeriodicalIF":0.0,"publicationDate":"2017-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5607220/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35427827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-02-08eCollection Date: 2017-12-01DOI: 10.1186/s13637-017-0058-0
Shuai Huang, Jiayu Zhou, Zhangyang Wang, Qing Ling, Yang Shen
{"title":"Biomedical informatics with optimization and machine learning.","authors":"Shuai Huang, Jiayu Zhou, Zhangyang Wang, Qing Ling, Yang Shen","doi":"10.1186/s13637-017-0058-0","DOIUrl":"https://doi.org/10.1186/s13637-017-0058-0","url":null,"abstract":"","PeriodicalId":72957,"journal":{"name":"EURASIP journal on bioinformatics & systems biology","volume":"2017 ","pages":"4"},"PeriodicalIF":0.0,"publicationDate":"2017-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13637-017-0058-0","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34772817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2016-08-19eCollection Date: 2016-12-01DOI: 10.1186/s13637-016-0046-9
Yan Jin, Yi Su, Xiao-Hua Zhou, Shuai Huang
By 2050, it is estimated that the number of worldwide Alzheimer's disease (AD) patients will quadruple from the current number of 36 million, while no proven disease-modifying treatments are available. At present, the underlying disease mechanisms remain under investigation, and recent studies suggest that the disease involves multiple etiological pathways. To better understand the disease and develop treatment strategies, a number of ongoing studies including the Alzheimer's Disease Neuroimaging Initiative (ADNI) enroll many study participants and acquire a large number of biomarkers from various modalities including demographic, genotyping, fluid biomarkers, neuroimaging, neuropsychometric test, and clinical assessments. However, a systematic approach that can integrate all the collected data is lacking. The overarching goal of our study is to use machine learning techniques to understand the relationships among different biomarkers and to establish a system-level model that can better describe the interactions among biomarkers and provide superior diagnostic and prognostic information. In this pilot study, we use Bayesian network (BN) to analyze multimodal data from ADNI, including demographics, volumetric MRI, PET, genotypes, and neuropsychometric measurements and demonstrate our approach to have superior prediction accuracy.
{"title":"Heterogeneous multimodal biomarkers analysis for Alzheimer's disease via Bayesian network.","authors":"Yan Jin, Yi Su, Xiao-Hua Zhou, Shuai Huang","doi":"10.1186/s13637-016-0046-9","DOIUrl":"https://doi.org/10.1186/s13637-016-0046-9","url":null,"abstract":"<p><p>By 2050, it is estimated that the number of worldwide Alzheimer's disease (AD) patients will quadruple from the current number of 36 million, while no proven disease-modifying treatments are available. At present, the underlying disease mechanisms remain under investigation, and recent studies suggest that the disease involves multiple etiological pathways. To better understand the disease and develop treatment strategies, a number of ongoing studies including the Alzheimer's Disease Neuroimaging Initiative (ADNI) enroll many study participants and acquire a large number of biomarkers from various modalities including demographic, genotyping, fluid biomarkers, neuroimaging, neuropsychometric test, and clinical assessments. However, a systematic approach that can integrate all the collected data is lacking. The overarching goal of our study is to use machine learning techniques to understand the relationships among different biomarkers and to establish a system-level model that can better describe the interactions among biomarkers and provide superior diagnostic and prognostic information. In this pilot study, we use Bayesian network (BN) to analyze multimodal data from ADNI, including demographics, volumetric MRI, PET, genotypes, and neuropsychometric measurements and demonstrate our approach to have superior prediction accuracy.</p>","PeriodicalId":72957,"journal":{"name":"EURASIP journal on bioinformatics & systems biology","volume":"2016 1","pages":"12"},"PeriodicalIF":0.0,"publicationDate":"2016-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13637-016-0046-9","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34720476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2016-08-08eCollection Date: 2016-12-01DOI: 10.1186/s13637-016-0044-y
Shriprakash Sinha
Simulation study in systems biology involving computational experiments dealing with Wnt signaling pathways abound in literature but often lack a pedagogical perspective that might ease the understanding of beginner students and researchers in transition, who intend to work on the modeling of the pathway. This paucity might happen due to restrictive business policies which enforce an unwanted embargo on the sharing of important scientific knowledge. A tutorial introduction to computational modeling of Wnt signaling pathway in a human colorectal cancer dataset using static Bayesian network models is provided. The walkthrough might aid biologists/informaticians in understanding the design of computational experiments that is interleaved with exposition of the Matlab code and causal models from Bayesian network toolbox. The manuscript elucidates the coding contents of the advance article by Sinha (Integr. Biol. 6:1034-1048, 2014) and takes the reader in a step-by-step process of how (a) the collection and the transformation of the available biological information from literature is done, (b) the integration of the heterogeneous data and prior biological knowledge in the network is achieved, (c) the simulation study is designed, (d) the hypothesis regarding a biological phenomena is transformed into computational framework, and (e) results and inferences drawn using d-connectivity/separability are reported. The manuscript finally ends with a programming assignment to help the readers get hands-on experience of a perturbation project. Description of Matlab files is made available under GNU GPL v3 license at the Google code project on https://code.google.com/p/static-bn-for-wnt-signaling-pathway and https: //sites.google.com/site/shriprakashsinha/shriprakashsinha/projects/static-bn-for-wnt-signaling-pathway. Latest updates can be found in the latter website.
{"title":"A pedagogical walkthrough of computational modeling and simulation of Wnt signaling pathway using static causal models in MATLAB.","authors":"Shriprakash Sinha","doi":"10.1186/s13637-016-0044-y","DOIUrl":"10.1186/s13637-016-0044-y","url":null,"abstract":"<p><p>Simulation study in systems biology involving computational experiments dealing with Wnt signaling pathways abound in literature but often lack a pedagogical perspective that might ease the understanding of beginner students and researchers in transition, who intend to work on the modeling of the pathway. This paucity might happen due to restrictive business policies which enforce an unwanted embargo on the sharing of important scientific knowledge. A tutorial introduction to computational modeling of Wnt signaling pathway in a human colorectal cancer dataset using static Bayesian network models is provided. The walkthrough might aid biologists/informaticians in understanding the design of computational experiments that is interleaved with exposition of the Matlab code and causal models from Bayesian network toolbox. The manuscript elucidates the coding contents of the advance article by Sinha (Integr. Biol. 6:1034-1048, 2014) and takes the reader in a step-by-step process of how (a) the collection and the transformation of the available biological information from literature is done, (b) the integration of the heterogeneous data and prior biological knowledge in the network is achieved, (c) the simulation study is designed, (d) the hypothesis regarding a biological phenomena is transformed into computational framework, and (e) results and inferences drawn using <i>d</i>-connectivity/separability are reported. The manuscript finally ends with a programming assignment to help the readers get hands-on experience of a perturbation project. Description of Matlab files is made available under GNU GPL v3 license at the Google code project on https://code.google.com/p/static-bn-for-wnt-signaling-pathway and https: //sites.google.com/site/shriprakashsinha/shriprakashsinha/projects/static-bn-for-wnt-signaling-pathway. Latest updates can be found in the latter website.</p>","PeriodicalId":72957,"journal":{"name":"EURASIP journal on bioinformatics & systems biology","volume":"2017 1","pages":"1"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4977324/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34324964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2016-06-07eCollection Date: 2016-12-01DOI: 10.1186/s13637-016-0043-z
William Giroldini, Luciano Pederzoli, Marco Bilucaglia, Simone Melloni, Patrizio Tressoldi
Event-related potentials (ERPs) are widely used in brain-computer interface applications and in neuroscience. Normal EEG activity is rich in background noise, and therefore, in order to detect ERPs, it is usually necessary to take the average from multiple trials to reduce the effects of this noise. The noise produced by EEG activity itself is not correlated with the ERP waveform and so, by calculating the average, the noise is decreased by a factor inversely proportional to the square root of N, where N is the number of averaged epochs. This is the easiest strategy currently used to detect ERPs, which is based on calculating the average of all ERP's waveform, these waveforms being time- and phase-locked. In this paper, a new method called GW6 is proposed, which calculates the ERP using a mathematical method based only on Pearson's correlation. The result is a graph with the same time resolution as the classical ERP and which shows only positive peaks representing the increase-in consonance with the stimuli-in EEG signal correlation over all channels. This new method is also useful for selectively identifying and highlighting some hidden components of the ERP response that are not phase-locked, and that are usually hidden in the standard and simple method based on the averaging of all the epochs. These hidden components seem to be caused by variations (between each successive stimulus) of the ERP's inherent phase latency period (jitter), although the same stimulus across all EEG channels produces a reasonably constant phase. For this reason, this new method could be very helpful to investigate these hidden components of the ERP response and to develop applications for scientific and medical purposes. Moreover, this new method is more resistant to EEG artifacts than the standard calculations of the average and could be very useful in research and neurology. The method we are proposing can be directly used in the form of a process written in the well-known Matlab programming language and can be easily and quickly written in any other software language.
{"title":"A new method to detect event-related potentials based on Pearson's correlation.","authors":"William Giroldini, Luciano Pederzoli, Marco Bilucaglia, Simone Melloni, Patrizio Tressoldi","doi":"10.1186/s13637-016-0043-z","DOIUrl":"https://doi.org/10.1186/s13637-016-0043-z","url":null,"abstract":"<p><p>Event-related potentials (ERPs) are widely used in brain-computer interface applications and in neuroscience. Normal EEG activity is rich in background noise, and therefore, in order to detect ERPs, it is usually necessary to take the average from multiple trials to reduce the effects of this noise. The noise produced by EEG activity itself is not correlated with the ERP waveform and so, by calculating the average, the noise is decreased by a factor inversely proportional to the square root of <i>N</i>, where <i>N</i> is the number of averaged epochs. This is the easiest strategy currently used to detect ERPs, which is based on calculating the average of all ERP's waveform, these waveforms being time- and phase-locked. In this paper, a new method called GW6 is proposed, which calculates the ERP using a mathematical method based only on Pearson's correlation. The result is a graph with the same time resolution as the classical ERP and which shows only positive peaks representing the increase-in consonance with the stimuli-in EEG signal correlation over all channels. This new method is also useful for selectively identifying and highlighting some hidden components of the ERP response that are not phase-locked, and that are usually hidden in the standard and simple method based on the averaging of all the epochs. These hidden components seem to be caused by variations (between each successive stimulus) of the ERP's inherent phase latency period (jitter), although the same stimulus across all EEG channels produces a reasonably constant phase. For this reason, this new method could be very helpful to investigate these hidden components of the ERP response and to develop applications for scientific and medical purposes. Moreover, this new method is more resistant to EEG artifacts than the standard calculations of the average and could be very useful in research and neurology. The method we are proposing can be directly used in the form of a process written in the well-known Matlab programming language and can be easily and quickly written in any other software language.</p>","PeriodicalId":72957,"journal":{"name":"EURASIP journal on bioinformatics & systems biology","volume":"2016 1","pages":"11"},"PeriodicalIF":0.0,"publicationDate":"2016-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13637-016-0043-z","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34602742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}