Pub Date: 2016-12-01 DOI: 10.1109/BIBM.2016.7822595
Shaoliang Peng, Xiaoyu Zhang, Yutong Lu, Xiangke Liao, Kai Lu, Canqun Yang, Jie Liu, Weiliang Zhu, Dongqing Wei
Molecular dynamics (MD) is a computer simulation method for studying the physical movements of atoms and molecules that provides detailed microscopic sampling at the molecular scale. With continuous improvement, MD simulation has gained popularity in materials science, biochemistry, and biophysics, with diverse application areas and a growing data scale. Assisted Model Building with Energy Refinement (AMBER) is one of the most widely used software packages for conducting MD simulations. However, the speed of AMBER MD simulations for systems with millions of atoms on the microsecond scale still needs to be improved. In this paper, we propose a parallel acceleration strategy for AMBER on the Tianhe-2 supercomputer. The parallel optimization of AMBER is carried out at three levels: fine-grained OpenMP parallelism on a single MIC, single-node CPU/MIC collaborative parallel optimization, and multi-node multi-MIC collaborative parallel acceleration. With these three levels of parallel acceleration, we achieved a maximum speedup of 25–33 times over the original program. Source Code: https://github.com/tianhe2/mAMBER
Title: mAMBER: A CPU/MIC collaborated parallel framework for AMBER on Tianhe-2 supercomputer
Venue: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Pub Date: 2016-12-01 DOI: 10.1109/BIBM.2016.7822558
M. Zhou, Daisy Li, X. Huan, Joseph Manthey, E. Lioutikova, Hong Zhou
The true power of the genome editing mechanism known as the RNA-programmable CRISPR Cas9 endonuclease system lies in the fact that Cas9 can be guided to any locus complementary to a 20-nt RNA, the single guide RNA (sgRNA), to cleave double-stranded DNA, which allows the introduction of desired mutations. Unfortunately, sgRNA is prone to off-target homologous attachment, guiding Cas9 to cleave DNA sequences at unwanted sites. Using the human genome and Streptococcus pyogenes Cas9 (SpCas9) as the example, this article analyzes the probabilities of off-target sites for sgRNAs and finds that for large genomes such as the human genome, off-target sites are nearly inevitable in sgRNA selection. Based on the mathematical analysis, the double nicking approach currently appears to be the only feasible way to ensure genome editing specificity. An effective computational algorithm for off-target homology searching is also implemented to confirm the mathematical analysis.
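The claim that off-target sites are nearly inevitable in a large genome can be illustrated with a back-of-the-envelope estimate. The sketch below is not the authors' analysis: it assumes independent, uniformly distributed bases, counts both strands, requires an NGG PAM (probability 1/16 under the uniform model), and uses a round genome size of 3 × 10^9 bp. The function names and the mismatch tolerance are hypothetical choices made for illustration only.

```python
from math import comb


def match_probability(guide_len: int = 20, max_mismatches: int = 3) -> float:
    """Probability that a random site matches a guide with at most
    `max_mismatches` mismatches, assuming independent uniform bases."""
    # Choose i mismatched positions; each can hold any of the 3 other bases.
    favourable = sum(comb(guide_len, i) * 3**i for i in range(max_mismatches + 1))
    return favourable / 4**guide_len


def expected_offtargets(genome_size: int,
                        guide_len: int = 20,
                        max_mismatches: int = 3,
                        pam_probability: float = 1 / 16) -> float:
    """Expected number of random near-matches next to a PAM (assumed model)."""
    positions = 2 * genome_size  # both strands, ignoring edge effects
    return positions * pam_probability * match_probability(guide_len, max_mismatches)


if __name__ == "__main__":
    human_genome = 3_000_000_000  # rough size in bp, an assumption
    for k in range(5):
        print(k, expected_offtargets(human_genome, max_mismatches=k))
```

Under these assumptions the expected count stays well below one for exact matches but exceeds one once a few mismatches are tolerated, which is the qualitative behaviour the abstract describes. Real genomes are far from uniform, so the numbers are only indicative.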
Title: Mathematical and computational analysis of CRISPR Cas9 sgRNA off-target homologies
Venue: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Pub Date: 2016-12-01 DOI: 10.1109/BIBM.2016.7822572
Shuohong Wang, Xiang Liu, Jingwen Zhao, Ye Liu, Y. Chen
3D motion data of fish schools are more valuable than 2D data for behavioral and other research. This paper proposes a master-view-tracking-first strategy based on a novel master-slave camera setup. Fish are first tracked in 2D in the master view after being extracted by an eye-focused Gaussian Mixture Model (E-GMM) detector. 3D trajectories are then reconstructed by associating the 2D tracking results in the master view with detection results in the slave views, where fish are localized using an eye-focused Gabor (E-Gabor) detector. Experiments on data sets with different fish densities demonstrate that the proposed method outperforms two state-of-the-art methods on five evaluation metrics.
Title: 3D tracking swimming fish school using a master view tracking first strategy
Venue: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Pub Date: 2016-12-01 DOI: 10.1109/BIBM.2016.7822644
Iasmini Lima, Matheus Oliveira, Diego S. Kieckbusch, M. Holanda, M. E. Walter, Aleteia P. F. Araujo, M. Victorino, Waldeyr M. C. Silva, Sérgio Lifschitz
Many research projects in bioinformatics may be viewed as scientific workflows. Biologists often run the same workflow multiple times with different parameters to refine their data analysis. These executions generate a large volume of files in different formats, which need to be stored for future evaluation. New database models, such as NoSQL systems, could be considered for handling large volumes of data, particularly in distributed systems. This work presents an assessment of the impact of data replication, based on the execution of scientific workflows, for two NoSQL database management systems: Cassandra and MongoDB.
Title: An evaluation of data replication for bioinformatics workflows on NoSQL systems
Venue: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Pub Date: 2016-12-01 DOI: 10.1109/BIBM.2016.7822540
Kouki Yonezawa, Tsukasa Mori, S. Shigeno, A. Ogura
Alternative splicing is a mechanism that produces gene expression diversity under the constraint of a limited number of genes, giving rise to spatiotemporal gene expression in many tissues and developmental processes. This mechanism is well studied in model organisms but not in non-model organisms, because the current standard method requires genomic sequences as well as fully annotated exon and intron information, which are not available for non-model organisms. However, uncovering the landscape of alternative splicing across organisms is essential to understanding its evolutionary impacts and roles. We developed a method for condition-specific alternative splicing estimation based on de novo transcriptome assembly, which should help expand knowledge of alternative splicing in non-model organisms. The software is available at https://github.com/koukiyonezawa/DASE.
Title: DASE: Condition-specific differential alternative splicing variants estimation method without reference genome sequence, and its application to non-model organisms
Venue: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Pub Date: 2016-12-01 DOI: 10.1109/BIBM.2016.7822715
Yangyang Hu, Wenqiang Zhang, Hong Lu, Fufeng Li, Weifei Zhang
Line detection plays a vital role in visual analysis tasks such as Traditional Chinese Medicine (TCM) image analytics. However, most current methods ignore line thickness and perform poorly on lines of different widths. This paper proposes a novel line detection method based on water flow. Unlike most edge-based and region-based line detectors, the water flow method obtains the whole line response map by simply imitating the movement of water over the image, which is smoothed by a guided filter and viewed as a geomorphological map. In addition, this paper proposes an adaptive parameter selection method to make line detection more robust and accurate. Experimental results on tongue crack images demonstrate the effectiveness of the proposed method in comparison with existing line extraction methods.
Title: Wide line detection with water flow
Venue: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Pub Date: 2016-12-01 DOI: 10.1109/BIBM.2016.7822648
Borzou Alipourfard, Jean X. Gao, Olajide Babawale, Hanli Liu
Causality analysis of simultaneous measurements of the brain's electrical activity and its hemodynamic activity provides the opportunity to study the neural underpinning of hemodynamic fluctuations. This multimodal analysis can also be used to extract valuable information about the location of the generators of various electrical events such as alpha rhythms or epileptiform activity. To the best of our knowledge, we are the first to propose a method to assess causality from EEG to the hemodynamic activity measured using functional near-infrared spectroscopy (fNIRs). The main challenge in studying causality in this setting arises from the low sampling rate of fNIRs and the mixed-frequency nature of the data. Our method of analysis consists of two parts. Through a simple modification of Geweke's formulation of contamination, we first show that the low sampling frequency of fNIRs does not cause contamination in estimating causality from EEG to fNIRs. We then apply a novel causality test that avoids down-sampling the EEG when testing for causality. The method of analysis proposed here can be generalized to study causality in other biomedical signal analysis applications and mixed-frequency settings.
Title: Is EEG causal to fNIRs?
Venue: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Pub Date: 2016-12-01 DOI: 10.1109/BIBM.2016.7822795
Jin-Xing Liu, Xiangzhen Kong, C. Zheng, J. Shang, Wei Zhang
Recently, feature extraction and dimensionality reduction have become fundamental tools for many data mining tasks, especially for processing high-dimensional data such as genome data. In this paper, a new feature extraction method based on sparse singular value decomposition (SSVD) is developed. The SSVD algorithm is applied to extract differentially expressed genes from two different genome datasets, both from The Cancer Genome Atlas (TCGA), and the extracted genes are then evaluated with tools based on Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. As a gene extraction method, SSVD is also compared with existing feature extraction methods such as independent component analysis, p-norm robust feature extraction, and sparse principal component analysis. The experimental GO analysis results show that the SSVD method outperforms the competing algorithms. The KEGG analysis results identify the genes that participate in cancer pathways. The experiments show that SSVD is an effective feature selection method compared with the competing methods. The KEGG analysis results may provide a meaningful reference for further study by professionals in the field of biomedical science.
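To make the idea of a sparse singular vector concrete, here is a minimal sketch of a rank-1 sparse decomposition by alternating power iteration with soft-thresholding. It is a generic textbook-style scheme, not the SSVD algorithm of the paper; the penalty `lam_v`, the initialization, the toy matrix (rows as samples, columns as genes), and all function names are assumptions made for illustration.

```python
import math


def soft_threshold(x, lam):
    """Shrink each entry toward zero by lam; entries smaller than lam become 0."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in x]


def normalize(x):
    """Scale x to unit Euclidean norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm > 0 else list(x)


def sparse_rank1(X, lam_v, n_iter=50):
    """Rank-1 approximation X ~ s * u * v^T with a sparse right vector v."""
    n_rows, n_cols = len(X), len(X[0])
    # Start from the column with the largest norm (an arbitrary but safe choice).
    best = max(range(n_cols), key=lambda j: sum(X[i][j] ** 2 for i in range(n_rows)))
    u = normalize([X[i][best] for i in range(n_rows)])
    v = [0.0] * n_cols
    for _ in range(n_iter):
        xtu = [sum(X[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        v = normalize(soft_threshold(xtu, lam_v))  # sparsity enters here
        xv = [sum(X[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        u = normalize(xv)
    s = sum(u[i] * X[i][j] * v[j] for i in range(n_rows) for j in range(n_cols))
    return u, s, v


# Toy expression matrix: genes 0 and 1 separate the samples, genes 2 and 3 are noise.
X = [[5.0, 4.0, 0.2, 0.1],
     [-5.0, -4.0, 0.1, -0.1],
     [5.0, 4.0, -0.1, 0.2],
     [-5.0, -4.0, 0.1, 0.1]]
u, s, v = sparse_rank1(X, lam_v=1.0)
selected_genes = [j for j, weight in enumerate(v) if weight != 0.0]
print(selected_genes)
```

Genes with non-zero loadings in `v` are the ones such a scheme would report as candidates; how the paper chooses the penalty and ranks genes is not stated in the abstract.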
Title: Sparse singular value decomposition-based feature extraction for identifying differentially expressed genes
Venue: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Pub Date: 2016-12-01 DOI: 10.1109/BIBM.2016.7822709
Waldeyr M. C. Silva, Danilo Vilar, Daniel S. Souza, M. E. Walter, M. Brigido, M. Holanda
Terpenoids are involved in interactions such as signaling for intra- and interspecies communication, the attraction of pollinating insects, and defense against herbivores and microbes. Owing to their chemical composition, many terpenoids have broad pharmacological applicability in medicine and biotechnology, in addition to important roles in ecology, industry, and commerce. The biosynthesis of terpenes has been widely studied over the years, and it is well known that they can be synthesized through two metabolic pathways: the mevalonate pathway (MVA) and the non-mevalonate pathway (MEP). On the other hand, genome-scale reconstruction of metabolic networks faces many challenges, including organizing data storage and data modeling to properly represent the complexity of systems biology. Recent NoSQL database paradigms have introduced new concepts of scalable storage and data queries. Among them are graph databases, which are versatile enough to cope with biological data. In this paper, we propose 2Path, a graph database model designed to represent terpenoid metabolic networks with thousands of secondary metabolism reactions while preserving important characteristics of terpenoid biosynthesis.
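The appeal of a graph model for metabolic networks is that compounds become nodes and reactions become labeled edges, so pathway questions turn into graph traversals. The sketch below illustrates this with plain Python dictionaries; it is not the 2Path schema or a graph database query. The edge list collapses several intermediate steps of the MVA and MEP pathways into single edges, and the property names and function names are hypothetical.

```python
from collections import deque

# Simplified, illustrative edges: (substrate, product, properties).
# Intermediate steps of both pathways are collapsed for brevity.
REACTIONS = [
    ("acetyl-CoA", "HMG-CoA", {"pathway": "MVA"}),
    ("HMG-CoA", "mevalonate", {"pathway": "MVA"}),
    ("mevalonate", "IPP", {"pathway": "MVA"}),
    ("pyruvate", "DXP", {"pathway": "MEP"}),
    ("DXP", "MEP", {"pathway": "MEP"}),
    ("MEP", "IPP", {"pathway": "MEP"}),
    ("IPP", "GPP", {"pathway": "terpenoid backbone"}),
]


def build_graph(reactions):
    """Adjacency list: compound -> list of (product, edge properties)."""
    graph = {}
    for substrate, product, props in reactions:
        graph.setdefault(substrate, []).append((product, props))
        graph.setdefault(product, [])
    return graph


def find_route(graph, start, goal):
    """Breadth-first search for a shortest chain of reactions, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for product, _props in graph.get(path[-1], []):
            if product not in seen:
                seen.add(product)
                queue.append(path + [product])
    return None


graph = build_graph(REACTIONS)
print(find_route(graph, "pyruvate", "GPP"))
print(find_route(graph, "acetyl-CoA", "GPP"))
```

In a real graph database the same question would be expressed as a path query over stored nodes and relationships; the dictionary version only shows why the data shape fits.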
Title: 2Path: A terpenoid metabolic network modeled as graph database
Venue: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Pub Date: 2016-12-01 DOI: 10.1109/BIBM.2016.7822582
Chuanchao Zhang, Juan Liu, Qianqian Shi, Xiangtian Yu, T. Zeng, Luonan Chen
Integrating different genomic profiles to understand complex diseases in a multi-view manner is challenging. Computational methods are needed that preserve the useful information in each data type and correct for bias. We therefore propose a novel framework, pattern fusion analysis (PFA), which fuses the local sample patterns into a global pattern of patients with respect to the underlying data by adaptively aligning the information in each type of biological data. In particular, PFA can adjust for the distinct data types and achieve a more robust sample pattern across different profiles. To validate its effectiveness, we tested PFA on various synthetic datasets and found that it captures the intrinsic clustering structure more effectively than state-of-the-art integrative methods such as moCluster, iClusterPlus, and SNF. Moreover, in a case study on kidney cancer, PFA not only identified multi-way feature modules among previously known disease-associated genes, methylations, and miRNAs, but also performed better in cancer subtype identification and yielded effective clinical prognosis prediction. Overall, PFA not only provides new insights into a more holistic, systems-level sample pattern, but also offers a new way to select more informative types of biological data.
Title: Integration of multiple heterogeneous omics data
Venue: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)