Traditional Chinese medicine has used Tripterygium wilfordii Hook f. (TWHF) against systemic lupus erythematosus (SLE) for over 200 years. Triptolide is an active compound of TWHF with therapeutic effects on the autoimmune disease SLE. However, few related studies have been reported so far, and little is known about the mechanism of triptolide against SLE, which hinders new drug discovery. In this study, focusing on the proteins that participate in the immune response, an integrated bioinformatics analysis covering targeted proteins, SLE OMIM genes, biological process enrichment, and protein-protein interactions (PPI) was carried out. As a result, a candidate therapeutic network against SLE involving negative regulation of the immune response was proposed. It contains a PPI network of 7 targeted proteins and 7 SLE OMIM genes further regulated by triptolide. Preliminary validation of this network indicated that apoptosis and pro-inflammatory processes were involved.
{"title":"Triptolide regulates immune response network against systemic lupus erythematosus","authors":"Guang Zheng, Zhibin Wang, Chengqiang Li, Hongtao Guo, Jihua Wang, Xiaojuan He","doi":"10.1109/BIBM.2016.7822729","DOIUrl":"https://doi.org/10.1109/BIBM.2016.7822729","journal":{"name":"2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)"},"publicationDate":"2016-12-01"}
Many real-world datasets comprise different representations, or views, that provide complementary information to each other. For example, microbiome datasets can be represented by metabolic pathways, taxonomic assignments, or gene families. To integrate information from multiple views, data integration approaches such as methods based on nonnegative matrix factorization (NMF) have been developed to combine multi-view information simultaneously and obtain a comprehensive view that reveals the underlying data structure shared by the views. In this paper, we propose a novel variant of symmetric nonnegative matrix factorization (SNMF), called Laplacian regularized joint symmetric nonnegative matrix factorization (LJ-SNMF), for clustering multi-view data. We conduct extensive experiments on several realistic datasets, including Human Microbiome Project (HMP) data. The experimental results show that the proposed method outperforms other variants of NMF, which suggests the potential of LJ-SNMF for clustering multi-view datasets.
{"title":"Multi-view clustering microbiome data by joint symmetric nonnegative matrix factorization with Laplacian regularization","authors":"Yuanyuan Ma, Xiaohua Hu, Tingting He, Xingpeng Jiang","doi":"10.1109/BIBM.2016.7822591","DOIUrl":"https://doi.org/10.1109/BIBM.2016.7822591","journal":{"name":"2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)"},"publicationDate":"2016-12-01"}
Pub Date: 2016-12-01; DOI: 10.1109/BIBM.2016.7822606
Sarah ElShal, M. Mathad, J. Simm, Jesse Davis, Y. Moreau
The massive growth of biomedical text makes it very challenging for researchers to review all relevant work and generate all possible hypotheses in a reasonable amount of time. Many text mining methods have been developed to simplify this process and quickly present the researcher with a learned set of biomedical hypotheses that could potentially be validated. Previously, we focused on the task of identifying genes that are linked with a given disease by text mining PubMed abstracts. We applied a word-based concept profile similarity to learn patterns between disease and gene entities and hence identify links between them. In this work, we study an alternative approach based on topic modelling to learn different patterns between the disease and gene entities, and we measure how this affects the identified links. We investigated multiple input corpora, word representations, topic parameters, and similarity measures. On the one hand, our results show that when we (1) learn the topics from an input set of gene-clustered abstracts and (2) apply the dot-product similarity measure, we improve on our original methods and identify more correct disease-gene links. On the other hand, the results also show that the learned topics remain limited to the diseases existing in our vocabulary, so that scaling the methodology to new disease queries becomes nontrivial.
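As a rough illustration of the dot-product similarity mentioned above, the following minimal sketch ranks genes against a disease by the dot product of their topic distributions. The three-topic vectors, the gene names, and the function `rank_genes` are all hypothetical and are not taken from the paper; the authors' actual profiles and pipeline may differ.

```python
# Hypothetical topic distributions over three topics (each sums to 1).
# These values are illustrative only; they are not data from the paper.
disease_topics = [0.70, 0.20, 0.10]
gene_topics = {
    "GENE_A": [0.60, 0.30, 0.10],
    "GENE_B": [0.10, 0.10, 0.80],
    "GENE_C": [0.34, 0.33, 0.33],
}


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_genes(disease_vec, gene_vecs):
    """Return (gene, score) pairs sorted by descending dot-product score."""
    scores = {gene: dot(disease_vec, vec) for gene, vec in gene_vecs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_genes(disease_topics, gene_topics)
for gene, score in ranking:
    print(f"{gene}: {score:.3f}")
```

Unlike cosine similarity, the dot product is not length-normalized, so a gene whose probability mass is concentrated on the disease's dominant topics scores higher than one with a flat distribution.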
{"title":"Topic modeling of biomedical text","authors":"Sarah ElShal, M. Mathad, J. Simm, Jesse Davis, Y. Moreau","doi":"10.1109/BIBM.2016.7822606","DOIUrl":"https://doi.org/10.1109/BIBM.2016.7822606","journal":{"name":"2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)"},"publicationDate":"2016-12-01"}
Pub Date: 2016-12-01; DOI: 10.1109/BIBM.2016.7822523
Lin Wu, Lingkai Tang, Min Li, Jianxin Wang, Fang-Xiang Wu
Networks are employed to represent many real-world complex systems. In biological systems, biomolecules interact with each other to form so-called biomolecular networks. Explorations of the connections between structural control theory and biological networks have uncovered some interesting biological phenomena. Recently, some studies have paid attention to the structural controllability of networks in terms of minimum steering sets (MSSs). However, the MSS of a complex network is not unique. Therefore, it is meaningful to identify a particular MSS according to some centrality-based preference. Our method identifies the MSS that has the maximum (or minimum) average value of a given centrality among all possible MSSs of the network. We then apply the method to the human liver metabolic network and find that the centralities of steering nodes in different MSSs can be remarkably different. In addition, we observe that, for some centralities, the liver cancer reactions are significantly enriched in the MSSs with the minimum average centrality value. This result suggests that centralities, which can provide more meaningful biological information, should be taken into consideration when investigating the controllability of biomolecular networks.
{"title":"The MSS of complex networks with centrality based preference and its application to biomolecular networks","authors":"Lin Wu, Lingkai Tang, Min Li, Jianxin Wang, Fang-Xiang Wu","doi":"10.1109/BIBM.2016.7822523","DOIUrl":"https://doi.org/10.1109/BIBM.2016.7822523","journal":{"name":"2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)"},"publicationDate":"2016-12-01"}
Pub Date: 2016-12-01; DOI: 10.1109/BIBM.2016.7822645
Shanshan Ren, K. Bertels, Z. Al-Ars
To handle the massive raw data generated by next generation sequencing (NGS) platforms, many genetic analysis tools use GPUs to speed up their algorithms. In this paper, we use GPUs to accelerate the pair-HMMs forward algorithm, which is used to calculate the overall alignment probability in many genomics analysis tools. We first evaluate two different implementation methods for accelerating the pair-HMMs forward algorithm according to their effectiveness on GPU platforms. Based on these two methods, we present several implementations of the pair-HMMs forward algorithm. We execute these implementations on an NVIDIA Tesla K40 card using different datasets to compare their performance. Experimental results show that the intra-task implementation has the highest throughput in most cases, achieving pure computational throughput as high as 23.56 GCUPS on synthetic datasets. On a real dataset, the inter-task implementation achieves a 4.82× speedup compared with a parallelized software implementation executed on a 20-core POWER8 system.
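For orientation, the sketch below shows what a pair-HMM forward recursion computes. It is a plain-Python, textbook-style three-state model (match M, gap states X and Y) with assumed parameters `delta` (gap open), `eps` (gap extend), and `p_match`; it omits the end-state transition and is not the GPU implementation or the exact model evaluated in the paper. Each (i, j) iteration is one cell update, which is the unit counted by throughput figures such as GCUPS (giga cell updates per second).

```python
def pair_hmm_forward(x, y, delta=0.1, eps=0.3, p_match=0.9):
    """Sum the probability of all alignments of x and y (forward algorithm).

    Assumed toy parameters over the DNA alphabet:
      delta   - probability of opening a gap from the match state
      eps     - probability of extending a gap
      p_match - total joint emission mass placed on the 4 identical pairs
    """
    n, m = len(x), len(y)
    q = 0.25  # uniform emission probability for a base aligned to a gap

    def p(a, b):
        # Joint emission: 4 matching pairs share p_match, 12 mismatches share the rest.
        return p_match / 4 if a == b else (1 - p_match) / 12

    f_m = [[0.0] * (m + 1) for _ in range(n + 1)]
    f_x = [[0.0] * (m + 1) for _ in range(n + 1)]
    f_y = [[0.0] * (m + 1) for _ in range(n + 1)]
    f_m[0][0] = 1.0  # start in the match state before any symbol is emitted

    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:  # emit the aligned pair (x[i-1], y[j-1])
                f_m[i][j] = p(x[i - 1], y[j - 1]) * (
                    (1 - 2 * delta) * f_m[i - 1][j - 1]
                    + (1 - eps) * (f_x[i - 1][j - 1] + f_y[i - 1][j - 1])
                )
            if i > 0:  # emit x[i-1] against a gap
                f_x[i][j] = q * (delta * f_m[i - 1][j] + eps * f_x[i - 1][j])
            if j > 0:  # emit y[j-1] against a gap
                f_y[i][j] = q * (delta * f_m[i][j - 1] + eps * f_y[i][j - 1])

    return f_m[n][m] + f_x[n][m] + f_y[n][m]


same = pair_hmm_forward("ACGT", "ACGT")
diff = pair_hmm_forward("ACGT", "ACTT")
print(same > diff)
```

Cells on the same anti-diagonal do not depend on each other, which is one common source of parallelism within a single alignment, while independent read-haplotype pairs can be processed in parallel across alignments.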
{"title":"Exploration of alternative GPU implementations of the pair-HMMs forward algorithm","authors":"Shanshan Ren, K. Bertels, Z. Al-Ars","doi":"10.1109/BIBM.2016.7822645","DOIUrl":"https://doi.org/10.1109/BIBM.2016.7822645","journal":{"name":"2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)"},"publicationDate":"2016-12-01"}
Epilepsy is increasingly considered a brain network disorder. In this study, epileptiform discharges were induced by low-Mg2+ in mouse entorhinal cortex-hippocampal slices and recorded with a micro-electrode array. Dynamic effective network connectivity was constructed by calculating the time-variant partial directed coherence (tvPDC) of the signals. We proposed a novel approach to track the state transitions of epileptic networks over time, and characterized the network topology using graph measures. We found that the high-degree hub nodes in the network coincided with the epileptogenic zone reported in previous electrophysiological findings. Two consecutive states with distinct network topologies were identified during the ictal-like discharges. The small-worldness remained at a low level in the first state but increased significantly in the second state. Our results indicate that tvPDC can capture the causal interactions between multi-channel signals that are important in identifying the epileptogenic zone. Moreover, the evolution of network states extends our knowledge of the network drivers for the initiation and maintenance of ictal activity, and suggests the practical value of our network clustering approach.
{"title":"Analyzing epileptic network dynamics via time-variant partial directed coherence","authors":"Bo-Wen Liu, Jun-Wei Mao, Ye-Jun Shi, Q. Lu, P. Liang, Pu-Ming Zhang","doi":"10.1109/BIBM.2016.7822547","DOIUrl":"https://doi.org/10.1109/BIBM.2016.7822547","journal":{"name":"2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)"},"publicationDate":"2016-12-01"}
Pub Date: 2016-12-01; DOI: 10.1109/BIBM.2016.7822545
Xiang Li, D. Song, Peng Zhang, Guangliang Yu, Yuexian Hou, B. Hu
Automatic emotion recognition based on multi-channel neurophysiological signals, a challenging pattern recognition task, is becoming an important computer-aided method for diagnosing emotional disorders in neurology and psychiatry. Traditional approaches require designing and extracting a range of features from single- or multiple-channel signals based on extensive domain knowledge, which may be an obstacle for non-domain experts. Moreover, traditional feature fusion methods cannot fully utilize the correlation information between different channels. In this paper, we propose a preprocessing method that encapsulates the multi-channel neurophysiological signals into grid-like frames through wavelet and scalogram transforms. We further design a hybrid deep learning model that combines a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) to extract task-related features, mine inter-channel correlation, and incorporate contextual information from those frames. Experiments are carried out on a trial-level emotion recognition task using the DEAP benchmark dataset. Our results demonstrate the effectiveness of the proposed methods with respect to the emotional dimensions of Valence and Arousal.
{"title":"Emotion recognition from multi-channel EEG data through Convolutional Recurrent Neural Network","authors":"Xiang Li, D. Song, Peng Zhang, Guangliang Yu, Yuexian Hou, B. Hu","doi":"10.1109/BIBM.2016.7822545","DOIUrl":"https://doi.org/10.1109/BIBM.2016.7822545","journal":{"name":"2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)"},"publicationDate":"2016-12-01"}
Pub Date: 2016-12-01; DOI: 10.1109/BIBM.2016.7822806
Chunqiu Zheng, Lei Xing, Tian Li, Tingting Li, Huan Yang, Jia Cao, Badong Chen, Ziyuan Zhou, Le Zhang
Colorectal cancer (CRC) has become one of the most common cancers worldwide. Although the prognosis of CRC patients has improved dramatically thanks to new treatments and medical advances, the 5-year survival rate for CRC patients is still low. We hypothesize that CRC may result from complicated causes related to both genetic and environmental factors. For this reason, this study collects large-scale CRC data with information on genetic variations and environmental exposure for CRC patients and cancer-free controls, which are used to train and test a predictive CRC model. Our results demonstrate that (1) the explored genetic and environmental biomarkers are validated as causes of CRC by manually reviewed experimental evidence, (2) the model can efficiently predict the risk of CRC after parameter optimization on the large-scale CRC-related data, and (3) our newly developed generalized kernel recursive maximum correntropy (GKRMC) algorithm has high predictive power. Finally, we discuss why GKRMC can outperform classical regression algorithms, along with related future work.
{"title":"Developing a robust colorectal cancer (CRC) risk predictive model with the big genetic and environment related CRC data","authors":"Chunqiu Zheng, Lei Xing, Tian Li, Tingting Li, Huan Yang, Jia Cao, Badong Chen, Ziyuan Zhou, Le Zhang","doi":"10.1109/BIBM.2016.7822806","DOIUrl":"https://doi.org/10.1109/BIBM.2016.7822806","journal":{"name":"2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)"},"publicationDate":"2016-12-01"}
Pub Date: 2016-12-01; DOI: 10.1109/BIBM.2016.7822733
Jian-Yu Shi
With the development of high-throughput techniques in biology and its related disciplines (chemistry and medicine), a huge number of biological entries have become available. The relationships discovered between them (e.g. interactions or associations) reveal important biological facts that are never found in individual-based biological experiments. A biological network is an appropriate tool to systematically analyze and uncover such facts. The relationship between biological molecules is usually modeled as a monopartite network, such as protein-protein interactions, while that between biological molecules and other objects is modeled as a bipartite network, such as chemical compound-protein interactions, gene-disease associations, and ncRNA-disease associations. A biological network may contain a large number of nodes, each of which has many heterogeneous attributes in binary, real-valued, and semantic forms. Because of their high computational complexity, current algorithms for systematic analysis of large-scale biological networks need either a large amount of memory or a long running time. Take the compound-protein interaction network as an example. Over 90 million compounds are available in PubChem, and each compound is characterized as a high-dimensional vector (e.g. an 881-d PubChem fingerprint or a 4860-d Klekota-Roth fingerprint). Meanwhile, a protein can be characterized as a 20^K-dimensional vector if the K-mer descriptor is adopted. However, because they involve intensive matrix manipulation (e.g. matrix factorization, inversion, and tensor products), current algorithms cannot be directly applied to predict compound-protein interactions on a large scale. For example, singular value decomposition (SVD), which has complexity O(n^3), was run on a 6,000×6,000 matrix in MATLAB 2013b (64-bit) under Windows 7 (64-bit) with an Intel Core i7-4700MQ (2.40 GHz) and a GeForce GTX 765M. SVD takes 81.9, 77.9, and 51.4 seconds when using the CPU only, the CPU with four workers, and the CPU plus GPU, respectively. Consequently, there is an urgent need to turn such algorithms into accelerator-enabled parallel algorithms or to develop novel accelerators to speed up knowledge mining in biological networks.
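The reported timings can be turned into speedup factors with a few lines of arithmetic. The sketch below uses only the three timings quoted above; the cubic extrapolation to a larger matrix is an assumption that follows from the stated O(n^3) complexity and ignores memory limits and cache effects, so it should be read as a rough estimate rather than a measurement.

```python
# Timings quoted in the abstract for SVD of a 6,000 x 6,000 matrix (seconds).
cpu_only = 81.9
cpu_four_workers = 77.9
cpu_plus_gpu = 51.4

# Speedup relative to the CPU-only run.
speedup_workers = cpu_only / cpu_four_workers
speedup_gpu = cpu_only / cpu_plus_gpu


def cubic_time_estimate(t_ref, n_ref, n_new):
    """Extrapolate run time assuming cost grows as n**3 (an assumption)."""
    return t_ref * (n_new / n_ref) ** 3


# Hypothetical question: how long might a 12,000 x 12,000 matrix take on CPU only?
estimate_12000 = cubic_time_estimate(cpu_only, 6000, 12000)

print(f"four workers: {speedup_workers:.2f}x, CPU+GPU: {speedup_gpu:.2f}x")
print(f"estimated CPU-only time at n=12,000: {estimate_12000:.1f} s")
```

Under these numbers the GPU run is only about 1.6 times faster than the CPU-only run, and doubling the matrix size multiplies the estimated time by eight, which illustrates why the abstract argues for dedicated accelerators.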
{"title":"The need of accelerators in analyzing biological networks","authors":"Jian-Yu Shi","doi":"10.1109/BIBM.2016.7822733","DOIUrl":"https://doi.org/10.1109/BIBM.2016.7822733","journal":{"name":"2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)"},"publicationDate":"2016-12-01"}
Pub Date: 2016-12-01 DOI: 10.1109/BIBM.2016.7822734
M. Wang, Wei Jiang, R. Ma, Weichuan Yu
Detecting gene-gene interaction patterns is important for revealing associations between genotype and complex diseases. This task, however, is computationally challenging. For example, to exhaustively detect interactions among 1,000,000 single nucleotide polymorphisms (SNPs) genotyped from thousands of individuals, we need to carry out 5 × 10^11 statistical tests. To address this computational challenge, Wan et al. [1] proposed a fast method named BOOST to exhaustively detect interactions of all SNP pairs. BOOST completes pairwise analysis of 360,000 SNPs in 60 hours on a standard desktop PC. As the interaction tests of SNP pairs are highly parallel, Yung et al. [2] implemented the BOOST method on a GPU and named it GBOOST. GBOOST usually takes about one and a half hours to finish genome-wide interaction analysis of a data set containing about 350,000 SNPs and 5,000 samples using an Nvidia GeForce GTX 285 display card.
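The 5 × 10^11 figure follows from counting unordered SNP pairs, n(n-1)/2. The short sketch below reproduces that count and derives rough tests-per-second rates from the runtimes quoted above. It is an illustration only: the function name is hypothetical, and the two rates come from different data sets and hardware, so they are not a controlled comparison.

```python
from math import comb


def pairwise_tests(num_snps: int) -> int:
    """Number of unordered SNP pairs, i.e. n * (n - 1) / 2."""
    return comb(num_snps, 2)


# Exhaustive pairwise search over 1,000,000 SNPs.
tests_1m = pairwise_tests(1_000_000)

# Rough throughput implied by the quoted runtimes (illustrative arithmetic only).
boost_rate = pairwise_tests(360_000) / (60 * 3600)     # BOOST: 60 hours on a desktop PC
gboost_rate = pairwise_tests(350_000) / (1.5 * 3600)   # GBOOST: about 1.5 hours on a GTX 285

if __name__ == "__main__":
    print(f"pairs among 1,000,000 SNPs: {tests_1m:.2e}")
    print(f"BOOST:  about {boost_rate:,.0f} tests per second")
    print(f"GBOOST: about {gboost_rate:,.0f} tests per second")
```

Because the pair count grows quadratically with the number of SNPs, doubling the SNP panel roughly quadruples the work, which is why the independence of the individual tests makes the problem a natural fit for GPU parallelism.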
{"title":"GBOOST 2.0: A GPU-based tool for detecting gene-gene interactions with covariates adjustment in genome-wide association studies","authors":"M. Wang, Wei Jiang, R. Ma, Weichuan Yu","doi":"10.1109/BIBM.2016.7822734","DOIUrl":"https://doi.org/10.1109/BIBM.2016.7822734","url":null,"abstract":"Detecting gene-gene interaction patterns is important to reveal associations between genotype and complex diseases. This task, however, is computationally challenging. For example, in order to exhaustively detect interactions of 1,000,000 single nucleotide polymorphisms (SNPs) genotyped from thousands of individuals, we need to carry out 5×1011 statistical tests. To address the computational challenge, Wan et. al. [1] proposed a fast method named BOOST to exhaustively detect interactions of all SNP pairs. BOOST completes pairwise analysis of 360,000 SNPs in 60 hours on a standard desktop PC. As the interaction tests of SNP pairs are highly parallel, Yung et. al. [2] implemented the BOOST method in GPU and named it GBOOST. GBOOST usually takes about one and a half hours to finish genome-wide interaction analysis of a data set containing about 350,000 SNPs and 5,000 samples using Nvidia GeForce GTX 285 dispaly card.","PeriodicalId":345384,"journal":{"name":"2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","volume":"13 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116647648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}