2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM): Latest Literature

Triptolide regulates immune response network against systemic lupus erythematosus

2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822729

Guang Zheng, Zhibin Wang, Chengqiang Li, Hongtao Guo, Jihua Wang, Xiaojuan He

Traditional Chinese medicine has used Tripterygium wilfordii Hook f. (TWHF) against systemic lupus erythematosus (SLE) for over 200 years. Triptolide is an active compound of TWHF with therapeutic effects on the autoimmune disease SLE. However, few related studies have been reported to date, and little is known about the mechanism of triptolide against SLE, which hinders new drug discovery. In this study, focusing on the proteins that participate in the immune response, an integrated bioinformatics analysis covering targeted proteins, SLE OMIM genes, biological process enrichment, and protein-protein interactions (PPI) was carried out. As a result, a candidate therapeutic network against SLE with negative regulation of immune response was proposed. It contains a PPI network of 7 targeted proteins and 7 SLE OMIM genes further regulated by triptolide. Preliminary validation of this network indicated that apoptosis and pro-inflammatory processes were involved.

Citations: 1

Multi-view clustering microbiome data by joint symmetric nonnegative matrix factorization with Laplacian regularization

2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822591

Yuanyuan Ma, Xiaohua Hu, Tingting He, Xingpeng Jiang

Many real-world datasets comprise different representations or views that provide complementary information to each other. For example, microbiome datasets can be represented by metabolic pathways, taxonomic assignments or gene families. To integrate information from multiple views, data integration approaches such as methods based on nonnegative matrix factorization (NMF) have been developed. They combine multi-view information simultaneously to obtain a comprehensive view that reveals the underlying data structure shared by multiple views. In this paper, we propose a novel variant of symmetric nonnegative matrix factorization (SNMF), called Laplacian regularized joint symmetric nonnegative matrix factorization (LJ-SNMF), for clustering multi-view data. We conduct extensive experiments on several realistic datasets, including Human Microbiome Project (HMP) data. The experimental results show that the proposed method outperforms other variants of NMF, which suggests the potential of LJ-SNMF for clustering multi-view datasets.
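
As a rough illustration of the symmetric NMF building block that LJ-SNMF extends, the sketch below factorizes a single nonnegative similarity matrix A ≈ HHᵀ with a damped multiplicative update. It covers one view only: the joint multi-view objective and the Laplacian regularizer described in the abstract are not reproduced, and the function names, the damping factor 0.5, and the toy matrix are assumptions made for illustration.

```python
import numpy as np

def recon_error(A, H):
    """Frobenius norm of the residual A - H @ H.T."""
    return float(np.linalg.norm(A - H @ H.T))

def symmetric_nmf(A, k, n_iter=200, beta=0.5, seed=0, eps=1e-12):
    """Approximate a symmetric nonnegative matrix A by H @ H.T with H >= 0.

    Uses a damped multiplicative update; beta=0.5 is an assumed damping
    factor, not a value taken from the paper.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[0], k))          # strictly nonnegative start
    for _ in range(n_iter):
        numer = A @ H
        denom = H @ (H.T @ H) + eps          # eps avoids division by zero
        H *= (1.0 - beta) + beta * numer / denom
    return H

# Toy similarity matrix with two blocks of three samples each (hypothetical data).
A = np.full((6, 6), 0.05)
A[:3, :3] = 1.0
A[3:, 3:] = 1.0

H = symmetric_nmf(A, k=2)
labels = H.argmax(axis=1)                    # common heuristic: largest factor wins
print("reconstruction error:", recon_error(A, H))
print("cluster labels:", labels)
```

Reading cluster labels off the largest entry in each row of H is a common heuristic for NMF-based clustering; a multi-view method would additionally couple the factors learned from each view.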

Citations: 7

Topic modeling of biomedical text

2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822606

Sarah ElShal, M. Mathad, J. Simm, Jesse Davis, Y. Moreau

The massive growth of biomedical text makes it very challenging for researchers to review all relevant work and generate all possible hypotheses in a reasonable amount of time. Many text mining methods have been developed to simplify this process and quickly present the researcher with a learned set of biomedical hypotheses that could potentially be validated. Previously, we focused on the task of identifying genes that are linked with a given disease by text mining PubMed abstracts. We applied a word-based concept profile similarity to learn patterns between disease and gene entities and hence identify links between them. In this work, we study an alternative approach based on topic modelling to learn different patterns between the disease and gene entities, and we measure how this affects the identified links. We investigated multiple input corpora, word representations, topic parameters, and similarity measures. On the one hand, our results show that when we (1) learn the topics from a gene-clustered set of abstracts, and (2) apply the dot-product similarity measure, we succeed in improving our original method and identify more correct disease-gene links. On the other hand, the results also show that the learned topics remain limited to the diseases existing in our vocabulary, so that scaling the methodology to new disease queries becomes nontrivial.
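
The dot-product scoring step mentioned above can be sketched as follows, assuming each disease and each gene has already been mapped to a topic-probability vector by some topic model. The vectors, the gene names and the `rank_genes` helper are hypothetical; corpus construction and topic inference are not shown.

```python
import numpy as np

def rank_genes(disease_topics, gene_topics):
    """Rank genes by the dot product of their topic vector with the disease's."""
    scores = {gene: float(np.dot(disease_topics, vec))
              for gene, vec in gene_topics.items()}
    # Highest similarity first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical 3-topic distributions (each sums to 1).
disease = np.array([0.7, 0.2, 0.1])
genes = {
    "GENE_A": np.array([0.6, 0.3, 0.1]),
    "GENE_B": np.array([0.1, 0.1, 0.8]),
    "GENE_C": np.array([0.3, 0.6, 0.1]),
}

ranking = rank_genes(disease, genes)
for gene, score in ranking:
    print(f"{gene}: {score:.2f}")
```

Unlike cosine similarity, the dot product is not length-normalized, so entities whose probability mass is concentrated on the same few topics as the query score higher than entities with flat distributions.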

Citations: 3

The MSS of complex networks with centrality based preference and its application to biomolecular networks

2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822523

Lin Wu, Lingkai Tang, Min Li, Jianxin Wang, Fang-Xiang Wu

Networks are employed to represent many real-world complex systems. In biological systems, biomolecules interact with each other to form so-called biomolecular networks. Exploring the connections between structural control theory and biological networks has uncovered some interesting biological phenomena. Recently, some studies have paid attention to the structural controllability of networks in terms of minimum steering sets (MSSs). However, the MSS of a complex network is not unique. Therefore, it is meaningful to identify a particular one according to a centrality-based preference. Our method can identify the MSS that has the maximum (or minimum) average value of a given centrality among all possible MSSs of a network. We then apply the method to the human liver metabolic network and find that the centralities of steering nodes in different MSSs can be remarkably different. In addition, we observe that, for some centralities, the liver cancer reactions are significantly enriched in the MSSs with the minimum average centrality value. This result suggests that centralities, which could provide more meaningful biological information, can be taken into consideration when investigating the controllability of biomolecular networks.
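
To make the non-uniqueness of steering sets concrete, the sketch below uses the maximum-matching view of structural controllability, in which the nodes left unmatched by a maximum matching of the directed network form one minimum set of input nodes. This is an assumed, simplified stand-in rather than the paper's centrality-preferring method: the function names and the toy graph are hypothetical, and the special case of a perfect matching (where one input node is still needed) is not handled.

```python
def maximum_matching(nodes, edges):
    """Kuhn's augmenting-path matching on the bipartite (out-copy, in-copy) graph.

    Returns a dict mapping each matched target node v to the source u
    of the matching edge u -> v.
    """
    succ = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
    matched_from = {}

    def try_augment(u, seen):
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            # Take v if it is free, or if its current partner can be re-routed.
            if v not in matched_from or try_augment(matched_from[v], seen):
                matched_from[v] = u
                return True
        return False

    for u in nodes:
        try_augment(u, set())
    return matched_from

def steering_set(nodes, edges):
    """Nodes with no incoming matching edge under one maximum matching."""
    matched_from = maximum_matching(nodes, edges)
    return {v for v in nodes if v not in matched_from}

# Toy directed network: 1 -> 2 -> 3 and 1 -> 4.
nodes = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (1, 4)]
mss = steering_set(nodes, edges)
print(sorted(mss))
```

In this toy graph node 1 can be matched to either node 2 or node 4, so both {1, 4} and {1, 2} are minimum sets of the same size. A centrality-based preference, as in the abstract, is one way to choose among such alternatives.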

Citations: 3

Exploration of alternative GPU implementations of the pair-HMMs forward algorithm

2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822645

Shanshan Ren, K. Bertels, Z. Al-Ars

To handle the massive raw data generated by next generation sequencing (NGS) platforms, many genetic analysis tools use GPUs to speed up their algorithms. In this paper, we use GPUs to accelerate the pair-HMMs forward algorithm, which is used to calculate the overall alignment probability in many genomics analysis tools. We first evaluate two different implementation methods for accelerating the pair-HMMs forward algorithm according to their effectiveness on GPU platforms. Based on these two methods, we present several implementations of the pair-HMMs forward algorithm. We execute these implementations on the NVIDIA Tesla K40 card using different datasets to compare their performance. Experimental results show that the intra-task implementation has the highest throughput in most cases, achieving pure computational throughput as high as 23.56 GCUPS for synthetic datasets. On a real dataset, the inter-task implementation achieves a 4.82× speedup compared with a parallelized software implementation executed on a 20-core POWER8 system.
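
For readers unfamiliar with the algorithm being accelerated, here is a minimal CPU sketch of a pair-HMM forward recursion. It uses a textbook-style three-state model (match, gap in y, gap in x) with constant parameters and no explicit end state. The parameter values and the function name are assumptions, and it is neither the quality-aware variant used in genomics tools nor either of the GPU implementations evaluated in the paper.

```python
import numpy as np

def pair_hmm_forward(x, y, delta=0.1, eps=0.3, p_same=0.9):
    """Sum the probability of all alignments of x and y under a simple pair HMM.

    delta: gap-open probability, eps: gap-extend probability,
    p_same: probability mass the match state puts on identical base pairs.
    """
    n, m = len(x), len(y)
    p_match = p_same / 4.0              # joint emission of an identical pair
    p_mismatch = (1.0 - p_same) / 12.0  # joint emission of a differing pair
    q = 0.25                            # emission of a single base against a gap

    fM = np.zeros((n + 1, m + 1))       # last column is a match/mismatch
    fX = np.zeros((n + 1, m + 1))       # x[i-1] aligned to a gap
    fY = np.zeros((n + 1, m + 1))       # y[j-1] aligned to a gap
    fM[0, 0] = 1.0

    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:
                emit = p_match if x[i - 1] == y[j - 1] else p_mismatch
                fM[i, j] = emit * ((1 - 2 * delta) * fM[i - 1, j - 1]
                                   + (1 - eps) * (fX[i - 1, j - 1] + fY[i - 1, j - 1]))
            if i > 0:
                fX[i, j] = q * (delta * fM[i - 1, j] + eps * fX[i - 1, j])
            if j > 0:
                fY[i, j] = q * (delta * fM[i, j - 1] + eps * fY[i, j - 1])
    return float(fM[n, m] + fX[n, m] + fY[n, m])

print(pair_hmm_forward("ACGT", "ACGT"))
print(pair_hmm_forward("ACGT", "TTTT"))
```

Each cell (i, j) depends only on its left, upper and upper-left neighbours, so all cells on one anti-diagonal can be computed in parallel. Many independent read/haplotype pairs can likewise be processed at once. Throughput figures such as GCUPS count how many billions of these cell updates are completed per second.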

Citations: 17

Analyzing epileptic network dynamics via time-variant partial directed coherence

2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822547

Bo-Wen Liu, Jun-Wei Mao, Ye-Jun Shi, Q. Lu, P. Liang, Pu-Ming Zhang

Epilepsy is increasingly considered a brain network disorder. In this study, epileptiform discharges were induced by low-Mg²⁺ in mouse entorhinal cortex-hippocampal slices and recorded with a micro-electrode array. Dynamic effective network connectivity was constructed by calculating the time-variant partial directed coherence (tvPDC) of the signals. We proposed a novel approach to track the state transitions of epileptic networks over time, and characterized the network topology using graph measures. We found that the high-degree hub nodes in the network coincided with the epileptogenic zone reported in previous electrophysiological findings. Two consecutive states with distinct network topologies were identified during the ictal-like discharges. The small-worldness remained at a low level in the first state but increased significantly in the second state. Our results indicate the ability of tvPDC to capture the causal interactions between multi-channel signals, which are important in identifying the epileptogenic zone. Moreover, the evolution of network states extends our knowledge of the network drivers for the initiation and maintenance of ictal activity, and suggests the practical value of our network clustering approach.
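
For reference, partial directed coherence is commonly defined from a multivariate autoregressive (MVAR) model as written below. This is the standard textbook form with time-varying coefficients substituted in; the abstract does not state the exact normalization or the coefficient-estimation scheme the authors used, so the symbols here (model order p, coefficient matrices A_r(t), normalized frequency f) are assumptions.

```latex
% MVAR model with time-varying coefficient matrices A_r(t)
\mathbf{x}(t) = \sum_{r=1}^{p} A_r(t)\,\mathbf{x}(t-r) + \mathbf{w}(t)

% Frequency-domain representation of the coefficients
\bar{A}(f,t) = I - \sum_{r=1}^{p} A_r(t)\, e^{-\mathrm{i} 2\pi f r}

% Partial directed coherence from channel j to channel i
\pi_{ij}(f,t) = \frac{\lvert \bar{A}_{ij}(f,t) \rvert}
                     {\sqrt{\sum_{k} \lvert \bar{A}_{kj}(f,t) \rvert^{2}}},
\qquad \sum_{i} \lvert \pi_{ij}(f,t) \rvert^{2} = 1
```

Because each column is normalized by the total outflow of source channel j, the value of π_ij expresses the share of j's influence that is directed at channel i, which is what allows a directed, weighted network to be built at each time step.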

Citations: 3

Emotion recognition from multi-channel EEG data through Convolutional Recurrent Neural Network

2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822545

Xiang Li, D. Song, Peng Zhang, Guangliang Yu, Yuexian Hou, B. Hu

Automatic emotion recognition based on multi-channel neurophysiological signals, a challenging pattern recognition task, is becoming an important computer-aided method for diagnosing emotional disorders in neurology and psychiatry. Traditional approaches require designing and extracting a range of features from single- or multiple-channel signals based on extensive domain knowledge. This may be an obstacle for non-domain experts. Moreover, traditional feature fusion methods cannot fully utilize correlation information between different channels. In this paper, we propose a preprocessing method that encapsulates the multi-channel neurophysiological signals into grid-like frames through wavelet and scalogram transforms. We further design a hybrid deep learning model that combines the ‘Convolutional Neural Network (CNN)’ and ‘Recurrent Neural Network (RNN)’ for extracting task-related features, mining inter-channel correlation and incorporating contextual information from those frames. Experiments are carried out on a trial-level emotion recognition task using the DEAP benchmarking dataset. Our results demonstrate the effectiveness of the proposed methods with respect to the emotional dimensions of Valence and Arousal.

Citations: 176

Developing a robust colorectal cancer (CRC) risk predictive model with the big genetic and environment related CRC data

2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822806

Chunqiu Zheng, Lei Xing, Tian Li, Tingting Li, Huan Yang, Jia Cao, Badong Chen, Ziyuan Zhou, Le Zhang

Colorectal cancer (CRC) has become one of the most common cancers worldwide. Though the prognosis of CRC patients has improved dramatically due to new advanced treatments and medical improvements, the 5-year survival rate for CRC patients is still low. We hypothesize that CRC may result from complicated causes related to both genetic and environmental factors. For this reason, this study collects large-scale CRC data with information on genetic variations and environmental exposure for CRC patients and cancer-free controls, which are used to train and test the predictive CRC model. Our results demonstrate that (1) the explored genetic and environmental biomarkers are validated as causes of CRC by manually reviewed experimental evidence, (2) the model can efficiently predict the risk of CRC after parameter optimization on the big CRC-related data, and (3) our novel generalized kernel recursive maximum correntropy (GKRMC) algorithm has high predictive power. Finally, we discuss why GKRMC can outperform the classical regression algorithms, as well as related future work.

Citations: 1

The need of accelerators in analyzing biological networks

2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822733

Jian-Yu Shi

With the development of high-throughput techniques in biology and related disciplines (chemistry or medicine), a huge number of biological entries have become available. The discovered relationships between them (e.g. interactions or associations) reveal important biological facts that are never found in individual-based biological experiments. A biological network is an appropriate tool to systematically analyze and uncover such facts. The relationship between biological molecules is usually modeled as a monopartite network, such as protein-protein interactions, while that between biological molecules and other objects is modeled as a bipartite network, such as chemical compound-protein interactions, gene-disease associations and ncRNA-disease associations. A biological network may contain a large number of nodes, each of which has many heterogeneous attributes, including binary, real-valued and semantic forms. Because of their high computational complexity, current algorithms for systematic analysis of large-scale biological networks need either a large amount of memory or a long running time. Take the compound-protein interaction network as an example. Over 90 million compounds are available in PubChem, and each compound is characterized as a high-dimensional vector (e.g. the 881-d PubChem fingerprint or the 4860-d Klekota-Roth fingerprint). Meanwhile, a protein can be characterized as a 20^K-dimensional vector if the K-mer descriptor is adopted. However, because they involve intensive matrix manipulation (e.g. matrix factorization, inversion and tensor products), current algorithms cannot be directly applied to predict compound-protein interactions on a large scale. For example, singular value decomposition (SVD), which has complexity O(n^3), was run on a 6,000×6,000 matrix in MATLAB 2013b (64-bit) under Windows 7 (64-bit) with an Intel Core i7-4700MQ (2.40 GHz) and a GeForce GTX 765M. SVD takes 81.9, 77.9, and 51.4 seconds when using the CPU only, the CPU with four workers, and the CPU plus GPU, respectively. Consequently, there is an urgent need to turn such algorithms into accelerator-enabled parallel algorithms or to develop novel accelerators to speed up knowledge mining in biological networks.
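
The timing figures above can be put in perspective with a little arithmetic. The sketch below computes the speedups implied by the quoted measurements and extrapolates the CPU-only time to a matrix ten times larger. The extrapolation assumes pure cubic scaling and ignores memory limits, so it is an idealized estimate, not a measurement; the helper names are hypothetical.

```python
# Timings quoted in the abstract (seconds) for one SVD of a 6,000 x 6,000 matrix.
timings = {"cpu": 81.9, "cpu_4_workers": 77.9, "cpu_gpu": 51.4}

def speedup(baseline, other):
    """How many times faster `other` is than `baseline`."""
    return baseline / other

def extrapolate_cubic(t_ref, n_ref, n_new):
    """Scale a measured time assuming the cost grows as n**3 (an idealization)."""
    return t_ref * (n_new / n_ref) ** 3

workers_speedup = speedup(timings["cpu"], timings["cpu_4_workers"])
gpu_speedup = speedup(timings["cpu"], timings["cpu_gpu"])
hours_60k = extrapolate_cubic(timings["cpu"], 6_000, 60_000) / 3600

print(f"four workers: {workers_speedup:.2f}x, CPU+GPU: {gpu_speedup:.2f}x")
print(f"estimated CPU-only time for n = 60,000: {hours_60k:.2f} hours")
```

Under these assumptions the laptop-class GPU gives roughly a 1.6× gain, while a tenfold increase in matrix size multiplies the cost by a thousand, which is the gap the abstract argues accelerators need to close.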

Citations: 0

GBOOST 2.0: A GPU-based tool for detecting gene-gene interactions with covariates adjustment in genome-wide association studies

2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822734

M. Wang, Wei Jiang, R. Ma, Weichuan Yu

Detecting gene-gene interaction patterns is important for revealing associations between genotype and complex diseases. This task, however, is computationally challenging. For example, to exhaustively detect interactions among 1,000,000 single nucleotide polymorphisms (SNPs) genotyped from thousands of individuals, we need to carry out 5×10^11 statistical tests. To address this computational challenge, Wan et al. [1] proposed a fast method named BOOST to exhaustively detect interactions of all SNP pairs. BOOST completes pairwise analysis of 360,000 SNPs in 60 hours on a standard desktop PC. Because the interaction tests of SNP pairs are highly parallel, Yung et al. [2] implemented the BOOST method on a GPU and named it GBOOST. GBOOST usually takes about one and a half hours to finish genome-wide interaction analysis of a data set containing about 350,000 SNPs and 5,000 samples using an Nvidia GeForce GTX 285 display card.

Citations: 2
