The need of accelerators in analyzing biological networks

Jian-Yu Shi
2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), published 2016-12-01
DOI: 10.1109/BIBM.2016.7822733 (https://doi.org/10.1109/BIBM.2016.7822733)

Abstract

With the development of high-throughput techniques in biology and its related disciplines (chemistry and medicine), a huge number of biological entries have become available. The relationships discovered between them (e.g. interactions or associations) reveal important biological facts that individual-based biological experiments do not uncover. A biological network is an appropriate tool for systematically analyzing and uncovering such facts. The relationship between biological molecules is usually modeled as a monopartite network, such as a protein-protein interaction network, while the relationship between biological molecules and other objects is modeled as a bipartite network, such as chemical compound-protein interactions, gene-disease associations and ncRNA-disease associations. A biological network may contain a large number of nodes, each of which has many heterogeneous attributes in binary, real-valued and semantic forms. Because of their high computational complexity, current algorithms for systematic analysis of large-scale biological networks need either a large amount of memory or a long running time. Take the compound-protein interaction network as an example. Over 90 million compounds are available in PubChem, and each compound is characterized as a high-dimensional vector (e.g. the 881-d PubChem fingerprint or the 4860-d Klekota-Roth fingerprint). Meanwhile, a protein can be characterized as a 20^K-dimensional vector if the K-mer descriptor is adopted. Because they involve intensive matrix manipulation (e.g. matrix factorization, inversion and tensor products), current algorithms cannot be directly applied to predict compound-protein interactions on a large scale. For example, singular value decomposition (SVD), which has O(n^3) complexity, was run on a 6,000 × 6,000 matrix in MATLAB 2013b (64-bit) under Windows 7 (64-bit) on an Intel Core i7-4700MQ (2.40 GHz) with a GeForce GTX 765M. It took 81.9, 77.9, and 51.4 seconds when using the CPU only, the CPU with four workers, and the CPU plus GPU, respectively. Consequently, there is an urgent need to turn these algorithms into accelerator-enabled parallel algorithms, or to develop novel accelerators, in order to speed up knowledge mining in biological networks.
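Two of the quantities in the abstract can be made concrete with a short Python sketch. It is illustrative only and not the author's code. The function names, the example sequence, and the choice of raw K-mer counts are assumptions, since the abstract does not say how the K-mer descriptor is computed. The timings are the three values reported above.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def kmer_descriptor(sequence, k):
    """Count each length-k window of `sequence` into a vector of length 20**k.

    Assumption: raw counts; the abstract does not specify counts vs. frequencies.
    """
    index = {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=k))}
    vector = [0] * len(index)
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if kmer in index:  # windows with non-standard residues are skipped
            vector[index[kmer]] += 1
    return vector

def speedup(baseline_seconds, seconds):
    """Ratio of the baseline running time to another running time."""
    return baseline_seconds / seconds

# Timings reported in the abstract for SVD on a 6,000 x 6,000 matrix.
REPORTED_SECONDS = {"cpu": 81.9, "cpu_4_workers": 77.9, "cpu_gpu": 51.4}

if __name__ == "__main__":
    vec = kmer_descriptor("MKTAYIAKQR", 2)  # hypothetical 10-residue sequence
    print(len(vec), sum(vec))               # 400 dimensions, 9 windows counted
    for name, secs in REPORTED_SECONDS.items():
        print(name, round(speedup(REPORTED_SECONDS["cpu"], secs), 2))
```

The descriptor length grows as 20^K (400 for K = 2, 8,000 for K = 3), which is why dense matrix operations on such features become expensive quickly. On the reported timings, four CPU workers give a speedup of about 1.05 over the single CPU run, and CPU plus GPU gives about 1.59.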