The need of accelerators in analyzing biological networks
Jian-Yu Shi
2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), December 2016
DOI: 10.1109/BIBM.2016.7822733
Citations: 0
Abstract
With the development of high-throughput techniques in biology and its related disciplines (chemistry and medicine), a huge number of biological entries have become available. The relationships discovered between them (e.g., interactions or associations) reveal important biological facts that cannot be found in individual-based biological experiments. A biological network is an appropriate tool for systematically analyzing and uncovering such facts. Relationships among biological molecules are usually modeled as a monopartite network, such as a protein-protein interaction network, whereas relationships between biological molecules and other objects are modeled as a bipartite network, such as chemical compound-protein interactions, gene-disease associations, and ncRNA-disease associations. A biological network may contain a large number of nodes, each of which has many heterogeneous attributes in binary, real-valued, and semantic forms.

Because of their high computational complexity, current algorithms for the systematic analysis of large-scale biological networks require either a large amount of memory or a long running time. Take the compound-protein interaction network as an example. Over 90 million compounds are available in PubChem, and each compound can be characterized as a high-dimensional vector (e.g., the 881-dimensional PubChem fingerprint or the 4,860-dimensional Klekota-Roth fingerprint). Meanwhile, a protein can be characterized as a 20^K-dimensional vector if the K-mer descriptor is adopted. Because current algorithms involve intensive matrix manipulation (e.g., matrix factorization, matrix inversion, and tensor products), they cannot be applied directly to predict compound-protein interactions on a large scale. For example, singular value decomposition (SVD), whose complexity is O(n^3), was run on a 6,000 × 6,000 matrix in MATLAB 2013b (64-bit) under Windows 7 (64-bit) on an Intel Core i7-4700MQ (2.40 GHz) with a GeForce GTX 765M. SVD took 81.9, 77.9, and 51.4 seconds using the CPU only, the CPU with four workers, and the CPU plus GPU, respectively. Consequently, there is an urgent need to speed up knowledge mining in biological networks, either by turning these algorithms into accelerator-enabled parallel algorithms or by developing novel accelerators.
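The K-mer protein descriptor mentioned above can be sketched as follows, to show where the 20^K dimensions come from. This is a minimal illustration under stated assumptions, not the author's implementation: it assumes the 20 standard amino acids as the alphabet, counts overlapping K-mers, normalizes the counts to frequencies, and skips K-mers that contain non-standard residues. The function names `kmer_index` and `kmer_descriptor` and the toy sequence are hypothetical.

```python
from itertools import product

# Assumption: the 20 standard amino acids; the alphabet order is arbitrary
# but fixes which column each K-mer maps to.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_index(k: int) -> dict[str, int]:
    """Map every possible k-mer over the 20-letter alphabet to a column index."""
    return {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=k))}


def kmer_descriptor(sequence: str, k: int) -> list[float]:
    """Return the normalized k-mer frequency vector (length 20**k) of a protein.

    Assumption: overlapping k-mers are counted, and k-mers containing
    non-standard residues (e.g. 'X', 'U') are skipped rather than mapped.
    """
    index = kmer_index(k)
    vector = [0.0] * len(index)
    total = 0
    # Slide a window of width k over the sequence, one residue at a time.
    for start in range(len(sequence) - k + 1):
        column = index.get(sequence[start:start + k])
        if column is not None:
            vector[column] += 1.0
            total += 1
    # Turn raw counts into frequencies; an all-zero vector is left unchanged.
    if total:
        vector = [count / total for count in vector]
    return vector


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical toy sequence
    for k in (1, 2, 3):
        v = kmer_descriptor(seq, k)
        print(k, len(v), round(sum(v), 6))
```

Because the vector length is 20^K, it grows quickly: 8,000 dimensions for K = 3 and 160,000 for K = 4. A protein of length L contributes at most L - K + 1 K-mers, so at larger K almost every column is zero, and a sparse representation (e.g., a dict of non-zero columns) would be the natural next step.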