Implementation of Smith-Waterman Algorithm in OpenCL for GPUs
Dzmitry Razmyslovich, G. Marcus, M. Gipp, M. Zapatka, Andreas Szillus
Pub Date: 2010-09-30 | DOI: 10.1109/PDMC-HIBI.2010.16 | Infinity, pp. 48-56

In this paper we present an implementation of the Smith-Waterman algorithm, written in OpenCL and targeting high-end GPUs. The implementation computes similarity indexes between reference and query sequences and is designed for calculating sequence alignment paths. In addition, it can handle very long reference sequences (on the order of millions of nucleotides), a requirement of the target application in cancer research. Performance compares favorably against the CPU: it is on the order of 9 to 130 times faster, and 3 times faster than the CUDA-enabled CUDASW++ v2.0 for medium-sized or larger sequences. It is also on par with Farrar's implementation in performance, but with fewer constraints on sequence length.
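The dynamic-programming recurrence behind such an implementation can be sketched sequentially. This is a minimal illustration, not the paper's code: the scoring values below are generic defaults, and the paper's OpenCL kernels would parallelize the anti-diagonals of this matrix rather than sweep it row by row.

```python
# Minimal Smith-Waterman sketch: best local alignment score via the
# standard DP recurrence. Scoring parameters are illustrative defaults.

def smith_waterman_score(ref, query, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between ref and query."""
    cols = len(ref) + 1
    prev = [0] * cols          # previous DP row; only two rows are kept
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            s = match if query[i - 1] == ref[j - 1] else mismatch
            curr[j] = max(0,                   # local alignment: clamp at 0
                          prev[j - 1] + s,     # diagonal: match/mismatch
                          prev[j] + gap,       # gap in the reference
                          curr[j - 1] + gap)   # gap in the query
            best = max(best, curr[j])
        prev = curr
    return best
```

Keeping only two rows gives O(|ref|) memory, which matters for the million-nucleotide references the paper targets; recovering alignment paths additionally requires traceback information.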
Three High Performance Architectures in the Parallel APMC Boat
Khaled Hamidouche, Alexandre Borghi, Pierre Estérie, J. Falcou, Sylvain Peyronnet
Pub Date: 2010-09-30 | DOI: 10.1109/PDMC-HIBI.2010.12 | Infinity, pp. 20-27

Approximate probabilistic model checking, and more generally sampling-based model checking methods, proceed by drawing independent executions of a given model and checking a temporal formula on these executions. In theory these methods can easily be massively parallelized, but in practice one has to consider important aspects such as the communication paradigm, the physical architecture of the machine, and so on. Moreover, developing implementations of this algorithm on architectures as different as a cluster and a many-core processor requires various levels of expertise that may be difficult to gather. In this paper we investigate the runtime behavior of approximate probabilistic model checking on several state-of-the-art parallel machines (clusters, SMP machines, hybrid SMP clusters, and the Cell processor), using a high-level parallel programming tool based on the Bulk Synchronous Parallelism paradigm to quickly instantiate model checking problems over a large variety of parallel architectures. Our conclusion assesses the relative efficiency of these architectures with respect to the algorithm classes and proposes guidelines for further work on parallel APMC implementation.
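The sampling core that makes these methods embarrassingly parallel can be sketched as follows. The toy model (a coin-flip walk) and the "eventually reaches +3" property are illustrative stand-ins, not the paper's benchmarks; in a parallel run the samples would simply be split across workers.

```python
import random

# Sketch of the APMC-style sampling loop: estimate Pr[formula] by drawing
# N independent executions and counting those that satisfy the property.
# Model, property, and sample count are illustrative stand-ins.

def sample_execution(rng, steps=10):
    """One execution of a toy model: a symmetric coin-flip walk."""
    state, trace = 0, []
    for _ in range(steps):
        state += 1 if rng.random() < 0.5 else -1
        trace.append(state)
    return trace

def holds(trace):
    """Toy 'eventually' property: the walk ever reaches +3."""
    return any(s >= 3 for s in trace)

def apmc_estimate(n_samples, seed=0):
    """Monte Carlo estimate; executions are independent, so the n_samples
    draws can be partitioned across workers with no shared state."""
    rng = random.Random(seed)
    hits = sum(holds(sample_execution(rng)) for _ in range(n_samples))
    return hits / n_samples
```

Because each sample is independent, the only required communication is a final reduction of the hit counts, which is why the method maps onto clusters, SMPs, and the Cell processor alike.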
Industrial Strength Distributed Explicit State Model Checking
B. Bingham, Jesse D. Bingham, F. M. D. Paula, John Erickson, Gaurav Singh, Mark Reitblatt
Pub Date: 2010-09-30 | DOI: 10.1109/PDMC-HIBI.2010.13 | Infinity, pp. 28-36

We present Preach, an industrial-strength distributed explicit-state model checker based on Murphi. The goal of this project was to develop a reliable, easy-to-maintain, scalable model checker compatible with the Murphi specification language. Preach is implemented in the concurrent functional language Erlang, chosen for its parallel programming elegance. We use the original Murphi front-end to parse the model description, a layer written in Erlang to handle the communication aspects of the algorithm, and Murphi again as a back-end for state expansion and for storing the hash table. This allowed a clean and simple implementation, with the core parallel algorithms written in under 1000 lines of code. This paper describes the Preach implementation, including the features necessary for the large models we target. We have used Preach to model check an industrial cache coherence protocol with approximately 30 billion states. To our knowledge, this is the largest number published for a distributed explicit-state model checker. Preach has been released to the public under an open-source BSD license.
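The core idea of distributed explicit-state exploration of this kind is that a hash function assigns each state to exactly one worker, which alone stores and expands it. The sketch below simulates that ownership scheme sequentially; the toy transition relation and function names are illustrative, not Preach's Erlang code.

```python
import hashlib

# Sketch of hash-based state ownership in a distributed explicit-state
# model checker: every state maps to one worker, so visited sets are
# disjoint and no global hash table is needed.

def owner(state, n_workers):
    """Deterministically map a state to one of n_workers."""
    digest = hashlib.sha256(repr(state).encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_workers

def successors(state):
    """Toy transition relation: a pair of bounded counters."""
    a, b = state
    return [s for s in ((a + 1, b), (a, b + 1)) if s[0] <= 2 and s[1] <= 2]

def distributed_bfs(initial, n_workers):
    """Sequential simulation of the distributed search: discovered states
    are 'sent' to their owner's queue; each worker expands only its own."""
    visited = [set() for _ in range(n_workers)]
    queues = [[] for _ in range(n_workers)]
    queues[owner(initial, n_workers)].append(initial)
    while any(queues):
        for w in range(n_workers):
            while queues[w]:
                s = queues[w].pop()
                if s in visited[w]:
                    continue
                visited[w].add(s)
                for t in successors(s):
                    queues[owner(t, n_workers)].append(t)
    return visited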
A General Lock-Free Algorithm for Parallel State Space Construction
Rodrigo T. Saad, S. Zilio, B. Berthomieu
Pub Date: 2010-09-30 | DOI: 10.1109/PDMC-HIBI.2010.10 | Infinity, pp. 8-16

Verification via model checking is a very demanding activity in terms of computational resources. While there are still gains to be expected from algorithmic improvements, it is necessary to take advantage of advances in computer hardware to tackle bigger models. Recent improvements in this area take the form of multiprocessor and multicore architectures with access to a large memory space. We address the problem of generating the state space of finite-state transition systems, often a preliminary step for model checking. We propose a novel algorithm for enumerative state space construction targeted at shared-memory systems. Our approach relies on two data structures: a shared Bloom filter to coordinate the state space exploration distributed among several processors, and local dictionaries to store the states. The goal is to limit synchronization overheads and to increase the locality of memory accesses without making constant use of locks to ensure data integrity. Bloom filters have already been applied to the probabilistic verification of systems; they are compact data structures for encoding sets in which false positives are possible but false negatives are not. We circumvent this limitation and propose an original multiphase algorithm to perform exhaustive, deterministic state space generation. We assess the performance of our algorithm on different benchmarks and compare our results with the solution proposed by Inggs and Barringer.
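The Bloom filter's one-sided error is the property the abstract relies on: "maybe seen" can be wrong, "definitely not seen" cannot. A minimal sketch of the data structure itself (the paper's multiphase re-check against exact local dictionaries, and the lock-free shared-memory details, are not shown; sizes and hash counts are illustrative):

```python
import hashlib

# Minimal Bloom filter sketch: a compact bit set with possible false
# positives and no false negatives. Parameters are illustrative.

class BloomFilter:
    def __init__(self, n_bits=1 << 16, n_hashes=4):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8)

    def _positions(self, item):
        # Derive k bit positions by salting one cryptographic hash.
        for k in range(self.n_hashes):
            h = hashlib.sha256(f"{k}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        # All k bits set => "maybe seen" (possible false positive);
        # any bit clear => "definitely not seen" (no false negatives).
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))
```

In the shared-memory setting, setting bits needs only atomic OR operations, which is what lets the filter coordinate processors without locks.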
A BSP Algorithm for the State Space Construction of Security Protocols
F. Gava, Michaël Guedj, F. Pommereau
Pub Date: 2010-09-30 | DOI: 10.1109/PDMC-HIBI.2010.14 | Infinity, pp. 37-44

This paper presents a Bulk-Synchronous Parallel (BSP) algorithm to compute the discrete state space of structured models of security protocols. The BSP model of parallelism avoids concurrency-related problems (mainly deadlocks and non-determinism) and allows us to design an algorithm that is both efficient and simple to express. A prototype implementation has been developed, allowing us to run benchmarks that show the benefits of our algorithm.
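A BSP computation alternates local computation, a global exchange, and a barrier. The sketch below shows state space construction in that superstep style; the partition function and transition relation are illustrative placeholders, not the paper's protocol models.

```python
# Sketch of BSP-style state space construction: each superstep expands the
# local frontiers (computation phase), then delivers discovered states to
# their owning processor (communication phase + barrier).

def bsp_state_space(initial, successors, partition, n_procs):
    known = [set() for _ in range(n_procs)]
    frontier = [set() for _ in range(n_procs)]
    frontier[partition(initial, n_procs)].add(initial)
    while any(frontier):
        outbox = [[set() for _ in range(n_procs)] for _ in range(n_procs)]
        # computation phase: every processor expands its own frontier
        for p in range(n_procs):
            for s in frontier[p]:
                known[p].add(s)
                for t in successors(s):
                    outbox[p][partition(t, n_procs)].add(t)
        # communication phase + barrier: route new states to their owners
        frontier = [set() for _ in range(n_procs)]
        for p in range(n_procs):
            for q in range(n_procs):
                frontier[q] |= outbox[p][q] - known[q]
    return known
```

Because all exchanges happen at superstep boundaries, the deadlocks and interleaving non-determinism of ad hoc message passing cannot arise, which is the point the abstract makes.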
Enhancing the Scalability of Simulations by Embracing Multiple Levels of Parallelization
J. Himmelspach, Roland Ewald, Stefan Leye, A. Uhrmacher
Pub Date: 2010-09-30 | DOI: 10.1109/PDMC-HIBI.2010.17 | Infinity, pp. 57-66

Current and upcoming architectures of desktop and high-performance computers offer increasing means for parallel execution. Since the computational demands induced by ever more realistic models increase steadily, this trend is of growing importance for systems biology. Simulations of these models may involve multiple parameter combinations, their replications, data collection, and data analysis, all of which offer different opportunities for parallelization. We present a brief theoretical analysis of these opportunities in order to show their potential impact on the overall computation time. The benefits of using more than one opportunity for parallelization are illustrated by a set of benchmark experiments, which furthermore show that parallelizability should be exploited in a flexible manner to achieve speedup.
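The outermost two levels of parallelism mentioned above (parameter combinations and their replications) can be made concrete by enumerating every independent unit of work in an experiment. The function names and the toy "simulation" below are illustrative, not the authors' framework:

```python
from itertools import product

# Sketch of experiment-level parallelism: every (configuration, replication)
# pair is an independent job, so the experiment exposes
# len(configurations) * n_replications units of parallel work.

def jobs(configurations, n_replications):
    """Enumerate every independent unit of work in the experiment."""
    return [(cfg, rep) for cfg, rep in
            product(configurations, range(n_replications))]

def run_replication(cfg, rep):
    """Toy deterministic 'simulation' standing in for a real model run."""
    return cfg["rate"] * (rep + 1)

def run_experiment(configurations, n_replications):
    # On a real system this loop becomes a thread/process pool or a cluster
    # queue; run sequentially it still exhibits the available parallelism.
    return {(cfg["rate"], rep): run_replication(cfg, rep)
            for cfg, rep in jobs(configurations, n_replications)}
```

Further levels (parallelism inside a single simulation run, and parallel data collection/analysis) nest inside each job, which is why combining levels can pay off even when no single level saturates the machine.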
Predicting the Effects of Parameters Changes in Stochastic Models through Parallel Synthetic Experiments and Multivariate Analysis
M. Forlin, T. Mazza, D. Prandi
Pub Date: 2010-09-30 | DOI: 10.1109/PDMC-HIBI.2010.22 | Infinity, pp. 105-115

Researchers usually require many experiments to verify how biological systems respond to stimuli. However, the high cost of reagents and facilities, as well as the time required to carry out experiments, are sometimes the main cause of failure. In this regard, information technology offers valuable help: modeling and simulation are mathematical tools for executing virtual experiments on computing devices. Through synthetic experimentation, researchers can sample the parameter space of a biological system and obtain hundreds of potential results, ready to be reused to design and conduct more targeted wet-lab experiments. A non-negligible benefit of this is an enormous saving of resources and time. In this paper, we present a plug-in-based software prototype that combines high-performance computing and statistics. Our framework relies on parallel computing to run large numbers of synthetic experiments; multivariate analysis is then used to interpret and validate the results. The software is tested on two well-known oscillatory models: Predator-Prey (Lotka-Volterra) and the Repressilator.
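The "sample the parameter space, simulate each sample, summarize each run" workflow can be sketched on the Lotka-Volterra model the abstract mentions. This is a deliberately simplified stand-in: a deterministic Euler discretization instead of the paper's stochastic runs, with illustrative parameter ranges, initial conditions, and summary statistics.

```python
import random

# Sketch of parallel synthetic experimentation: sample parameters, run the
# model per sample, and keep a per-run summary for later multivariate
# analysis. Euler-discretized deterministic Lotka-Volterra; all constants
# and ranges are illustrative.

def lotka_volterra(alpha, beta, delta, gamma,
                   x0=10.0, y0=10.0, dt=0.001, steps=5000):
    x, y = x0, y0          # prey, predator populations
    peak_prey = x
    for _ in range(steps):
        dx = alpha * x - beta * x * y
        dy = delta * x * y - gamma * y
        x, y = x + dt * dx, y + dt * dy
        peak_prey = max(peak_prey, x)
    return {"final_prey": x, "final_pred": y, "peak_prey": peak_prey}

def sample_experiments(n, seed=0):
    rng = random.Random(seed)
    results = []
    for _ in range(n):   # each iteration is an independent, parallelizable job
        params = {"alpha": rng.uniform(0.9, 1.1),
                  "beta": rng.uniform(0.08, 0.12),
                  "delta": rng.uniform(0.06, 0.09),
                  "gamma": rng.uniform(1.3, 1.7)}
        results.append((params, lotka_volterra(**params)))
    return results
```

The (parameters, summary) pairs collected here are exactly the kind of table that multivariate analysis would then be run over to relate parameter changes to behavioural changes.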
Parameter Scanning by Parallel Model Checking with Applications in Systems Biology
J. Barnat, L. Brim, David Šafránek, Martin Vejnar
Pub Date: 2010-09-30 | DOI: 10.1109/PDMC-HIBI.2010.21 | Infinity, pp. 95-104

In this paper, we provide a novel scalable method for scanning kinetic parameter values in continuous (ODE) models of biological networks. The method is property-driven: parameter values are scanned in order to satisfy a given dynamic property. The key result, the parameter scanning method, is based on an innovative adaptation of parallel LTL model checking to the framework of parameterized Kripke structures (PKS). First, we introduce the notion of a PKS and identify the parameter scanning and robustness analysis problems in this framework. Second, we present the algorithms for parallel LTL model checking on PKSs. Finally, we provide an evaluation on case studies of a mammalian cell-cycle genetic regulatory network model and an E. coli ammonium transport model.
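Property-driven parameter scanning, stripped to its essentials: enumerate candidate parameter values, generate each value's (discretized) dynamics, and keep the values whose trajectory satisfies the temporal property. The one-variable decay model and the "F G (x < threshold)" analogue below are illustrative stand-ins for the paper's LTL checking over parameterized Kripke structures.

```python
# Sketch of property-driven parameter scanning over a toy ODE.
# Model, property, and discretization are illustrative.

def trajectory(decay_rate, x0=100.0, dt=0.1, steps=100):
    """Toy one-variable kinetics: explicit Euler on x' = -decay_rate * x."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] * (1.0 - dt * decay_rate))
    return xs

def eventually_always_below(xs, threshold):
    """Finite-trace analogue of the LTL property F G (x < threshold)."""
    return any(all(x < threshold for x in xs[i:]) for i in range(len(xs)))

def scan(candidates, threshold=1.0):
    """Keep the parameter values whose trajectory satisfies the property.
    Candidates are independent, so the scan parallelizes trivially -- the
    paper additionally parallelizes the model checking itself."""
    return [k for k in candidates
            if eventually_always_below(trajectory(k), threshold)]
```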
Parallel Particle-Based Reaction Diffusion: A GPU Implementation
Lorenzo Dematté
Pub Date: 2010-09-30 | DOI: 10.1109/PDMC-HIBI.2010.18 | Infinity, pp. 67-77

Space is a very important aspect in the simulation of biochemical models, and the need for simulation algorithms able to cope with space is becoming more and more compelling. Large and complex models of biochemical systems need to deal with the movement of single molecules and particles, taking into consideration localized fluctuations, transport phenomena, and diffusion. A common drawback of spatial models lies in their complexity: models can become very large, and their simulation can be time-consuming, especially if we want to capture the system's behaviour reliably using stochastic methods in conjunction with a high spatial resolution. In order to deliver on the promise made by systems biology to understand a system as a whole, we need to move from sequential to parallel simulation algorithms. In this paper we analyse Smoldyn, a widely used algorithm for the stochastic simulation of chemical reactions with spatial resolution and single-molecule detail, and we propose an alternative, innovative implementation that exploits the parallelism of GPUs. The implementation offers good speedups (up to 130x) and real-time, high-quality graphics output at almost no performance penalty.
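One time step of a Smoldyn-style particle simulation has two phases: a Brownian displacement for every molecule, then reactions between molecules that have come within a binding radius. The sketch below is a simplification for illustration (2D, an O(n²) pair search instead of spatial partitioning, illustrative constants); the per-molecule independence of the diffusion phase is what a GPU implementation would exploit.

```python
import math
import random

# Sketch of one particle-based reaction-diffusion step, Smoldyn style.
# Constants and the brute-force pair search are illustrative.

def diffuse(positions, diff_coeff, dt, rng):
    """Brownian update: each coordinate moves by N(0, sqrt(2*D*dt))."""
    sigma = math.sqrt(2.0 * diff_coeff * dt)
    return [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma))
            for (x, y) in positions]

def react(a_positions, b_positions, binding_radius):
    """Consume A/B pairs that have come within the binding radius."""
    a_left, b_left = list(a_positions), list(b_positions)
    for a in list(a_left):
        for b in list(b_left):
            if math.dist(a, b) < binding_radius:
                a_left.remove(a)
                b_left.remove(b)
                break
    return a_left, b_left
```

On a GPU, `diffuse` maps to one thread per molecule with no inter-thread communication; the reaction phase is the harder part to parallelize, since it couples nearby particles.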
Fast Parallel Markov Clustering in Bioinformatics Using Massively Parallel Graphics Processing Unit Computing
A. Bustamam, K. Burrage, N. Hamilton
Pub Date: 2010-09-30 | DOI: 10.1109/PDMC-HIBI.2010.23 | Infinity, pp. 116-125

Markov clustering is becoming a key algorithm within bioinformatics for determining clusters in networks; for instance, clustering protein interaction networks is helping to find genes implicated in diseases such as cancer. However, with fast sequencing and other technologies generating vast amounts of data on biological networks, performance and scalability are becoming critical limiting factors in applications. Meanwhile, Graphics Processing Unit (GPU) computing, which uses the massively parallel computing environment of the GPU card, is becoming a very powerful, efficient, and low-cost option for achieving substantial performance gains over CPU approaches. This paper introduces a very fast Markov clustering algorithm (MCL) based on massively parallel computing on the GPU. We use the Compute Unified Device Architecture (CUDA) to let the GPU perform the parallel sparse matrix-matrix computations and parallel sparse Markov matrix normalizations that are at the heart of the clustering algorithm. The key to optimizing our CUDA Markov clustering (CUDAMCL) was using the ELLPACK-R sparse data format, which allows effective, fine-grained massively parallel processing to cope with the sparse nature of interaction network datasets in bioinformatics applications. CUDA also allows us to use the on-chip memory of the GPU efficiently to lower latency, circumventing a major issue in other parallel computing environments such as the Message Passing Interface (MPI). Here we describe the GPU algorithm and its application to several real-world problems as well as to artificial datasets. We find that the principal factor causing variation in the performance of the GPU approach is the relative sparseness of the networks. Comparing GPU computation times against a modern quad-core CPU on the published (relatively sparse) standard BioGRID protein interaction networks with 5156 and 23175 nodes, speedup factors of 4 and 9 were obtained, respectively. On the Human Protein Reference Database, the clustering of 19599 proteins was sped up by a factor of 7 by the GPU algorithm, while on artificially generated, densely connected networks with 1600 to 4800 nodes, speedups by factors in the range of 40 to 120 were readily obtained. As the results show, in all cases the GPU implementation is significantly faster than the original MCL running on the CPU. Such approaches allow large-scale parallel computation on off-the-shelf desktop machines that was previously only possible on supercomputing architectures, and have the potential to significantly change the way bioinformaticians and biologists compute and interact with their data.
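The expand/inflate loop the paper accelerates can be sketched with small dense matrices in pure Python. This is the standard MCL scheme, not the paper's code: their contribution is performing exactly these operations on sparse matrices (ELLPACK-R) with CUDA kernels; the cluster-extraction heuristic at the end is one common simple interpretation.

```python
# Minimal dense sketch of Markov clustering (MCL): alternate expansion
# (matrix squaring: random-walk flow) and inflation (entrywise power plus
# column normalization: strengthening strong flows).

def col_normalize(m):
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        for i in range(n):
            m[i][j] /= s
    return m

def expand(m):
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inflate(m, r):
    return col_normalize([[v ** r for v in row] for row in m])

def mcl(adj, r=2.0, iterations=20):
    n = len(adj)
    # add self-loops, then make the adjacency matrix column-stochastic
    m = col_normalize([[adj[i][j] + (1.0 if i == j else 0.0)
                        for j in range(n)] for i in range(n)])
    for _ in range(iterations):
        m = inflate(expand(m), r)
    # simple read-out: columns drawn to the same attractor form a cluster
    clusters = {}
    for j in range(n):
        attractor = max(range(n), key=lambda i: m[i][j])
        clusters.setdefault(attractor, set()).add(j)
    return sorted(clusters.values(), key=min)
```

Expansion is a (sparse) matrix-matrix product and inflation a column-wise normalization, which is why both map naturally onto the GPU kernels the abstract describes.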