Implementation of Smith-Waterman Algorithm in OpenCL for GPUs
Dzmitry Razmyslovich, G. Marcus, M. Gipp, M. Zapatka, Andreas Szillus
Pub Date: 2010-09-30 | DOI: 10.1109/PDMC-HIBI.2010.16 | Infinity, pp. 48-56

In this paper we present an implementation of the Smith-Waterman algorithm, written in OpenCL and targeting high-end GPUs. The implementation computes similarity indexes between reference and query sequences and is designed for calculating sequence alignment paths. In addition, it can handle very long reference sequences (on the order of millions of nucleotides), a requirement of the target application in cancer research. Performance compares favorably against the CPU: it is on the order of 9 to 130 times faster, and 3 times faster than the CUDA-enabled CUDASW++ v2.0 for medium-sized or larger sequences. It is also on par with Farrar's implementation in performance, but with fewer constraints on sequence length.
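The dynamic-programming recurrence behind such an implementation can be sketched sequentially. This is a minimal illustration, not the paper's code: the scoring values below are generic defaults, and the paper's OpenCL kernels would parallelize the anti-diagonals of this matrix rather than sweep it row by row.

```python
# Minimal Smith-Waterman sketch: best local alignment score via the
# standard DP recurrence. Scoring parameters are illustrative defaults.

def smith_waterman_score(ref, query, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between ref and query."""
    cols = len(ref) + 1
    prev = [0] * cols          # previous DP row; only two rows are kept
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            s = match if query[i - 1] == ref[j - 1] else mismatch
            curr[j] = max(0,                   # local alignment: clamp at 0
                          prev[j - 1] + s,     # diagonal: match/mismatch
                          prev[j] + gap,       # gap in the reference
                          curr[j - 1] + gap)   # gap in the query
            best = max(best, curr[j])
        prev = curr
    return best
```

Keeping only two rows gives O(|ref|) memory, which matters for the million-nucleotide references the paper targets; recovering alignment paths additionally requires traceback information.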
Three High Performance Architectures in the Parallel APMC Boat
Khaled Hamidouche, Alexandre Borghi, Pierre Estérie, J. Falcou, Sylvain Peyronnet
Pub Date: 2010-09-30 | DOI: 10.1109/PDMC-HIBI.2010.12 | Infinity, pp. 20-27

Approximate probabilistic model checking, and more generally sampling-based model checking methods, proceed by drawing independent executions of a given model and checking a temporal formula on these executions. In theory these methods can easily be massively parallelized, but in practice one has to consider important aspects such as the communication paradigm, the physical architecture of the machine, and so on. Moreover, developing implementations of this algorithm on architectures as different as a cluster and a many-core processor requires various levels of expertise that may be difficult to gather. In this paper we investigate the runtime behavior of approximate probabilistic model checking on several state-of-the-art parallel machines (clusters, SMP machines, hybrid SMP clusters, and the Cell processor), using a high-level parallel programming tool based on the Bulk Synchronous Parallelism paradigm to quickly instantiate model checking problems over a large variety of parallel architectures. Our conclusion assesses the relative efficiency of these architectures with respect to the algorithm classes and proposes guidelines for further work on parallel APMC implementation.
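The sampling core that makes these methods embarrassingly parallel can be sketched as follows. The toy model (a coin-flip walk) and the "eventually reaches +3" property are illustrative stand-ins, not the paper's benchmarks; in a parallel run the samples would simply be split across workers.

```python
import random

# Sketch of the APMC-style sampling loop: estimate Pr[formula] by drawing
# N independent executions and counting those that satisfy the property.
# Model, property, and sample count are illustrative stand-ins.

def sample_execution(rng, steps=10):
    """One execution of a toy model: a symmetric coin-flip walk."""
    state, trace = 0, []
    for _ in range(steps):
        state += 1 if rng.random() < 0.5 else -1
        trace.append(state)
    return trace

def holds(trace):
    """Toy 'eventually' property: the walk ever reaches +3."""
    return any(s >= 3 for s in trace)

def apmc_estimate(n_samples, seed=0):
    """Monte Carlo estimate; executions are independent, so the n_samples
    draws can be partitioned across workers with no shared state."""
    rng = random.Random(seed)
    hits = sum(holds(sample_execution(rng)) for _ in range(n_samples))
    return hits / n_samples
```

Because each sample is independent, the only required communication is a final reduction of the hit counts, which is why the method maps onto clusters, SMPs, and the Cell processor alike.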
Industrial Strength Distributed Explicit State Model Checking
B. Bingham, Jesse D. Bingham, F. M. D. Paula, John Erickson, Gaurav Singh, Mark Reitblatt
Pub Date: 2010-09-30 | DOI: 10.1109/PDMC-HIBI.2010.13 | Infinity, pp. 28-36

We present Preach, an industrial-strength distributed explicit-state model checker based on Murphi. The goal of this project was to develop a reliable, easy-to-maintain, scalable model checker compatible with the Murphi specification language. Preach is implemented in the concurrent functional language Erlang, chosen for its parallel programming elegance. We use the original Murphi front-end to parse the model description, a layer written in Erlang to handle the communication aspects of the algorithm, and Murphi again as a back-end for state expansion and for storing the hash table. This allowed a clean and simple implementation, with the core parallel algorithms written in under 1000 lines of code. This paper describes the Preach implementation, including the features necessary for the large models we target. We have used Preach to model check an industrial cache coherence protocol with approximately 30 billion states. To our knowledge, this is the largest number published for a distributed explicit-state model checker. Preach has been released to the public under an open-source BSD license.
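The core idea of distributed explicit-state exploration of this kind is that a hash function assigns each state to exactly one worker, which alone stores and expands it. The sketch below simulates that ownership scheme sequentially; the toy transition relation and function names are illustrative, not Preach's Erlang code.

```python
import hashlib

# Sketch of hash-based state ownership in a distributed explicit-state
# model checker: every state maps to one worker, so visited sets are
# disjoint and no global hash table is needed.

def owner(state, n_workers):
    """Deterministically map a state to one of n_workers."""
    digest = hashlib.sha256(repr(state).encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_workers

def successors(state):
    """Toy transition relation: a pair of bounded counters."""
    a, b = state
    return [s for s in ((a + 1, b), (a, b + 1)) if s[0] <= 2 and s[1] <= 2]

def distributed_bfs(initial, n_workers):
    """Sequential simulation of the distributed search: discovered states
    are 'sent' to their owner's queue; each worker expands only its own."""
    visited = [set() for _ in range(n_workers)]
    queues = [[] for _ in range(n_workers)]
    queues[owner(initial, n_workers)].append(initial)
    while any(queues):
        for w in range(n_workers):
            while queues[w]:
                s = queues[w].pop()
                if s in visited[w]:
                    continue
                visited[w].add(s)
                for t in successors(s):
                    queues[owner(t, n_workers)].append(t)
    return visited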
A General Lock-Free Algorithm for Parallel State Space Construction
Rodrigo T. Saad, S. Zilio, B. Berthomieu
Pub Date: 2010-09-30 | DOI: 10.1109/PDMC-HIBI.2010.10 | Infinity, pp. 8-16

Verification via model checking is a very demanding activity in terms of computational resources. While there are still gains to be expected from algorithmic improvements, it is necessary to take advantage of advances in computer hardware to tackle bigger models. Recent improvements in this area take the form of multiprocessor and multicore architectures with access to a large memory space. We address the problem of generating the state space of finite-state transition systems, often a preliminary step for model checking. We propose a novel algorithm for enumerative state space construction targeted at shared-memory systems. Our approach relies on two data structures: a shared Bloom filter to coordinate the state space exploration distributed among several processors, and local dictionaries to store the states. The goal is to limit synchronization overheads and to increase the locality of memory accesses without making constant use of locks to ensure data integrity. Bloom filters have already been applied to the probabilistic verification of systems; they are compact data structures for encoding sets in which false positives are possible but false negatives are not. We circumvent this limitation and propose an original multiphase algorithm to perform exhaustive, deterministic state space generation. We assess the performance of our algorithm on different benchmarks and compare our results with the solution proposed by Inggs and Barringer.
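The Bloom filter's one-sided error is the property the abstract relies on: "maybe seen" can be wrong, "definitely not seen" cannot. A minimal sketch of the data structure itself (the paper's multiphase re-check against exact local dictionaries, and the lock-free shared-memory details, are not shown; sizes and hash counts are illustrative):

```python
import hashlib

# Minimal Bloom filter sketch: a compact bit set with possible false
# positives and no false negatives. Parameters are illustrative.

class BloomFilter:
    def __init__(self, n_bits=1 << 16, n_hashes=4):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8)

    def _positions(self, item):
        # Derive k bit positions by salting one cryptographic hash.
        for k in range(self.n_hashes):
            h = hashlib.sha256(f"{k}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        # All k bits set => "maybe seen" (possible false positive);
        # any bit clear => "definitely not seen" (no false negatives).
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))
```

In the shared-memory setting, setting bits needs only atomic OR operations, which is what lets the filter coordinate processors without locks.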
A BSP Algorithm for the State Space Construction of Security Protocols
F. Gava, Michaël Guedj, F. Pommereau
Pub Date: 2010-09-30 | DOI: 10.1109/PDMC-HIBI.2010.14 | Infinity, pp. 37-44

This paper presents a Bulk-Synchronous Parallel (BSP) algorithm to compute the discrete state space of structured models of security protocols. The BSP model of parallelism avoids concurrency-related problems (mainly deadlocks and non-determinism) and allows us to design an algorithm that is both efficient and simple to express. A prototype implementation has been developed, allowing us to run benchmarks that show the benefits of our algorithm.
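A BSP computation alternates local computation, a global exchange, and a barrier. The sketch below shows state space construction in that superstep style; the partition function and transition relation are illustrative placeholders, not the paper's protocol models.

```python
# Sketch of BSP-style state space construction: each superstep expands the
# local frontiers (computation phase), then delivers discovered states to
# their owning processor (communication phase + barrier).

def bsp_state_space(initial, successors, partition, n_procs):
    known = [set() for _ in range(n_procs)]
    frontier = [set() for _ in range(n_procs)]
    frontier[partition(initial, n_procs)].add(initial)
    while any(frontier):
        outbox = [[set() for _ in range(n_procs)] for _ in range(n_procs)]
        # computation phase: every processor expands its own frontier
        for p in range(n_procs):
            for s in frontier[p]:
                known[p].add(s)
                for t in successors(s):
                    outbox[p][partition(t, n_procs)].add(t)
        # communication phase + barrier: route new states to their owners
        frontier = [set() for _ in range(n_procs)]
        for p in range(n_procs):
            for q in range(n_procs):
                frontier[q] |= outbox[p][q] - known[q]
    return known
```

Because all exchanges happen at superstep boundaries, the deadlocks and interleaving non-determinism of ad hoc message passing cannot arise, which is the point the abstract makes.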
Enhancing the Scalability of Simulations by Embracing Multiple Levels of Parallelization
J. Himmelspach, Roland Ewald, Stefan Leye, A. Uhrmacher
Pub Date: 2010-09-30 | DOI: 10.1109/PDMC-HIBI.2010.17 | Infinity, pp. 57-66

Current and upcoming architectures of desktop and high-performance computers offer increasing means for parallel execution. Since the computational demands induced by ever more realistic models increase steadily, this trend is of growing importance for systems biology. Simulations of these models may involve multiple parameter combinations, their replications, data collection, and data analysis, all of which offer different opportunities for parallelization. We present a brief theoretical analysis of these opportunities in order to show their potential impact on the overall computation time. The benefits of using more than one opportunity for parallelization are illustrated by a set of benchmark experiments, which furthermore show that parallelizability should be exploited in a flexible manner to achieve speedup.
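The outermost two levels of parallelism mentioned above (parameter combinations and their replications) can be made concrete by enumerating every independent unit of work in an experiment. The function names and the toy "simulation" below are illustrative, not the authors' framework:

```python
from itertools import product

# Sketch of experiment-level parallelism: every (configuration, replication)
# pair is an independent job, so the experiment exposes
# len(configurations) * n_replications units of parallel work.

def jobs(configurations, n_replications):
    """Enumerate every independent unit of work in the experiment."""
    return [(cfg, rep) for cfg, rep in
            product(configurations, range(n_replications))]

def run_replication(cfg, rep):
    """Toy deterministic 'simulation' standing in for a real model run."""
    return cfg["rate"] * (rep + 1)

def run_experiment(configurations, n_replications):
    # On a real system this loop becomes a thread/process pool or a cluster
    # queue; run sequentially it still exhibits the available parallelism.
    return {(cfg["rate"], rep): run_replication(cfg, rep)
            for cfg, rep in jobs(configurations, n_replications)}
```

Further levels (parallelism inside a single simulation run, and parallel data collection/analysis) nest inside each job, which is why combining levels can pay off even when no single level saturates the machine.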
Predicting the Effects of Parameters Changes in Stochastic Models through Parallel Synthetic Experiments and Multivariate Analysis
M. Forlin, T. Mazza, D. Prandi
Pub Date: 2010-09-30 | DOI: 10.1109/PDMC-HIBI.2010.22 | Infinity, pp. 105-115

Researchers usually require many experiments to verify how biological systems respond to stimuli. However, the high cost of reagents and facilities, as well as the time required to carry out experiments, are sometimes the main cause of failure. In this regard, information technology offers valuable help: modeling and simulation are mathematical tools for executing virtual experiments on computing devices. Through synthetic experimentation, researchers can sample the parameter space of a biological system and obtain hundreds of potential results, ready to be reused to design and conduct more targeted wet-lab experiments. A non-negligible benefit of this is an enormous saving of resources and time. In this paper, we present a plug-in-based software prototype that combines high-performance computing and statistics. Our framework relies on parallel computing to run large numbers of synthetic experiments; multivariate analysis is then used to interpret and validate the results. The software is tested on two well-known oscillatory models: Predator-Prey (Lotka-Volterra) and the Repressilator.
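The "sample the parameter space, simulate each sample, summarize each run" workflow can be sketched on the Lotka-Volterra model the abstract mentions. This is a deliberately simplified stand-in: a deterministic Euler discretization instead of the paper's stochastic runs, with illustrative parameter ranges, initial conditions, and summary statistics.

```python
import random

# Sketch of parallel synthetic experimentation: sample parameters, run the
# model per sample, and keep a per-run summary for later multivariate
# analysis. Euler-discretized deterministic Lotka-Volterra; all constants
# and ranges are illustrative.

def lotka_volterra(alpha, beta, delta, gamma,
                   x0=10.0, y0=10.0, dt=0.001, steps=5000):
    x, y = x0, y0          # prey, predator populations
    peak_prey = x
    for _ in range(steps):
        dx = alpha * x - beta * x * y
        dy = delta * x * y - gamma * y
        x, y = x + dt * dx, y + dt * dy
        peak_prey = max(peak_prey, x)
    return {"final_prey": x, "final_pred": y, "peak_prey": peak_prey}

def sample_experiments(n, seed=0):
    rng = random.Random(seed)
    results = []
    for _ in range(n):   # each iteration is an independent, parallelizable job
        params = {"alpha": rng.uniform(0.9, 1.1),
                  "beta": rng.uniform(0.08, 0.12),
                  "delta": rng.uniform(0.06, 0.09),
                  "gamma": rng.uniform(1.3, 1.7)}
        results.append((params, lotka_volterra(**params)))
    return results
```

The (parameters, summary) pairs collected here are exactly the kind of table that multivariate analysis would then be run over to relate parameter changes to behavioural changes.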
Parameter Scanning by Parallel Model Checking with Applications in Systems Biology
J. Barnat, L. Brim, David Šafránek, Martin Vejnar
Pub Date: 2010-09-30 | DOI: 10.1109/PDMC-HIBI.2010.21 | Infinity, pp. 95-104

In this paper, we provide a novel scalable method for scanning kinetic parameter values in continuous (ODE) models of biological networks. The method is property-driven: parameter values are scanned in order to satisfy a given dynamic property. The key result, the parameter scanning method, is based on an innovative adaptation of parallel LTL model checking to the framework of parameterized Kripke structures (PKS). First, we introduce the notion of a PKS and identify the parameter scanning and robustness analysis problems in this framework. Second, we present the algorithms for parallel LTL model checking on PKSs. Finally, we provide an evaluation on case studies of a mammalian cell-cycle genetic regulatory network model and an E. coli ammonium transport model.
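Property-driven parameter scanning, stripped to its essentials: enumerate candidate parameter values, generate each value's (discretized) dynamics, and keep the values whose trajectory satisfies the temporal property. The one-variable decay model and the "F G (x < threshold)" analogue below are illustrative stand-ins for the paper's LTL checking over parameterized Kripke structures.

```python
# Sketch of property-driven parameter scanning over a toy ODE.
# Model, property, and discretization are illustrative.

def trajectory(decay_rate, x0=100.0, dt=0.1, steps=100):
    """Toy one-variable kinetics: explicit Euler on x' = -decay_rate * x."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] * (1.0 - dt * decay_rate))
    return xs

def eventually_always_below(xs, threshold):
    """Finite-trace analogue of the LTL property F G (x < threshold)."""
    return any(all(x < threshold for x in xs[i:]) for i in range(len(xs)))

def scan(candidates, threshold=1.0):
    """Keep the parameter values whose trajectory satisfies the property.
    Candidates are independent, so the scan parallelizes trivially -- the
    paper additionally parallelizes the model checking itself."""
    return [k for k in candidates
            if eventually_always_below(trajectory(k), threshold)]
```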
Parallel Particle-Based Reaction Diffusion: A GPU Implementation
Lorenzo Dematté
Pub Date: 2010-09-30 | DOI: 10.1109/PDMC-HIBI.2010.18 | Infinity, pp. 67-77

Space is a very important aspect in the simulation of biochemical models, and the need for simulation algorithms able to cope with space is becoming more and more compelling. Large and complex models of biochemical systems need to deal with the movement of single molecules and particles, taking into consideration localized fluctuations, transport phenomena, and diffusion. A common drawback of spatial models lies in their complexity: models can become very large, and their simulation can be time-consuming, especially if we want to capture the system's behaviour reliably using stochastic methods in conjunction with a high spatial resolution. In order to deliver on the promise made by systems biology to understand a system as a whole, we need to move from sequential to parallel simulation algorithms. In this paper we analyse Smoldyn, a widely used algorithm for the stochastic simulation of chemical reactions with spatial resolution and single-molecule detail, and we propose an alternative, innovative implementation that exploits the parallelism of GPUs. The implementation offers good speedups (up to 130x) and real-time, high-quality graphics output at almost no performance penalty.
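One time step of a Smoldyn-style particle simulation has two phases: a Brownian displacement for every molecule, then reactions between molecules that have come within a binding radius. The sketch below is a simplification for illustration (2D, an O(n²) pair search instead of spatial partitioning, illustrative constants); the per-molecule independence of the diffusion phase is what a GPU implementation would exploit.

```python
import math
import random

# Sketch of one particle-based reaction-diffusion step, Smoldyn style.
# Constants and the brute-force pair search are illustrative.

def diffuse(positions, diff_coeff, dt, rng):
    """Brownian update: each coordinate moves by N(0, sqrt(2*D*dt))."""
    sigma = math.sqrt(2.0 * diff_coeff * dt)
    return [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma))
            for (x, y) in positions]

def react(a_positions, b_positions, binding_radius):
    """Consume A/B pairs that have come within the binding radius."""
    a_left, b_left = list(a_positions), list(b_positions)
    for a in list(a_left):
        for b in list(b_left):
            if math.dist(a, b) < binding_radius:
                a_left.remove(a)
                b_left.remove(b)
                break
    return a_left, b_left
```

On a GPU, `diffuse` maps to one thread per molecule with no inter-thread communication; the reaction phase is the harder part to parallelize, since it couples nearby particles.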
Fast Parallel Markov Clustering in Bioinformatics Using Massively Parallel Graphics Processing Unit Computing
A. Bustamam, K. Burrage, N. Hamilton
Pub Date: 2010-09-30 | DOI: 10.1109/PDMC-HIBI.2010.23 | Infinity, pp. 116-125

Markov clustering is becoming a key algorithm within bioinformatics for determining clusters in networks; for instance, clustering protein interaction networks is helping to find genes implicated in diseases such as cancer. However, with fast sequencing and other technologies generating vast amounts of data on biological networks, performance and scalability are becoming critical limiting factors in applications. Meanwhile, Graphics Processing Unit (GPU) computing, which uses the massively parallel computing environment of the GPU card, is becoming a very powerful, efficient, and low-cost option for achieving substantial performance gains over CPU approaches. This paper introduces a very fast Markov clustering algorithm (MCL) based on massively parallel computing on the GPU. We use the Compute Unified Device Architecture (CUDA) to let the GPU perform the parallel sparse matrix-matrix computations and parallel sparse Markov matrix normalizations that are at the heart of the clustering algorithm. The key to optimizing our CUDA Markov clustering (CUDAMCL) was using the ELLPACK-R sparse data format, which allows effective, fine-grained massively parallel processing to cope with the sparse nature of interaction network datasets in bioinformatics applications. CUDA also allows us to use the on-chip memory of the GPU efficiently to lower latency, circumventing a major issue in other parallel computing environments such as the Message Passing Interface (MPI). Here we describe the GPU algorithm and its application to several real-world problems as well as to artificial datasets. We find that the principal factor causing variation in the performance of the GPU approach is the relative sparseness of the networks. Comparing GPU computation times against a modern quad-core CPU on the published (relatively sparse) standard BioGRID protein interaction networks with 5156 and 23175 nodes, speedup factors of 4 and 9 were obtained, respectively. On the Human Protein Reference Database, the clustering of 19599 proteins was sped up by a factor of 7 by the GPU algorithm, while on artificially generated, densely connected networks with 1600 to 4800 nodes, speedups by factors in the range of 40 to 120 were readily obtained. As the results show, in all cases the GPU implementation is significantly faster than the original MCL running on the CPU. Such approaches allow large-scale parallel computation on off-the-shelf desktop machines that was previously only possible on supercomputing architectures, and have the potential to significantly change the way bioinformaticians and biologists compute and interact with their data.
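The expand/inflate loop the paper accelerates can be sketched with small dense matrices in pure Python. This is the standard MCL scheme, not the paper's code: their contribution is performing exactly these operations on sparse matrices (ELLPACK-R) with CUDA kernels; the cluster-extraction heuristic at the end is one common simple interpretation.

```python
# Minimal dense sketch of Markov clustering (MCL): alternate expansion
# (matrix squaring: random-walk flow) and inflation (entrywise power plus
# column normalization: strengthening strong flows).

def col_normalize(m):
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        for i in range(n):
            m[i][j] /= s
    return m

def expand(m):
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inflate(m, r):
    return col_normalize([[v ** r for v in row] for row in m])

def mcl(adj, r=2.0, iterations=20):
    n = len(adj)
    # add self-loops, then make the adjacency matrix column-stochastic
    m = col_normalize([[adj[i][j] + (1.0 if i == j else 0.0)
                        for j in range(n)] for i in range(n)])
    for _ in range(iterations):
        m = inflate(expand(m), r)
    # simple read-out: columns drawn to the same attractor form a cluster
    clusters = {}
    for j in range(n):
        attractor = max(range(n), key=lambda i: m[i][j])
        clusters.setdefault(attractor, set()).add(j)
    return sorted(clusters.values(), key=min)
```

Expansion is a (sparse) matrix-matrix product and inflation a column-wise normalization, which is why both map naturally onto the GPU kernels the abstract describes.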