MapReduce based parallel suffix tree construction for human genome

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS). Pub date: 2014-12-01. DOI: 10.1109/PADSW.2014.7097867

Umesh Chandra Satish, Praveenkumar Kondikoppa, Seung-Jong Park, Manish Patil, R. Shah


Citations: 11

Abstract

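Genome indexing is the basis for many bioinformatics applications. Read mapping (sequence alignment) is one such application, where the goal is to align millions of short reads against a reference genome. Several tools are available for read mapping, and they rely on different indexing techniques to expedite the alignment process. However, many of these contemporary alignment programs are sequential and memory-intensive, and they cannot easily be scaled to larger genomes. The suffix tree is one of the most widely used data structures for indexing strings such as genomes. Building a scalable suffix-tree-based tool is particularly challenging because of the difficulty of constructing a suffix tree in parallel. Several suffix tree construction techniques have been proposed to date, with a focus on the space-time tradeoff. Most of these existing works address construction on a uniprocessor and cannot easily be extended to utilize modern multiprocessor systems. In this paper we investigate and propose a MapReduce-based parallel construction of the suffix tree. We demonstrate the performance of the algorithm on a commodity cluster using up to 32 nodes, each having 8 GB of primary memory.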
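The abstract does not spell out the algorithm. As a rough illustration of why suffix-index construction can be split across MapReduce workers, the sketch below partitions suffixes by their first k characters in a map step, groups them as a shuffle would, and lets each reduce step build an independent index for its partition. This is not the authors' method: the prefix-partitioning scheme, the parameter `k`, the `$` terminator and all function names are assumptions made for illustration. To keep the code short, each partition builds a naive uncompressed suffix trie (a suffix tree without path compression), and everything runs in plain Python with no Hadoop dependency.

```python
from collections import defaultdict

TERMINATOR = "$"  # assumed end-of-text sentinel, not part of the DNA alphabet


def map_phase(text, k):
    """Hypothetical map step: emit (k-character prefix, suffix start) per suffix."""
    for i in range(len(text)):
        # Suffixes shorter than k are padded with the terminator.
        yield text[i:i + k].ljust(k, TERMINATOR), i


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(text, positions):
    """Hypothetical reduce step: build an uncompressed suffix trie for one partition.

    Each node is a dict mapping a character to a child node; the extra key
    "pos" lists the start offsets of all suffixes that pass through the node.
    """
    root = {"pos": []}
    for i in positions:
        node = root
        node["pos"].append(i)
        for ch in text[i:]:
            node = node.setdefault(ch, {"pos": []})
            node["pos"].append(i)
    return root


def build_index(text, k):
    """Run map -> shuffle -> reduce; return one trie per prefix partition."""
    text = text + TERMINATOR
    groups = shuffle(map_phase(text, k))
    return {prefix: reduce_phase(text, pos) for prefix, pos in groups.items()}


def find(index, read, k):
    """Return sorted start offsets of exact matches of read (assumes len(read) >= k)."""
    node = index.get(read[:k])  # only one partition can contain the read
    if node is None:
        return []
    for ch in read:
        if ch not in node:
            return []
        node = node[ch]
    return sorted(node["pos"])


index = build_index("ACGTACGA", 2)
print(find(index, "ACG", 2))  # [0, 4]
print(find(index, "CGT", 2))  # [1]
```

Because every suffix that shares a k-character prefix lands in the same partition, a read of length at least k has to be searched in only one partition. That independence is what would let partitions be built and stored on separate nodes. A practical implementation would use path-compressed suffix trees, whose size is linear in the text length, rather than this trie, which can grow quadratically.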


