Suffix Tree Construction based Mapreduce

Sihem Klai Soukehal, Karima Chibane, M. Khadir
2019 International Conference on Theoretical and Applicative Aspects of Computer Science (ICTAACS), published 2019-12-01. DOI: 10.1109/ICTAACS48474.2019.8988123 (https://doi.org/10.1109/ICTAACS48474.2019.8988123)

Abstract

Indexing a genome sequence is a primary step that facilitates later processing such as pattern search or assembly against a reference genome. The suffix tree is one of the most widely used data structures for this indexing. However, the memory required to run suffix tree construction algorithms may exceed the available main memory. Despite researchers' efforts, suffix tree construction remains very expensive: it relies on data centres to parallelize the processing and reduce execution time, and it is exposed to the risk of breakdowns and the problems they cause. Parallelization with Hadoop and MapReduce addresses the limits on storage and data processing capacity and provides fault tolerance, all at reasonable cost. With the emergence of Hadoop, a framework for big data, and of MapReduce, a paradigm for modelling parallel and distributed processing, many scientific domains are exploring how to parallelize their processing effectively. The PWOTD (Partition and Write Only Top Down) algorithm is chosen here because it has proven itself among text algorithms for genome sequencing. This paper designs an approach that models the parallel construction of the suffix tree with the MapReduce paradigm, intended for implementation in Hadoop with a Java API.
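
The abstract does not give the details of the authors' design, so the following is only a minimal sketch of how a partition-then-build-top-down construction could be phrased as map, shuffle, and reduce steps. It is plain Python rather than the Hadoop Java API, and every name in it (`map_phase`, `shuffle`, `build_subtree`, `build_partitions`, `leaf_positions`, `prefix_len`) is a hypothetical choice made for illustration. It assumes the text ends with a unique terminator such as `$`, so that no suffix is a prefix of another.

```python
from collections import defaultdict


def map_phase(text, prefix_len):
    """Map step (illustrative): emit (prefix, start) for every suffix of text."""
    for start in range(len(text)):
        yield text[start:start + prefix_len], start


def shuffle(pairs):
    """Shuffle step (illustrative): group suffix start positions by prefix key."""
    groups = defaultdict(list)
    for key, start in pairs:
        groups[key].append(start)
    return groups


def build_subtree(text, positions, depth):
    """Reduce step (illustrative): build a compact subtree top-down.

    Suffixes are grouped by their character at `depth`. A group with one
    suffix becomes a leaf whose value is the suffix start position. A larger
    group gets an edge labelled with its longest common prefix, then recurses.
    Assumes text ends with a unique terminator.
    """
    by_char = defaultdict(list)
    for start in positions:
        if start + depth < len(text):
            by_char[text[start + depth]].append(start)

    node = {}
    for group in by_char.values():
        if len(group) == 1:
            start = group[0]
            node[text[start + depth:]] = start  # leaf: rest of the suffix
            continue
        # Extend the edge while every suffix in the group agrees.
        lcp = 1
        while (all(s + depth + lcp < len(text) for s in group)
               and len({text[s + depth + lcp] for s in group}) == 1):
            lcp += 1
        first = group[0]
        label = text[first + depth:first + depth + lcp]
        node[label] = build_subtree(text, group, depth + lcp)
    return node


def build_partitions(text, prefix_len):
    """Run map, shuffle, reduce in-process; returns {prefix: subtree}."""
    grouped = shuffle(map_phase(text, prefix_len))
    return {prefix: build_subtree(text, starts, 0)
            for prefix, starts in grouped.items()}


def leaf_positions(node):
    """Collect the suffix start positions stored at the leaves of a subtree."""
    found = []
    for child in node.values():
        if isinstance(child, dict):
            found.extend(leaf_positions(child))
        else:
            found.append(child)
    return found
```

Calling `build_partitions("banana$", 1)` partitions the suffixes by first character, and each partition's subtree can be built independently, which is what makes the reduce step parallelizable. With `prefix_len` greater than 1, partitions that share a first character would still need a small top-level trie to join them into one tree; that merge step is left out of this sketch.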