Optimization of Hadoop cluster for analyzing large-scale sequence data in bioinformatics

Annales Mathematicae et Informaticae (IF 0.3, Q4, Mathematics). Pub Date: 2019-01-01. DOI: 10.33039/AMI.2019.01.002
Ádám Tóth, Ramin Karimi
Citations: 1

Abstract

The unexpected growth of high-throughput sequencing platforms in recent years has affected virtually all areas of modern biology. However, the ability to produce data continues to outpace the ability to analyze them, so continuous effort is needed to improve bioinformatics applications and make better use of these research opportunities. Because of the complexity and diversity of metagenomics data, metagenomics has been a major challenge in bioinformatics. Sequence-based identification methods, such as those using DNA signatures (unique k-mers), are among the most recent popular methods for real-time analysis of raw sequencing data. DNA signature discovery is compute-intensive and time-consuming. Hadoop, an application of parallel and distributed computing, is one of the popular platforms for analyzing large-scale data in bioinformatics. The main goals of this paper are to optimize time consumption and computational resource usage, such as CPU and memory usage, along with the management of the Hadoop cluster nodes.
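To make the notion of a DNA signature (unique k-mer) concrete, the following is a minimal single-machine sketch written in a map/reduce style. It is not the authors' implementation and does not use Hadoop itself. The function names (`map_kmers`, `reduce_signatures`, `find_signatures`) and the rule that a signature is a k-mer occurring in exactly one input sequence are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set, Tuple


def map_kmers(seq_id: str, sequence: str, k: int) -> Iterator[Tuple[str, str]]:
    """Map step (hypothetical): emit (k-mer, sequence id) for each window of length k."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k], seq_id


def reduce_signatures(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Reduce step (hypothetical): keep k-mers that occur in exactly one sequence."""
    # Group sequence ids by k-mer, as a shuffle phase would do in MapReduce.
    owners: Dict[str, Set[str]] = defaultdict(set)
    for kmer, seq_id in pairs:
        owners[kmer].add(seq_id)

    # A k-mer owned by a single sequence is treated as a signature of that sequence.
    signatures: Dict[str, Set[str]] = defaultdict(set)
    for kmer, ids in owners.items():
        if len(ids) == 1:
            signatures[next(iter(ids))].add(kmer)
    return dict(signatures)


def find_signatures(sequences: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Run the map and reduce steps in-process over a dict of id -> sequence."""
    pairs = (
        pair
        for seq_id, seq in sequences.items()
        for pair in map_kmers(seq_id, seq, k)
    )
    return reduce_signatures(pairs)
```

For example, `find_signatures({"a": "ACGT", "b": "CGTA"}, 3)` treats `CGT` as shared by both sequences and therefore not a signature of either. In this sketch, the per-k-mer grouping corresponds to the shuffle between the map and reduce phases in a distributed setting, which is one reason signature discovery is costly in time and memory on large inputs.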