Next-generation sequencing revolution through big data analytics

Frontiers in Life Science (Q1; Biochemistry, Genetics and Molecular Biology) | Pub Date: 2016-04-02 | DOI: 10.1080/21553769.2016.1178180
R. Tripathi, Pawan Sharma, P. Chakraborty, P. Varadwaj
Journal: Frontiers in Life Science, volume 9, pages 119-149. Publication type: Journal Article.
Citations: 34

Abstract

Next-generation sequencing (NGS) technology has led to an unrivaled explosion in the amount of genomic data, and this escalation has in turn raised the challenges of sharing, archiving, integrating and analyzing these data. The scale and efficiency of NGS have posed a challenge for the analysis of these vast genomic data, gene interactions, annotations and expression studies. However, this limitation of NGS can be overcome by tools and algorithms built on a big data framework. Based on this framework, we review the current state of knowledge of big data algorithms for NGS that reveal hidden patterns in sequencing, analysis, annotation, and so on. The Apache Hadoop framework provides an on-demand and adaptable environment for large-scale data analysis. It has several components for partitioning large-scale data across clusters of commodity hardware in a fault-tolerant manner. Packages like MapReduce, Cloudburst, Crossbow, Myrna, Eoulsan, DistMap, Seal and Contrail perform various NGS applications, such as adapter trimming, quality checking, read mapping, de novo assembly, quantification, expression analysis, variant analysis, and annotation. This review covers the current applications of Hadoop technology, with their usage and limitations, from the perspective of NGS.
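As a rough illustration of the map/shuffle/reduce pattern that Hadoop-style tools build on, the sketch below counts k-mers across a handful of reads in plain Python. It is a single-process toy, not the implementation of any package named in the abstract; the function names (`map_read`, `shuffle`, `reduce_counts`, `count_kmers`) and the example reads are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_read(read, k=3):
    """Map step: emit a (k-mer, 1) pair for every k-mer in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle step: group the emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_kmers(reads, k=3):
    """Run map, shuffle and reduce in sequence over all reads."""
    mapped = chain.from_iterable(map_read(read, k) for read in reads)
    return reduce_counts(shuffle(mapped))


# Hypothetical toy input; a real workload would read from FASTQ files.
reads = ["ACGTAC", "CGTACG"]
counts = count_kmers(reads, k=3)
```

In a real cluster, the map step would run in parallel on partitions of the read set and the framework would handle the shuffle and fault tolerance; here everything runs in one process so that the data flow is easy to follow.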
Source journal
Frontiers in Life Science (Multidisciplinary Sciences)
CiteScore: 5.50
Self-citation rate: 0.00%
Articles published: 0
About the journal: Frontiers in Life Science publishes high-quality and innovative research at the frontier of biology, with an emphasis on interdisciplinary research. It particularly encourages manuscripts that lie at the interface of the life sciences and either the more quantitative sciences (including chemistry, physics, mathematics, and informatics) or the social sciences (philosophy, anthropology, sociology and epistemology). The journal holds that these disciplines can all contribute to biological research and provide original insights into the most recurrent questions.