High-performance data management for genome sequencing centers using Globus Online: A case study

Dinanath Sulakhe, R. Kettimuthu, Utpal J. Davé
{"title":"High-performance data management for genome sequencing centers using Globus Online: A case study","authors":"Dinanath Sulakhe, R. Kettimuthu, Utpal J. Davé","doi":"10.1109/eScience.2012.6404443","DOIUrl":null,"url":null,"abstract":"In the past few years in the biomedical field, availability of low-cost sequencing methods in the form of next-generation sequencing has revolutionized the approaches life science researchers are undertaking in order to gain a better understanding of the causative factors of diseases. With biomedical researchers getting many of their patients' DNA and RNA sequenced, sequencing centers are working with hundreds of researchers with terabytes to petabytes of data for each researcher. The unprecedented scale at which genomic sequence data is generated today by high-throughput technologies requires sophisticated and high-performance methods of data handling and management. For the most part, however, the state of the art is to use hard disks to ship the data. As data volumes reach tens or even hundreds of terabytes, such approaches become increasingly impractical. Data stored on portable media can be easily lost, and typically is not readily accessible to all members of the collaboration. In this paper, we discuss the application of Globus Online within a sequencing facility to address the data movement and management challenges that arise as a result of exponentially increasing amount of data being generated by a rapidly growing number of research groups. We also present the unique challenges in applying a Globus Online solution in sequencing center environments and how we overcome those challenges.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"127 1","pages":"1-6"},"PeriodicalIF":0.0000,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE 8th International Conference on E-Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/eScience.2012.6404443","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

In the past few years in the biomedical field, availability of low-cost sequencing methods in the form of next-generation sequencing has revolutionized the approaches life science researchers are undertaking in order to gain a better understanding of the causative factors of diseases. With biomedical researchers getting many of their patients' DNA and RNA sequenced, sequencing centers are working with hundreds of researchers with terabytes to petabytes of data for each researcher. The unprecedented scale at which genomic sequence data is generated today by high-throughput technologies requires sophisticated and high-performance methods of data handling and management. For the most part, however, the state of the art is to use hard disks to ship the data. As data volumes reach tens or even hundreds of terabytes, such approaches become increasingly impractical. Data stored on portable media can be easily lost, and typically is not readily accessible to all members of the collaboration. In this paper, we discuss the application of Globus Online within a sequencing facility to address the data movement and management challenges that arise as a result of exponentially increasing amount of data being generated by a rapidly growing number of research groups. We also present the unique challenges in applying a Globus Online solution in sequencing center environments and how we overcome those challenges.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
使用Globus Online的基因组测序中心的高性能数据管理:一个案例研究
在过去的几年里,在生物医学领域,以下一代测序形式出现的低成本测序方法已经彻底改变了生命科学研究人员为了更好地了解疾病的致病因素而采取的方法。随着生物医学研究人员对许多患者的DNA和RNA进行测序,测序中心正在与数百名研究人员合作,为每位研究人员提供tb到pb的数据。如今,高通量技术产生的基因组序列数据规模空前,这需要复杂和高性能的数据处理和管理方法。然而,在大多数情况下,最先进的技术是使用硬盘来传输数据。随着数据量达到数十甚至数百tb,这种方法变得越来越不切实际。存储在便携式媒体上的数据很容易丢失,并且通常不是协作的所有成员都可以轻松访问。在本文中,我们讨论了Globus Online在测序设施中的应用,以解决由于快速增长的研究小组产生的数据量呈指数级增长而产生的数据移动和管理挑战。我们还介绍了在测序中心环境中应用Globus Online解决方案的独特挑战以及我们如何克服这些挑战。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Scientific Workflow Interchanging through Patterns: Reversals and Lessons Learned Shape Analysis Using the Spectral Graph Wavelet Transform Provenance analysis: Towards quality provenance Fast confidential search for bio-medical data using Bloom filters and Homomorphic Cryptography Calibration of watershed models using cloud computing
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1