High-performance data management for genome sequencing centers using Globus Online: A case study
Dinanath Sulakhe, R. Kettimuthu, Utpal J. Davé
2012 IEEE 8th International Conference on E-Science, pp. 1-6. Published 2012-10-08.
DOI: 10.1109/eScience.2012.6404443 (https://doi.org/10.1109/eScience.2012.6404443)
Citations: 4
Abstract
In the past few years, the availability of low-cost next-generation sequencing has revolutionized how life science researchers investigate the causative factors of disease. With biomedical researchers having DNA and RNA from many of their patients sequenced, sequencing centers now work with hundreds of researchers, each with terabytes to petabytes of data. The unprecedented scale at which high-throughput technologies generate genomic sequence data today requires sophisticated, high-performance methods of data handling and management. For the most part, however, the state of the art is to ship the data on hard disks. As data volumes reach tens or even hundreds of terabytes, such approaches become increasingly impractical. Data stored on portable media is easily lost and is typically not readily accessible to all members of a collaboration. In this paper, we discuss the application of Globus Online within a sequencing facility to address the data movement and management challenges that arise from the exponentially increasing amount of data generated by a rapidly growing number of research groups. We also present the unique challenges of applying a Globus Online solution in sequencing center environments and how we overcome them.