Large-scale Sequencing and Assembly of Cereal Genomes Using Blacklight
Authors: Philip D. Blood, Shoshana Marcus, M. Schatz
DOI: https://doi.org/10.1145/2616498.2616502
Venue (as listed): Proceedings of XSEDE16 : Diversity, Big Data, and Science at Scale : July 17-21, 2016, Intercontinental Miami Hotel, Miami, Florida, USA. Conference on Extreme Science and Engineering Discovery Environment (5th : 2016 : Miami, Fla.), pages 20:1-20:6
Publication date: 2014-07-13
Citation count: 1
Abstract
Wheat, corn, and rice provide 60 percent of the world's food intake every day, and just 15 plant species account for 90 percent of it. As such, there is tremendous agricultural and scientific interest in sequencing and studying plant genomes, especially to develop reference sequences that direct plant breeding or identify functional elements. DNA sequencing technologies can now generate sequence data for large genomes at low cost; however, assembling the short sequencing reads into complete genome sequences remains a substantial computational challenge. Even one of the simpler ancestral species of wheat, Aegilops tauschii, has a genome size of 4.36 gigabasepairs (Gbp), nearly fifty percent larger than the human genome. Assembling a genome of this size requires computational resources, especially RAM to store the large assembly graph, that are out of reach for most institutions. In this paper, we describe a collaborative effort between Cold Spring Harbor Laboratory and the Pittsburgh Supercomputing Center to assemble large, complex cereal genomes, starting with Ae. tauschii, using the XSEDE shared-memory supercomputer Blacklight. We expect these experiences with Blacklight to provide a case study and computational protocol that other genomics communities can use to leverage this or similar resources for assembling other significant genomes of interest.