Christopher Harrison, Sündüz Keleş, Rebecca Hudson, Sunyoung Shin, Inês Dutra
IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum, vol. 2018, pp. 497-506. Published 2018-05-01 (Epub 2018-08-06). DOI: https://doi.org/10.1109/IPDPSW.2018.00086. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6195815/pdf/nihms-989639.pdf
atSNPInfrastructure, a case study for searching billions of records while providing significant cost savings over cloud providers.
We explore the feasibility of a database storage engine housing up to 307 billion genetic Single Nucleotide Polymorphisms (SNPs) for online access. We evaluate database storage engines and implement a solution, weighing factors such as dataset size, information gain, cost, and hardware constraints. Our solution provides a full-featured functional model for scalable storage and querying for researchers exploring the SNPs in the human genome. We address the scalability problem by building physical infrastructure and comparing final costs to those of a major cloud provider.