Chih-Han Yang, Jhih-Wun Zeng, C. Liu, Shih-Hao Hung
Proceedings of the International Conference on Research in Adaptive and Convergent Systems. Published 2020-10-13. DOI: 10.1145/3400286.3418243 (https://doi.org/10.1145/3400286.3418243)
Accelerating Variant Calling with Parallelized DeepVariant
Due to the rapid evolution of next-generation sequencing (NGS) technology, an individual's genome sequence can be determined from billions of short reads at a decreasing cost, which has advanced medical research and precision medicine by making it possible to correlate mutations between genomes. Analysis of genome sequences, especially variant calling, is exceedingly computationally intensive: it demands large storage capacity, substantial computing power, and a high-speed network to keep processing time down. DeepVariant, an open-source software package that employs a deep neural network (DNN) to call genetic variants, took four hours to complete the analysis on a workstation that used a high-performance GPU to accelerate the DNN. We therefore profiled DeepVariant and refactored its code, applying a series of code optimizations to reduce the time and cost of the NGS pipeline. As a result, our distributed version of DeepVariant finishes the same job within 8 minutes on 8 dual-CPU nodes with 8 GPUs, which outperforms commercial versions on the market.
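To make the reported numbers concrete, going from four hours (240 minutes) to at most 8 minutes corresponds to a speedup of at least 30x. The abstract does not describe how the work is divided among the 8 nodes, so the following is only a minimal sketch of one common approach, assuming the genome is cut into fixed-size regions that are dealt out round-robin to workers. The function names, the toy contig lengths, and the chunk size are all hypothetical and are not taken from the paper or from DeepVariant itself.

```python
from typing import Dict, List, Tuple

# A region is (contig name, start, end) with a half-open interval [start, end).
Region = Tuple[str, int, int]


def make_regions(contig_lengths: Dict[str, int], chunk: int) -> List[Region]:
    """Cut every contig into consecutive regions of at most `chunk` bases."""
    regions: List[Region] = []
    for contig, length in contig_lengths.items():
        for start in range(0, length, chunk):
            regions.append((contig, start, min(start + chunk, length)))
    return regions


def assign_round_robin(regions: List[Region], n_workers: int) -> List[List[Region]]:
    """Deal regions out to workers in turn (hypothetical scheduling policy)."""
    shards: List[List[Region]] = [[] for _ in range(n_workers)]
    for i, region in enumerate(regions):
        shards[i % n_workers].append(region)
    return shards


def speedup(baseline_minutes: float, new_minutes: float) -> float:
    """Ratio of baseline wall-clock time to the new wall-clock time."""
    return baseline_minutes / new_minutes


# Toy contig lengths chosen for illustration only (not real genome data).
toy_contigs = {"contigA": 2500, "contigB": 1000}
regions = make_regions(toy_contigs, chunk=1000)
shards = assign_round_robin(regions, n_workers=8)

# Figures from the abstract: four hours on one workstation, 8 minutes distributed.
reported_speedup = speedup(4 * 60, 8)
```

Because the 8-minute figure is an upper bound ("within 8 minutes"), 30x is a lower bound on the speedup. It also exceeds the node count of 8, which is consistent with the abstract's statement that the gain comes from code optimizations as well as from distribution.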