Accelerating minimap2 for long-read sequencing on NUMA multi-core CPU
Authors: Qisheng Xu, Y. Dou, Yanjie Sun
Published in: Proceedings of the 5th International Conference on Computer Science and Software Engineering
Publication date: 2022-10-21
DOI: 10.1145/3569966.3570012 (https://doi.org/10.1145/3569966.3570012)
Recent advances in third-generation sequencing technology allow long reads to be generated rapidly and at high throughput, and mapping these reads to a reference sequence is one of the first and most time-consuming steps in downstream genomics applications. Minimap2, the state-of-the-art long-read aligner, is both fast and accurate. However, although NUMA multi-core CPUs are becoming the mainstream processors, minimap2 has not been specifically optimised for the NUMA multi-core architecture. Frequent remote memory accesses, resource contention and idle hardware resources leave its performance far below the theoretical peak of a NUMA multi-core CPU. To address these problems, we propose three optimisation strategies that aim at full utilisation of resources: (1) copying the index to each NUMA node and binding threads to the cores of that node, (2) designing a new mechanism for overlapping IO and computation, and (3) adaptively adjusting batch_size based on IO and computation time. We obtained three sets of human genome sequencing data from the ENA database and ran performance tests on an FT 2000+ MCD-FP92 NUMA multi-core CPU system. The three strategies proposed in this paper effectively improve the performance of minimap2, with a maximum speedup of 13%.
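The abstract does not describe how the IO and computation overlap is implemented. As a rough illustration of the general idea only, the sketch below uses a generic producer-consumer pipeline in Python: a reader thread prefetches the next batch into a bounded queue while the main thread processes the current one. All names here (`read_batches`, `map_batch`, `run_pipeline`, `prefetch`) are hypothetical and are not taken from minimap2 or from the paper, and `map_batch` is a trivial placeholder rather than real alignment.

```python
import queue
import threading


def read_batches(reads, batch_size):
    """Yield consecutive batches of at most batch_size reads (stand-in for file IO)."""
    for start in range(0, len(reads), batch_size):
        yield reads[start:start + batch_size]


def map_batch(batch):
    """Placeholder for the alignment step: returns the length of each read."""
    return [len(read) for read in batch]


def run_pipeline(reads, batch_size=2, prefetch=2):
    """Process reads batch by batch while a reader thread loads upcoming batches.

    prefetch bounds how many loaded batches may wait in memory at once
    (an assumed tuning knob, not a minimap2 parameter).
    """
    pending = queue.Queue(maxsize=prefetch)
    done = object()  # sentinel marking the end of input

    def producer():
        for batch in read_batches(reads, batch_size):
            pending.put(batch)  # blocks while the queue is full
        pending.put(done)

    reader = threading.Thread(target=producer)
    reader.start()

    results = []
    while True:
        batch = pending.get()  # blocks until the reader has a batch ready
        if batch is done:
            break
        results.extend(map_batch(batch))

    reader.join()
    return results
```

Because a single consumer takes batches from the queue in the order they were read, the results come back in input order. In a pipeline of this shape the per-batch cost is bounded by the slower of the two stages, which is presumably why measuring IO and computation time is useful when choosing batch_size.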