Ahmad Shokrani Baigi, Abdorreza Savadi, Mahmoud Naghibzadeh
The Journal of Supercomputing, published 2024-08-07. DOI: https://doi.org/10.1007/s11227-024-06394-1
A high-performance dynamic scheduling for sparse matrix-based applications on heterogeneous CPU–GPU environment
Efficient utilization of processors in heterogeneous CPU–GPU systems is crucial for improving overall application performance by reducing workload completion time. This article introduces a framework for scheduling the processing of sparse matrix-based applications on a heterogeneous CPU–GPU system, with the goal of maximizing performance. The framework splits the matrix into chunks and employs machine learning to find the chunk size that gives the most efficient scheduling, treating the number of GPU streams as a critical factor. The scheduling algorithm is inspired by the concept of quartiles in statistics and is designed to operate in real time, with the aim of imposing minimal overhead on the system. The evaluation of the proposed framework focused on the SpMV (Sparse Matrix–Vector Multiplication) kernel, which is essential for various applications such as matrix-based graph processing. The evaluation was conducted on a system equipped with an NVIDIA GTX 1070 GPU. Tests on real-world sparse matrices showed that the proposed scheduling algorithm significantly outperforms no offloading, full offloading, and the Alternate Assignment method.
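To make the chunking idea concrete, the following is a minimal sketch of SpMV computed over row chunks of a matrix stored in CSR form. It is an illustration only and not the authors' implementation: the CSR layout, the function names (`spmv_csr_rows`, `row_chunks`, `chunked_spmv`), and the fixed `chunk_size` are assumptions made for this example. The abstract does not describe the quartile-inspired scheduling rule or the learned chunk size in detail, so both are left out; here every chunk is simply processed in order on the CPU.

```python
from typing import List, Tuple


def spmv_csr_rows(indptr: List[int], indices: List[int], data: List[float],
                  x: List[float], row_start: int, row_end: int) -> List[float]:
    """Multiply rows [row_start, row_end) of a CSR matrix by the dense vector x."""
    out = []
    for r in range(row_start, row_end):
        acc = 0.0
        # Nonzeros of row r occupy positions indptr[r] .. indptr[r + 1] - 1.
        for k in range(indptr[r], indptr[r + 1]):
            acc += data[k] * x[indices[k]]
        out.append(acc)
    return out


def row_chunks(n_rows: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split the row range into consecutive half-open chunks of at most chunk_size rows."""
    return [(s, min(s + chunk_size, n_rows)) for s in range(0, n_rows, chunk_size)]


def chunked_spmv(indptr: List[int], indices: List[int], data: List[float],
                 x: List[float], chunk_size: int) -> List[float]:
    """Compute y = A @ x one row chunk at a time."""
    n_rows = len(indptr) - 1
    y: List[float] = []
    for start, end in row_chunks(n_rows, chunk_size):
        # Assumption: in a heterogeneous setup, a scheduler would decide at this
        # point whether the chunk runs on the CPU or on a GPU stream. This sketch
        # runs every chunk on the CPU.
        y.extend(spmv_csr_rows(indptr, indices, data, x, start, end))
    return y


# Example matrix (3 x 3):
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 0, 5]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 1.0, 1.0]

y = chunked_spmv(indptr, indices, data, x, chunk_size=2)
print(y)  # [3.0, 3.0, 9.0]
```

Because each chunk covers a disjoint set of rows, the chunks are independent of one another and their results can be concatenated in row order, which is what makes per-chunk assignment to different processors possible in the first place.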