{"title":"求矩阵多项式的并行算法","authors":"Sivan Toledo, Amit Waisel","doi":"10.1145/3337821.3337871","DOIUrl":null,"url":null,"abstract":"We develop and evaluate parallel algorithms for a fundamental problem in numerical computing, namely the evaluation of a polynomial of a matrix. The algorithm consists of many building blocks that can be assembled in several ways. We investigate parallelism in individual building blocks, develop parallel implemenations, and assemble them into an overall parallel algorithm. We analyze the effects of both the dimension of the matrix and the degree of the polynomial on both arithmetic complexity and on parallelism, and we consequently propose which variants use in different cases. Our theoretical results indicate that one variant of the algorithm, based on applying the Paterson-Stockmeyer method to the entire matrix, parallelizes very effectively on virtually any matrix dimension and polynomial degree. However, it is not the most efficient from the arithmetic complexity viewpoint. Another algorithm, based on the Davies-Higham block recurrence is much more efficient from the arithmetic complexity viewpoint, but one of its building blocks is serial. Experimental results on a dual-socket 28-core server show that the first algorithm can effectively use all the cores, but that on high-degree polynomials the second algorithm is often faster, in spite of the sequential phase. This indicates that our parallel algorithms for the other phases are indeed effective.","PeriodicalId":405273,"journal":{"name":"Proceedings of the 48th International Conference on Parallel Processing","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Parallel Algorithms for Evaluating Matrix Polynomials\",\"authors\":\"Sivan Toledo, Amit Waisel\",\"doi\":\"10.1145/3337821.3337871\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We develop and evaluate parallel algorithms for a fundamental problem in numerical computing, namely the evaluation of a polynomial of a matrix. The algorithm consists of many building blocks that can be assembled in several ways. We investigate parallelism in individual building blocks, develop parallel implemenations, and assemble them into an overall parallel algorithm. We analyze the effects of both the dimension of the matrix and the degree of the polynomial on both arithmetic complexity and on parallelism, and we consequently propose which variants use in different cases. Our theoretical results indicate that one variant of the algorithm, based on applying the Paterson-Stockmeyer method to the entire matrix, parallelizes very effectively on virtually any matrix dimension and polynomial degree. However, it is not the most efficient from the arithmetic complexity viewpoint. Another algorithm, based on the Davies-Higham block recurrence is much more efficient from the arithmetic complexity viewpoint, but one of its building blocks is serial. Experimental results on a dual-socket 28-core server show that the first algorithm can effectively use all the cores, but that on high-degree polynomials the second algorithm is often faster, in spite of the sequential phase. This indicates that our parallel algorithms for the other phases are indeed effective.\",\"PeriodicalId\":405273,\"journal\":{\"name\":\"Proceedings of the 48th International Conference on Parallel Processing\",\"volume\":\"7 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-08-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 48th International Conference on Parallel Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3337821.3337871\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 48th International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3337821.3337871","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Parallel Algorithms for Evaluating Matrix Polynomials
We develop and evaluate parallel algorithms for a fundamental problem in numerical computing, namely the evaluation of a polynomial of a matrix. The algorithm consists of many building blocks that can be assembled in several ways. We investigate parallelism in individual building blocks, develop parallel implemenations, and assemble them into an overall parallel algorithm. We analyze the effects of both the dimension of the matrix and the degree of the polynomial on both arithmetic complexity and on parallelism, and we consequently propose which variants use in different cases. Our theoretical results indicate that one variant of the algorithm, based on applying the Paterson-Stockmeyer method to the entire matrix, parallelizes very effectively on virtually any matrix dimension and polynomial degree. However, it is not the most efficient from the arithmetic complexity viewpoint. Another algorithm, based on the Davies-Higham block recurrence is much more efficient from the arithmetic complexity viewpoint, but one of its building blocks is serial. Experimental results on a dual-socket 28-core server show that the first algorithm can effectively use all the cores, but that on high-degree polynomials the second algorithm is often faster, in spite of the sequential phase. This indicates that our parallel algorithms for the other phases are indeed effective.