Superlinear speedup for matrix multiplication
S. Ristov, M. Gusev
Proceedings of the ITI 2012 34th International Conference on Information Technology Interfaces, 2012-06-25
DOI: 10.2498/iti.2012.0376
Citations: 18
Abstract
Amdahl showed that multiprocessor execution performance does not scale in proportion to the number of processors. Gustafson showed that some algorithms can nevertheless achieve almost linear speedup. In this article, we identify algorithms that can achieve superlinear speedup. The idea is not based on changing the algorithm or on executing fewer operations, as in parallel search. Instead, it relies on a structure-persistent algorithm that efficiently exploits the cache of a shared-memory multiprocessor and avoids cache misses as much as possible. Our experimental results show superlinear speedup for algorithms running on modern multicore and multi-chip architectures, exceeding the maximum linear speedup that would normally be expected.
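To make the speedup regimes mentioned in the abstract concrete, the following Python sketch computes Amdahl's fixed-size speedup S(p) = 1 / (s + (1 - s) / p), Gustafson's scaled speedup S(p) = p - s(p - 1), and classifies a measured speedup T1 / Tp against the linear bound p. It also includes a plain blocked (tiled) matrix multiplication as one common illustration of cache-aware loop structure. All function names, the serial fraction s, and the block size are illustrative assumptions. This is not the authors' structure-persistent algorithm, and no measurements from the paper are reproduced. In pure Python, interpreter overhead hides most cache effects, so the tiled routine shows the loop structure rather than the performance.

```python
# Illustrative sketch only: hypothetical helper names, not taken from the paper.


def amdahl_speedup(serial_fraction: float, p: int) -> float:
    """Amdahl's law (fixed problem size): S(p) = 1 / (s + (1 - s) / p)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)


def gustafson_speedup(serial_fraction: float, p: int) -> float:
    """Gustafson's law (problem size scaled with p): S(p) = p - s * (p - 1).

    Here s is the serial fraction of the run time on the parallel system.
    """
    return p - serial_fraction * (p - 1)


def classify_speedup(t_serial: float, t_parallel: float, p: int) -> str:
    """Compare a measured speedup S = T1 / Tp with the linear bound S = p."""
    s = t_serial / t_parallel
    if s > p:
        return "superlinear"
    if s == p:
        return "linear"
    return "sublinear"


def blocked_matmul(a, b, block=32):
    """Return C = A * B for square matrices given as lists of lists.

    The work is split into block x block tiles so that a small part of
    A, B and C is reused while it is still in cache. The default block
    size of 32 is an arbitrary assumption, not a tuned value.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                # Multiply one tile of A by one tile of B into a tile of C.
                for i in range(ii, min(ii + block, n)):
                    row_a = a[i]
                    row_c = c[i]
                    for k in range(kk, min(kk + block, n)):
                        aik = row_a[k]
                        row_b = b[k]
                        for j in range(jj, min(jj + block, n)):
                            row_c[j] += aik * row_b[j]
    return c
```

For example, with a serial fraction of 0.05 on 8 processors, amdahl_speedup(0.05, 8) is about 5.93 and gustafson_speedup(0.05, 8) is 7.65, both below 8; a measured run with T1 = 100 and Tp = 10 on 8 processors, classify_speedup(100.0, 10.0, 8), returns "superlinear", which is the regime the abstract attributes to cache effects.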