{"title":"Real-time volume rendering on shared memory multiprocessors using the shear-warp factorization","authors":"P. Lacroute","doi":"10.1145/218327.218331","DOIUrl":null,"url":null,"abstract":"This paper presents a new parallel volume rendering algorithm that can render 256³ voxel medical data sets at over 10 Hz and 128³ voxel data sets at over 30 Hz on a 16-processor Silicon Graphics Challenge. The algorithm achieves these results by minimizing each of the three components of execution time: computation time, synchronization time, and data communication time. Computation time is low because the parallel algorithm is based on the recently reported shear-warp serial volume rendering algorithm which is over five times faster than previous serial algorithms. Synchronization time is minimized by using dynamic load balancing and a task partition that minimizes synchronization events. Data communication costs are low because the algorithm is implemented for shared-memory multiprocessors, a class of machines with hardware support for low-latency fine-grain communication and hardware caching to hide latency. We draw two conclusions from our implementation. First, we find that on shared-memory architectures data redistribution and communication costs do not dominate rendering time. Second, we find that cache locality requirements impose a limit on parallelism in volume rendering algorithms. Specifically, our results indicate that shared-memory machines with hundreds of processors would be useful only for rendering very large data sets. CR Categories: D.1.3 [Concurrent Programming]: Parallel Programming; I.3.3 [Computer Graphics]: Picture/Image Generation--Display Algorithms; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism. 
Additional","PeriodicalId":101947,"journal":{"name":"Proceedings of the IEEE symposium on Parallel rendering","volume":"34 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1995-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"121","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the IEEE symposium on Parallel rendering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/218327.218331","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 121
Abstract
This paper presents a new parallel volume rendering algorithm that can render 256³ voxel medical data sets at over 10 Hz and 128³ voxel data sets at over 30 Hz on a 16-processor Silicon Graphics Challenge. The algorithm achieves these results by minimizing each of the three components of execution time: computation time, synchronization time, and data communication time. Computation time is low because the parallel algorithm is based on the recently reported shear-warp serial volume rendering algorithm which is over five times faster than previous serial algorithms. Synchronization time is minimized by using dynamic load balancing and a task partition that minimizes synchronization events. Data communication costs are low because the algorithm is implemented for shared-memory multiprocessors, a class of machines with hardware support for low-latency fine-grain communication and hardware caching to hide latency. We draw two conclusions from our implementation. First, we find that on shared-memory architectures data redistribution and communication costs do not dominate rendering time. Second, we find that cache locality requirements impose a limit on parallelism in volume rendering algorithms. Specifically, our results indicate that shared-memory machines with hundreds of processors would be useful only for rendering very large data sets.

CR Categories: D.1.3 [Concurrent Programming]: Parallel Programming; I.3.3 [Computer Graphics]: Picture/Image Generation--Display Algorithms; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism. Additional