{"title":"Parallel Programming Environment: A Key to Translating Tera-Scale Platforms into a Big Success","authors":"J. Fang","doi":"10.1145/1229428.1229430","DOIUrl":null,"url":null,"abstract":"Summary form only given. Moore's Law will continue to increase the number of transistors on die for a couple of decades, as silicon technology moves from 65nm today to 45nm, 32 nm and 22nm in the future. Since the power and thermal constraints increase with frequency, multi-core or many-core will be the way of the future microprocessor. In the near future, HW platforms will have many-cores (>16 cores) on die to achieve >1 TIPs computation power, which will communicate each other through an on-die interconnect fabric with >1 TB/s on-die bandwidth and <30 cycles latency. Off-die D-cache will employ 3D stacked memory technology to tremendously increase off-die cache/memory bandwidth and reduce the latency. Fast copper flex cables will link CPU-DRAM on socket and the optical silicon photonics will provide up to 1 Tb/s I/O bandwidth between boxes. The HW system with TIPs of compute power operating in Tera-bytes of data make this a \"Tera-scale\" platform. What are the SW implications with the HW changes from uniprocessor to Tera-scale platform with many-cores as \"the way of the future?\" It will be great challenge for programming environments to help programmers to develop concurrent code for most client software. A good concurrent programming environment should extend the existing programming languages that typical programmers are familiar with, and bring benefits for concurrent programming. There are lots of research topics. Examples of these topics include flexible parallel programming models based on needs from applications, better synchronization mechanisms like Transactional Memory to replace simple \"Thread + Lock\" structure, nested data parallel language primitives with new protocols, fine-grained synchronization mechanisms with HW support, maybe fine-grained message passing, advanced compiler optimizations for the threaded code, and SW tools in the concurrent programming environment. A more interesting problem is how to use such a many-core system to improve single-threaded performance","PeriodicalId":244171,"journal":{"name":"International Symposium on Code Generation and Optimization (CGO'07)","volume":"136 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Symposium on Code Generation and Optimization (CGO'07)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1229428.1229430","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
Summary form only given. Moore's Law will continue to increase the number of transistors on a die for a couple of decades, as silicon technology moves from 65 nm today to 45 nm, 32 nm, and 22 nm in the future. Since power and thermal constraints increase with frequency, multi-core or many-core designs will be the way of the future microprocessor. In the near future, HW platforms will have many cores (>16 cores) on die to achieve >1 TIPS of computation power, and these cores will communicate with each other through an on-die interconnect fabric with >1 TB/s of on-die bandwidth and <30 cycles of latency. Off-die D-cache will employ 3D stacked memory technology to tremendously increase off-die cache/memory bandwidth and reduce latency. Fast copper flex cables will link CPU and DRAM on the socket, and optical silicon photonics will provide up to 1 Tb/s of I/O bandwidth between boxes. A HW system with TIPS of compute power operating on terabytes of data makes this a "Tera-scale" platform. What are the SW implications of the HW change from uniprocessors to Tera-scale platforms with many cores as "the way of the future"? It will be a great challenge for programming environments to help programmers develop concurrent code for most client software. A good concurrent programming environment should extend the existing programming languages that typical programmers are familiar with, and bring clear benefits for concurrent programming. There are many open research topics. Examples include flexible parallel programming models driven by application needs, better synchronization mechanisms such as Transactional Memory to replace the simple "thread + lock" structure, nested data-parallel language primitives with new protocols, fine-grained synchronization mechanisms with HW support, possibly fine-grained message passing, advanced compiler optimizations for threaded code, and SW tools in the concurrent programming environment. A more interesting problem is how to use such a many-core system to improve single-threaded performance.
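To make the contrast the abstract draws concrete, below is a minimal C++ sketch of the conventional "thread + lock" structure, with a comment indicating how a transactional-memory alternative might look. The shared-counter example and the atomic-block syntax in the comment are illustrative assumptions, not part of the original talk or of standard C++.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// The conventional "thread + lock" structure: every thread that touches the
// shared counter must know and acquire the mutex that protects it.
struct Counter {
    long value = 0;
    std::mutex m;

    void add(long delta) {
        std::lock_guard<std::mutex> guard(m);  // explicit lock around the update
        value += delta;
    }
};

int main() {
    Counter c;
    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i)
        workers.emplace_back([&c] { for (int j = 0; j < 100000; ++j) c.add(1); });
    for (auto& t : workers) t.join();

    // With transactional memory, the explicit mutex would disappear; a
    // hypothetical atomic block (not standard C++) might read:
    //     atomic { value += delta; }
    // and the runtime or hardware would detect and resolve conflicts instead
    // of the programmer choosing and ordering locks.
    return c.value == 4 * 100000 ? 0 : 1;
}
```

The point of the comparison is the one the abstract makes: with explicit locks, correctness depends on every piece of code agreeing on which lock guards which data, whereas transactional or fine-grained HW-supported synchronization shifts conflict detection to the system.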