{"title":"Parallel Programming Environment: A Key to Translating Tera-Scale Platforms into a Big Success","authors":"J. Fang","doi":"10.1145/1229428.1229430","DOIUrl":null,"url":null,"abstract":"Summary form only given. Moore's Law will continue to increase the number of transistors on die for a couple of decades, as silicon technology moves from 65nm today to 45nm, 32 nm and 22nm in the future. Since the power and thermal constraints increase with frequency, multi-core or many-core will be the way of the future microprocessor. In the near future, HW platforms will have many-cores (>16 cores) on die to achieve >1 TIPs computation power, which will communicate each other through an on-die interconnect fabric with >1 TB/s on-die bandwidth and <30 cycles latency. Off-die D-cache will employ 3D stacked memory technology to tremendously increase off-die cache/memory bandwidth and reduce the latency. Fast copper flex cables will link CPU-DRAM on socket and the optical silicon photonics will provide up to 1 Tb/s I/O bandwidth between boxes. The HW system with TIPs of compute power operating in Tera-bytes of data make this a \"Tera-scale\" platform. What are the SW implications with the HW changes from uniprocessor to Tera-scale platform with many-cores as \"the way of the future?\" It will be great challenge for programming environments to help programmers to develop concurrent code for most client software. A good concurrent programming environment should extend the existing programming languages that typical programmers are familiar with, and bring benefits for concurrent programming. There are lots of research topics. Examples of these topics include flexible parallel programming models based on needs from applications, better synchronization mechanisms like Transactional Memory to replace simple \"Thread + Lock\" structure, nested data parallel language primitives with new protocols, fine-grained synchronization mechanisms with HW support, maybe fine-grained message passing, advanced compiler optimizations for the threaded code, and SW tools in the concurrent programming environment. A more interesting problem is how to use such a many-core system to improve single-threaded performance","PeriodicalId":244171,"journal":{"name":"International Symposium on Code Generation and Optimization (CGO'07)","volume":"136 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Symposium on Code Generation and Optimization (CGO'07)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1229428.1229430","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
Summary form only given. Moore's Law will continue to increase the number of transistors on a die for a couple of decades, as silicon technology moves from 65 nm today to 45 nm, 32 nm, and 22 nm in the future. Since power and thermal constraints increase with frequency, multi-core or many-core designs will be the way of the future microprocessor. In the near future, HW platforms will have many cores (>16 cores) on die to achieve >1 TIPS of computation power, and these cores will communicate with each other through an on-die interconnect fabric with >1 TB/s of on-die bandwidth and <30 cycles of latency. Off-die D-cache will employ 3D stacked memory technology to tremendously increase off-die cache/memory bandwidth and reduce latency. Fast copper flex cables will link CPU and DRAM on the socket, and optical silicon photonics will provide up to 1 Tb/s of I/O bandwidth between boxes. A HW system with TIPS of compute power operating on terabytes of data makes this a "Tera-scale" platform. What are the SW implications of the HW change from uniprocessors to Tera-scale platforms with many cores as "the way of the future"? It will be a great challenge for programming environments to help programmers develop concurrent code for most client software. A good concurrent programming environment should extend the existing programming languages that typical programmers are familiar with, and bring clear benefits for concurrent programming. There are many open research topics. Examples include flexible parallel programming models driven by application needs, better synchronization mechanisms such as Transactional Memory to replace the simple "thread + lock" structure, nested data-parallel language primitives with new protocols, fine-grained synchronization mechanisms with HW support, possibly fine-grained message passing, advanced compiler optimizations for threaded code, and SW tools in the concurrent programming environment. A more interesting problem is how to use such a many-core system to improve single-threaded performance.
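To make the contrast the abstract draws concrete, below is a minimal C++ sketch of the conventional "thread + lock" structure, with a comment indicating how a transactional-memory alternative might look. The shared-counter example and the atomic-block syntax in the comment are illustrative assumptions, not part of the original talk or of standard C++.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// The conventional "thread + lock" structure: every thread that touches the
// shared counter must know and acquire the mutex that protects it.
struct Counter {
    long value = 0;
    std::mutex m;

    void add(long delta) {
        std::lock_guard<std::mutex> guard(m);  // explicit lock around the update
        value += delta;
    }
};

int main() {
    Counter c;
    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i)
        workers.emplace_back([&c] { for (int j = 0; j < 100000; ++j) c.add(1); });
    for (auto& t : workers) t.join();

    // With transactional memory, the explicit mutex would disappear; a
    // hypothetical atomic block (not standard C++) might read:
    //     atomic { value += delta; }
    // and the runtime or hardware would detect and resolve conflicts instead
    // of the programmer choosing and ordering locks.
    return c.value == 4 * 100000 ? 0 : 1;
}
```

The point of the comparison is the one the abstract makes: with explicit locks, correctness depends on every piece of code agreeing on which lock guards which data, whereas transactional or fine-grained HW-supported synchronization shifts conflict detection to the system.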