Parallel Programming Environment: A Key to Translating Tera-Scale Platforms into a Big Success

J. Fang
{"title":"Parallel Programming Environment: A Key to Translating Tera-Scale Platforms into a Big Success","authors":"J. Fang","doi":"10.1145/1229428.1229430","DOIUrl":null,"url":null,"abstract":"Summary form only given. Moore's Law will continue to increase the number of transistors on die for a couple of decades, as silicon technology moves from 65nm today to 45nm, 32 nm and 22nm in the future. Since the power and thermal constraints increase with frequency, multi-core or many-core will be the way of the future microprocessor. In the near future, HW platforms will have many-cores (>16 cores) on die to achieve >1 TIPs computation power, which will communicate each other through an on-die interconnect fabric with >1 TB/s on-die bandwidth and <30 cycles latency. Off-die D-cache will employ 3D stacked memory technology to tremendously increase off-die cache/memory bandwidth and reduce the latency. Fast copper flex cables will link CPU-DRAM on socket and the optical silicon photonics will provide up to 1 Tb/s I/O bandwidth between boxes. The HW system with TIPs of compute power operating in Tera-bytes of data make this a \"Tera-scale\" platform. What are the SW implications with the HW changes from uniprocessor to Tera-scale platform with many-cores as \"the way of the future?\" It will be great challenge for programming environments to help programmers to develop concurrent code for most client software. A good concurrent programming environment should extend the existing programming languages that typical programmers are familiar with, and bring benefits for concurrent programming. There are lots of research topics. Examples of these topics include flexible parallel programming models based on needs from applications, better synchronization mechanisms like Transactional Memory to replace simple \"Thread + Lock\" structure, nested data parallel language primitives with new protocols, fine-grained synchronization mechanisms with HW support, maybe fine-grained message passing, advanced compiler optimizations for the threaded code, and SW tools in the concurrent programming environment. A more interesting problem is how to use such a many-core system to improve single-threaded performance","PeriodicalId":244171,"journal":{"name":"International Symposium on Code Generation and Optimization (CGO'07)","volume":"136 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Symposium on Code Generation and Optimization (CGO'07)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1229428.1229430","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Summary form only given. Moore's Law will continue to increase the number of transistors on die for a couple of decades, as silicon technology moves from 65nm today to 45nm, 32nm and 22nm in the future. Since power and thermal constraints increase with frequency, multi-core or many-core designs will be the way of the future microprocessor. In the near future, HW platforms will have many cores (>16 cores) on die to achieve >1 TIPS of computation power; these cores will communicate with each other through an on-die interconnect fabric with >1 TB/s of on-die bandwidth and <30 cycles of latency. Off-die D-cache will employ 3D stacked memory technology to tremendously increase off-die cache/memory bandwidth and reduce latency. Fast copper flex cables will link CPU and DRAM on the socket, and optical silicon photonics will provide up to 1 Tb/s of I/O bandwidth between boxes. A HW system with TIPS of compute power operating on Tera-bytes of data makes this a "Tera-scale" platform. What are the SW implications of the HW change from the uniprocessor to the Tera-scale platform, with many cores as "the way of the future"? It will be a great challenge for programming environments to help programmers develop concurrent code for most client software. A good concurrent programming environment should extend the existing programming languages that typical programmers are familiar with, and bring them the benefits of concurrent programming. There are many open research topics. Examples include flexible parallel programming models driven by application needs, better synchronization mechanisms such as Transactional Memory to replace the simple "Thread + Lock" structure, nested data-parallel language primitives with new protocols, fine-grained synchronization mechanisms with HW support, possibly fine-grained message passing, advanced compiler optimizations for threaded code, and SW tools for the concurrent programming environment. A more interesting problem is how to use such a many-core system to improve single-threaded performance.
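To make the contrast between the "Thread + Lock" structure and a transactional-memory construct concrete, here is a minimal C++ sketch. It is illustrative only and not taken from the paper: the Account/transfer names are hypothetical, the lock-based variant uses standard C++ mutexes, and the transactional variant assumes GCC's __transaction_atomic language extension (build with g++ -fgnu-tm -std=c++17).

    #include <mutex>
    #include <thread>

    struct Account {
        long balance = 0;
        std::mutex lock;   // used only by the lock-based variant
    };

    // Conventional "Thread + Lock" version: correctness depends on every caller
    // acquiring both locks in a consistent, deadlock-free way.
    void transfer_locked(Account& from, Account& to, long amount) {
        std::scoped_lock guard(from.lock, to.lock);  // C++17 deadlock-avoiding multi-lock
        from.balance -= amount;
        to.balance   += amount;
    }

    // Transactional-memory version (GCC extension, requires -fgnu-tm): the
    // programmer states *what* must be atomic; the TM runtime detects conflicts
    // between concurrent transactions and retries them.
    void transfer_tm(Account& from, Account& to, long amount) {
        __transaction_atomic {
            from.balance -= amount;
            to.balance   += amount;
        }
    }

    int main() {
        Account a, b;
        a.balance = 1000;

        // Two threads moving money in opposite directions; the invariant
        // a.balance + b.balance == 1000 holds with either variant.
        std::thread t1([&] { for (int i = 0; i < 10000; ++i) transfer_tm(a, b, 1); });
        std::thread t2([&] { for (int i = 0; i < 10000; ++i) transfer_tm(b, a, 1); });
        t1.join();
        t2.join();
        return 0;
    }

The practical difference the abstract points at: the transactional version states only which memory updates must appear atomic, leaving conflict detection and retry to the TM runtime, so callers need no lock-ordering discipline.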