Stream architectures - efficiency and programmability

M. Erez
{"title":"Stream architectures - efficiency and programmability","authors":"M. Erez","doi":"10.1109/ISSOC.2004.1411141","DOIUrl":null,"url":null,"abstract":"Summary form only given. Stream processors are fully programmable in a high-level language, yet are capable of achieving computation efficiency comparable to fixed-function ASIC solutions (about 20 pJ/op) and can be scaled from a Gop/s (20 mW) block to a Top/s (20 W) chip in current semiconductor technology. The parallel nature of stream processors enables their performance to scale with technology. In a 2010 45 nm technology we expect an efficiency of 1 pJ/op and performance of up to 20 Top/s (20 W). A stream processor contains an array of arithmetic units that are supplied with data by a deep and explicit register hierarchy, which also serves to decouple instruction execution from unpredictable and long-latency memory operations. This decoupled and exposed-communication architecture enables a compiler to automatically map a stream application (such as a signal-flow graph) to the processing array: employing \"stream scheduling\" to stage the high-level movement of streams, and \"communication scheduling\" to schedule the data movement in the low-level kernels. This explicit optimization of communication results in almost all data and instruction movement taking place over short wires, and hence almost all energy going to useful computation. We have built a prototype streaming signal processor, Imagine, and have demonstrated streaming applications involving video compression/decompression, wireless communication, and adaptive beam-forming. We are also designing the Merrimac supercomputer, which uses a stream processor based on the same architectural principles as Imagine, illustrating the flexibility, generality, and scalability of the streaming concept. This paper describes stream architectures, stream programming systems, and streaming applications. A comparison is made to conventional DSPs, FPGAs, and ASIC solutions.","PeriodicalId":268122,"journal":{"name":"2004 International Symposium on System-on-Chip, 2004. Proceedings.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2004 International Symposium on System-on-Chip, 2004. Proceedings.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISSOC.2004.1411141","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Summary form only given. Stream processors are fully programmable in a high-level language, yet are capable of achieving computation efficiency comparable to fixed-function ASIC solutions (about 20 pJ/op) and can be scaled from a Gop/s (20 mW) block to a Top/s (20 W) chip in current semiconductor technology. The parallel nature of stream processors enables their performance to scale with technology. In a 2010 45 nm technology we expect an efficiency of 1 pJ/op and performance of up to 20 Top/s (20 W). A stream processor contains an array of arithmetic units that are supplied with data by a deep and explicit register hierarchy, which also serves to decouple instruction execution from unpredictable and long-latency memory operations. This decoupled and exposed-communication architecture enables a compiler to automatically map a stream application (such as a signal-flow graph) to the processing array: employing "stream scheduling" to stage the high-level movement of streams, and "communication scheduling" to schedule the data movement in the low-level kernels. This explicit optimization of communication results in almost all data and instruction movement taking place over short wires, and hence almost all energy going to useful computation. We have built a prototype streaming signal processor, Imagine, and have demonstrated streaming applications involving video compression/decompression, wireless communication, and adaptive beam-forming. We are also designing the Merrimac supercomputer, which uses a stream processor based on the same architectural principles as Imagine, illustrating the flexibility, generality, and scalability of the streaming concept. This paper describes stream architectures, stream programming systems, and streaming applications. A comparison is made to conventional DSPs, FPGAs, and ASIC solutions.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
流架构——效率和可编程性
只提供摘要形式。流处理器是用高级语言完全可编程的,但能够实现与固定功能ASIC解决方案(约20 pJ/op)相当的计算效率,并且可以从Gop/s (20 mW)块扩展到当前半导体技术中的Top/s (20 W)芯片。流处理器的并行特性使得它们的性能可以随技术而扩展。在2010年的45纳米技术中,我们预计效率为1 pJ/op,性能高达20 Top/s (20 W)。流处理器包含一组算术单元,这些算术单元由深度和显式寄存器层次结构提供数据,这也有助于将指令执行与不可预测和长延迟的内存操作解耦。这种解耦和暴露的通信架构使编译器能够自动将流应用程序(例如信号流图)映射到处理数组:使用“流调度”来执行流的高级移动,并使用“通信调度”来调度低级内核中的数据移动。这种显式的通信优化导致几乎所有数据和指令的移动都在短线路上进行,因此几乎所有的能量都用于有用的计算。我们已经建立了一个流信号处理器的原型,Imagine,并演示了包括视频压缩/解压缩、无线通信和自适应波束形成的流应用。我们还在设计Merrimac超级计算机,它使用基于与Imagine相同架构原则的流处理器,说明了流概念的灵活性、通用性和可扩展性。本文描述了流架构、流编程系统和流应用。与传统的dsp、fpga和ASIC解决方案进行了比较。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Design reuse and design for reuse, a case study on HDSL2 Development of NSoC program in Taiwan Efficient barrier synchronization mechanism for emulated shared memory NOCs Evaluation of platform architecture performance using abstract instruction-level workload models Assertion based verification of PSL for SystemC designs
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1