Pub Date: 2004-11-16 DOI: 10.1109/ISSOC.2004.1411181
A. Morgenshtein, I. Cidon, A. Kolodny, R. Ginosar
An analytical model is employed to characterize and compare serial and parallel communication techniques in NoC interconnects. Simulations based on 130 nm and 70 nm technology parameters reveal up to 5.5× and 17× reductions in power and area, respectively, for serial links vs. 32-bit multi-layer parallel links. A single-layer parallel link dissipates less power but occupies a larger area. We conclude that long on-chip interconnects could benefit from serial links.
Title: Comparative analysis of serial vs parallel links in NoC. Published in: 2004 International Symposium on System-on-Chip, 2004. Proceedings.
Pub Date: 2004-11-16 DOI: 10.1109/ISSOC.2004.1411141
M. Erez
Summary form only given. Stream processors are fully programmable in a high-level language, yet are capable of achieving computation efficiency comparable to fixed-function ASIC solutions (about 20 pJ/op) and can be scaled from a Gop/s (20 mW) block to a Top/s (20 W) chip in current semiconductor technology. The parallel nature of stream processors enables their performance to scale with technology. In a 2010 45 nm technology we expect an efficiency of 1 pJ/op and performance of up to 20 Top/s (20 W). A stream processor contains an array of arithmetic units that are supplied with data by a deep and explicit register hierarchy, which also serves to decouple instruction execution from unpredictable and long-latency memory operations. This decoupled and exposed-communication architecture enables a compiler to automatically map a stream application (such as a signal-flow graph) to the processing array: employing "stream scheduling" to stage the high-level movement of streams, and "communication scheduling" to schedule the data movement in the low-level kernels. This explicit optimization of communication results in almost all data and instruction movement taking place over short wires, and hence almost all energy going to useful computation. We have built a prototype streaming signal processor, Imagine, and have demonstrated streaming applications involving video compression/decompression, wireless communication, and adaptive beam-forming. We are also designing the Merrimac supercomputer, which uses a stream processor based on the same architectural principles as Imagine, illustrating the flexibility, generality, and scalability of the streaming concept. This paper describes stream architectures, stream programming systems, and streaming applications. A comparison is made to conventional DSPs, FPGAs, and ASIC solutions.
Title: Stream architectures - efficiency and programmability
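The power figures quoted in the stream-architecture abstract above follow from multiplying energy per operation by throughput. A minimal Python sketch of that arithmetic is given here; the function name `power_watts` and the `OPERATING_POINTS` table are illustrative and do not come from the paper.

```python
def power_watts(energy_pj_per_op: float, ops_per_second: float) -> float:
    """Power in watts = energy per operation (J/op) * throughput (op/s)."""
    return energy_pj_per_op * 1e-12 * ops_per_second


# Operating points quoted in the abstract: (label, pJ/op, op/s).
OPERATING_POINTS = [
    ("1 Gop/s block at 20 pJ/op", 20.0, 1e9),
    ("1 Top/s chip at 20 pJ/op", 20.0, 1e12),
    ("projected 45 nm chip, 20 Top/s at 1 pJ/op", 1.0, 20e12),
]

if __name__ == "__main__":
    for label, pj_per_op, rate in OPERATING_POINTS:
        # Three significant digits hide floating-point rounding noise.
        print(f"{label}: {power_watts(pj_per_op, rate):.3g} W")
```

The three products come to about 0.02 W, 20 W, and 20 W, which agrees with the 20 mW and 20 W figures stated in the abstract.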
Pub Date: 2004-11-16 DOI: 10.1109/ISSOC.2004.1411147
Koji Inoue, Hidekazu Tanaka, V. Moshnyaga, K. Murakami
This paper reports design and evaluation results for a low-energy I-cache architecture called the history-based tag-comparison (HBTC) cache. The HBTC cache attempts to re-use tag-comparison results to detect and eliminate unnecessary memory-array activations. We have performed cycle-accurate simulations and have designed an SRAM core based on a 0.18 µm CMOS technology. The results show that the HBTC approach can achieve a 60% energy reduction, with only 0.3% performance degradation, compared to a conventional cache. Furthermore, we have evaluated the potential of the HBTC cache when combined with other low-energy techniques.
Title: A low-power I-cache design with tag-comparison reuse
Pub Date: 2004-11-16 DOI: 10.1109/ISSOC.2004.1411163
T. Ristimäki, J. Nurmi
An extensive survey focused exclusively on reconfigurable IP blocks is given. The most notable existing implementations are categorized according to computational granularity, communication topology, and source of the block, i.e. academic vs. commercial. Our own research results in this field are also included in the classification.
Title: Reconfigurable IP blocks: a survey [SoC]
Pub Date: 2004-11-16 DOI: 10.1109/ISSOC.2004.1411138
G. Smit, P. M. Heysters, Michèl A. J. Rosien, Egbert Molenkamp
In this paper we describe in retrospect the main results of a four-year project called Chameleon. As part of this project we developed the MONTIUM, a coarse-grained reconfigurable core for DSP algorithms in wireless devices. After presenting the main achievements of the project, we present the lessons learned from it.
Title: Lessons learned from designing the MONTIUM - a coarse-grained reconfigurable processing tile
Pub Date: 2004-11-16 DOI: 10.1109/ISSOC.2004.1411135
Jian Liu, Lirong Zheng, H. Tenhunen
This work presents a circuit-switched network architecture for network-on-chip. It uses the time-division-multiplexing (TDM) scheme to realize the circuits. The global routing (slot assignment at each switch) is done centrally while the slot mapping is done locally by the switches. The switches support multicast operation, which enables multicast traffic. Furthermore, the delay in the network is predictable before a circuit is established and in-order data delivery is guaranteed.
Title: Global routing for multicast-supporting TDM network-on-chip
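The central slot assignment described in the TDM network-on-chip abstract above can be illustrated with a small Python sketch. It is not the authors' algorithm: the class `SlotAllocator`, the `Link` naming, and the first-fit search are hypothetical, and the sketch assumes that a circuit using slot s on one link uses slot (s + 1) mod S on the next link, a common TDM convention that the abstract does not confirm. Only unicast paths without repeated links are modeled; a multicast circuit would reserve slots along a tree instead.

```python
from typing import Dict, List, Optional, Tuple

Link = Tuple[str, str]  # (from_switch, to_switch); hypothetical naming


class SlotAllocator:
    """Central first-fit TDM slot allocator (illustrative sketch only)."""

    def __init__(self, num_slots: int) -> None:
        self.num_slots = num_slots
        # One slot table per link; each entry holds a circuit id or None.
        self.tables: Dict[Link, List[Optional[int]]] = {}

    def _table(self, link: Link) -> List[Optional[int]]:
        return self.tables.setdefault(link, [None] * self.num_slots)

    def allocate(self, circuit_id: int, path: List[Link]) -> Optional[int]:
        """Reserve one slot per link along path; return the start slot.

        Assumption: the slot index advances by one per hop, modulo the
        table size. Returns None when no conflict-free start slot exists.
        """
        for start in range(self.num_slots):
            slots = [(start + hop) % self.num_slots for hop in range(len(path))]
            if all(self._table(l)[s] is None for l, s in zip(path, slots)):
                for l, s in zip(path, slots):
                    self._table(l)[s] = circuit_id
                return start
        return None


if __name__ == "__main__":
    alloc = SlotAllocator(num_slots=4)
    route = [("A", "B"), ("B", "C")]
    print(alloc.allocate(1, route))          # first circuit on A->B->C
    print(alloc.allocate(2, [("B", "C")]))   # second circuit on B->C only
    print(alloc.allocate(3, route))          # third circuit must shift its start
```

Because every hop's slot is fixed at allocation time, the delay of a circuit in this model is known before any data is sent (one slot per hop), which mirrors the predictability claim in the abstract.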
Pub Date: 2004-11-16 DOI: 10.1109/ISSOC.2004.1411179
A. Habibi, A. Gawanmeh, S. Tahar
In this paper, we present an assertion-based verification approach for SystemC designs, based on embedding the property specification language (PSL) using abstract state machines (ASM). Our approach utilizes an existing embedding of PSL in ASM in order to enable modeling of PSL assertions at the ASM level. Here, we propose to compile PSL assertions into C# code and integrate them with the SystemC design. Assertions are then verified by simulating the new model that combines the original design and the integrated assertions. This enriches the SystemC language with a powerful and expressive assertion specification layer, and improves the verification of SystemC designs by targeting specific properties during simulation.
Title: Assertion based verification of PSL for SystemC designs
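The idea in the abstract above of checking an assertion while simulating the design can be illustrated with a tiny trace monitor. The paper compiles PSL into C# and integrates it with SystemC; the sketch below uses Python purely for illustration and is not the authors' flow. The property `always (req -> next ack)`, the signal names `req` and `ack`, and the function name are all hypothetical.

```python
from typing import Dict, List

Cycle = Dict[str, bool]  # signal values sampled in one clock cycle


def violations_always_req_next_ack(trace: List[Cycle]) -> List[int]:
    """Check the PSL-style property 'always (req -> next ack)' on a trace.

    Returns the cycle indices at which 'ack' was required but missing.
    A 'req' in the final cycle is not reported, since the following
    cycle is never observed (weak 'next' semantics).
    """
    failures = []
    for i in range(len(trace) - 1):
        if trace[i]["req"] and not trace[i + 1]["ack"]:
            failures.append(i + 1)
    return failures


if __name__ == "__main__":
    trace = [
        {"req": True, "ack": False},
        {"req": True, "ack": True},
        {"req": False, "ack": False},  # ack expected here but missing
    ]
    print(violations_always_req_next_ack(trace))
```

In a real flow the monitor would be evaluated cycle by cycle alongside the simulated design rather than on a recorded list, but the pass/fail rule is the same.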
Pub Date: 2004-11-16 DOI: 10.1109/ISSOC.2004.1411180
Chin-Tung Chan, Yu-Hong Chang, Hsichi Ho, H. Chiueh
A novel thermal-aware power management (TAPM) software intellectual property (soft-IP) for modern platform-based SoC designs is presented. This research proposes a system-level architecture for thermal-aware power management, which includes a power management bus (PMB), the TAPM soft-IP, and interface circuitry for the proposed PMB. Each component of the proposed design is encapsulated into a soft-IP. With this design, system architects are able to incorporate on-chip power controls and sensors to achieve nominal power dissipation and ensure that the targeted system works within specification. The design yields intricate control and optimal management with little system overhead and minimum hardware requirements, and provides the flexibility to support different management schemes. The proposed system and its components are designed, implemented, and verified by a prototype chip, which was fabricated in a TSMC 0.25 µm 1P5M standard CMOS technology through the National Chip Implementation Center (CIC), Taiwan.
Title: A thermal-aware power management soft-IP for platform-based SoC designs
Pub Date: 2004-11-16 DOI: 10.1109/ISSOC.2004.1411161
Tuukka Kasanko, J. Nurmi
Verification is currently the most time-consuming task in the development of new designs. Automation must be introduced in order to achieve satisfactory results within a reasonable time. This work presents how verification was conducted for one SoC component, a 32-bit RISC processor core. A wide variety of tools and methods were used in the process.
Title: Verification of a 32-bit RISC processor core
Pub Date: 2004-11-16 DOI: 10.1109/ISSOC.2004.1411169
Timo Rintakoski, M. Kuulusa, J. Nurmi
A hardware unit for producing binary orthogonal variable spreading factor (OVSF), Hadamard and Walsh codes for WCDMA/CDMA2000 systems is presented. The generator uses a spreading factor, mode select, and the code index as the control input. The synthesized hardware unit consumes 512 NAND2-equivalent logic gates.
Title: Hardware unit for OVSF/Walsh/Hadamard code generation [3G mobile communication applications]
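As a software reference for what the code generator in the abstract above produces, the following Python sketch builds OVSF codes from the standard code-tree recursion and Hadamard rows from the Sylvester construction. It is a behavioral model under stated assumptions, not the authors' hardware design: the function names, the `mode` strings, and the use of +1/-1 chips instead of 0/1 bits are all illustrative, and sequency-ordered Walsh codes are not modeled.

```python
from typing import List


def _check(sf: int, index: int) -> None:
    if sf < 1 or sf & (sf - 1):
        raise ValueError("spreading factor must be a power of two")
    if not 0 <= index < sf:
        raise ValueError("code index must satisfy 0 <= index < sf")


def ovsf_code(sf: int, index: int) -> List[int]:
    """OVSF code C(sf, index) as +1/-1 chips, built by walking the code tree.

    Each index bit, most significant first, selects a child of the current
    node: bit 0 gives [c, c], bit 1 gives [c, -c].
    """
    _check(sf, index)
    code = [1]
    depth = sf.bit_length() - 1  # log2(sf)
    for level in range(depth - 1, -1, -1):
        bit = (index >> level) & 1
        tail = [-c for c in code] if bit else code
        code = code + tail
    return code


def hadamard_row(sf: int, index: int) -> List[int]:
    """Row 'index' of the Sylvester Hadamard matrix of order sf."""
    _check(sf, index)
    # Entry (i, j) is -1 when i and j share an odd number of set bits.
    return [-1 if bin(index & j).count("1") % 2 else 1 for j in range(sf)]


def generate_code(sf: int, index: int, mode: str = "ovsf") -> List[int]:
    """Hypothetical front end mirroring the three control inputs."""
    if mode == "ovsf":
        return ovsf_code(sf, index)
    if mode == "hadamard":
        return hadamard_row(sf, index)
    raise ValueError("mode must be 'ovsf' or 'hadamard'")


if __name__ == "__main__":
    for k in range(4):
        print(k, generate_code(4, k, "ovsf"), generate_code(4, k, "hadamard"))
```

Both orderings contain the same set of mutually orthogonal codes for a given spreading factor; OVSF index k corresponds to the Hadamard row whose index is k with its bits reversed, which is one reason a single generator with a mode select can plausibly serve both.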