Pub Date: 2013-12-02
DOI: 10.1109/ISSoC.2013.6675277
U. Vishnoi, T. Noll
QR-decomposition (QRD) accelerators are attractive SoC components for many applications with a wide range of specifications. A new family of highly area- and energy-efficient, modular two-way linear-array QRD architectures based on the Givens algorithm and CORDIC rotations is proposed. The template architecture supports real- and complex-valued as well as integer and floating-point QRDs. An accurate algebraic cost model enables cross-level optimization across the architecture, micro-architecture and circuit levels using a rich set of parameters. Quantitative results for example applications implemented in 40-nm CMOS demonstrate a significant improvement in efficiency.
Title: A family of modular area- and energy-efficient QRD-accelerator architectures
Venue: 2013 International Symposium on System on Chip (SoC)
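The architectures above build on the Givens algorithm, which triangularizes a matrix through a sequence of 2×2 plane rotations (realized with CORDIC in the proposed hardware). Below is a minimal floating-point software sketch of Givens-based QR for real-valued matrices only. It computes each rotation directly with `math.hypot` instead of CORDIC iterations, and the names `givens` and `givens_qr` are illustrative, not taken from the paper.

```python
import math


def givens(a, b):
    """Return (c, s) so that [[c, s], [-s, c]] @ [a, b] == [hypot(a, b), 0]."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r


def givens_qr(A):
    """QR decomposition of a real m x n matrix (m >= n) by Givens rotations.

    Returns (Q, R) with A == Q @ R, Q orthogonal (m x m), R upper triangular (m x n).
    """
    m, n = len(A), len(A[0])
    R = [[float(v) for v in row] for row in A]                  # working copy that becomes R
    Q = [[float(i == j) for j in range(m)] for i in range(m)]   # starts as identity
    for j in range(n):
        # Zero the sub-diagonal entries of column j, from the bottom row upwards.
        for i in range(m - 1, j, -1):
            c, s = givens(R[i - 1][j], R[i][j])
            # Apply the rotation G to rows i-1 and i of R.
            for k in range(n):
                x, y = R[i - 1][k], R[i][k]
                R[i - 1][k] = c * x + s * y
                R[i][k] = -s * x + c * y
            # Accumulate Q <- Q @ G^T on columns i-1 and i, keeping A == Q @ R.
            for k in range(m):
                x, y = Q[k][i - 1], Q[k][i]
                Q[k][i - 1] = c * x + s * y
                Q[k][i] = -s * x + c * y
    return Q, R
```

Each rotation touches only two rows, which is one reason Givens QR maps well onto linear arrays of rotation units; a CORDIC-based implementation would replace the `hypot` and division step with shift-and-add iterations.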
Pub Date: 2013-12-02
DOI: 10.1109/ISSoC.2013.6675280
M. Odendahl, J. Castrillón, Vitaliy Volevach, R. Leupers, G. Ascheid
Automated mapping of dataflow applications to state-of-the-art heterogeneous Multiprocessor Systems on Chip (MPSoCs) with complex interconnects and communication mechanisms is an ongoing research endeavor. We implement, measure and analyze three different communication libraries on a representative off-the-shelf platform of this kind. The analysis shows the need for a new cost model that properly characterizes inter-task communication. The paper then presents an algorithm that solves the mapping problem jointly for computation and communication using this cost model. A case study with four real streaming applications shows that the resulting mapping reduces execution time: compared to a mapping in which all channels are mapped to shared memory, the makespan decreased by up to 10% owing to the automated selection of a more appropriate communication library.
Title: Split-cost communication model for improved MPSoC application mapping
Venue: 2013 International Symposium on System on Chip (SoC)
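The abstract does not spell out the cost model or the mapping algorithm. As a purely illustrative sketch, assume that "split cost" means each transfer's cost is divided into a producer-side and a consumer-side part that load the respective processors, and that the makespan is approximated by the load of the busiest processor. The `LibCost` record, the `map_channels` function, the library names and all cost numbers are hypothetical, and the greedy per-channel choice is a stand-in, not the authors' algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibCost:
    """Hypothetical per-library cost (cycles), split between producer and consumer."""
    producer_fixed: float
    producer_per_byte: float
    consumer_fixed: float
    consumer_per_byte: float


def map_channels(compute_load, channels, libraries):
    """Greedily choose a communication library for each channel.

    compute_load: dict processor -> computation cycles already mapped to it
    channels: list of (channel_name, producer_proc, consumer_proc, bytes_per_iteration)
    libraries: dict library_name -> LibCost
    Returns (dict channel_name -> library_name, final per-processor load).
    """
    load = dict(compute_load)  # do not mutate the caller's dict
    choice = {}
    # Handle the heaviest channels first, where the library choice matters most.
    for name, src, dst, nbytes in sorted(channels, key=lambda ch: -ch[3]):
        best = None
        for lib, c in libraries.items():
            trial = dict(load)
            trial[src] += c.producer_fixed + c.producer_per_byte * nbytes
            trial[dst] += c.consumer_fixed + c.consumer_per_byte * nbytes
            # Busiest processor's load as a makespan proxy; ties go to the smaller name.
            key = (max(trial.values()), lib)
            if best is None or key < best[0]:
                best = (key, lib, trial)
        _, lib, load = best
        choice[name] = lib
    return choice, load
```

With these hypothetical numbers, a 1000-byte channel between `P0` and `P1` picks `"dma"` (low per-byte cost despite a higher fixed cost), while a 10-byte channel picks `"shm"`:

```python
libs = {"shm": LibCost(10.0, 1.0, 10.0, 1.0), "dma": LibCost(50.0, 0.1, 5.0, 0.5)}
choice, load = map_channels({"P0": 100.0, "P1": 100.0}, [("c0", "P0", "P1", 1000)], libs)
```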
Pub Date: 2013-12-02
DOI: 10.1109/ISSoC.2013.6675259
G. Panic, O. Schrape, T. Basmer, F. Vater, K. Tittelbach-Helmrich
In this paper we describe a sensor node crypto processor designed for wireless sensor networks with strong security demands. The presented system-on-chip is a mixed-signal, processor-based design containing hardware crypto accelerators (AES, ECC, SHA-1) that provide the means for secure communication in the network. Its unique system architecture combines an asynchronous processor core with synchronous peripherals, resulting in low-power operation. The chip integrates embedded Flash memory and a 12-bit ADC, making it a suitable solution for small sensor node devices. The paper describes the chip architecture, discusses the most important implementation and verification issues, and finally presents chip measurement results.
Title: TNODE: A low power sensor node processor for secure wireless networks
Venue: 2013 International Symposium on System on Chip (SoC)
Pub Date: 2013-12-02
DOI: 10.1109/ISSoC.2013.6675261
J. Määttä, Mikko Honkonen, Tommi Korhonen, E. Salminen, T. Hämäläinen
Large-scale HW and SW projects contain thousands of source files, which calls for proper file management to keep track of changes and to keep the code in a compilable state. Different parts of the system depend on each other, and even a small change in one part of the code may break other parts. Dependency analysis can prevent such problems by visualizing the SW structure so that the developer can easily see the dependencies. This paper presents a novel tool for file dependency and change analysis and visualization, implemented in our IP-XACT-based Kactus2 design environment (GPL2). The tool can sort source files into IP-XACT file sets, extract and visualize file dependencies, and keep track of changed files. It also allows creating manual dependencies, e.g., between source code and documentation. Dependency and change analysis of 1k source files containing 140k lines of code takes less than two minutes.
Title: Dependency analysis and visualization tool for Kactus2 IP-XACT design framework
Venue: 2013 International Symposium on System on Chip (SoC)
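The abstract does not describe how Kactus2 extracts dependencies internally. The following is a minimal sketch of the general idea, assuming C/C++ sources whose dependencies are quoted `#include` directives and change detection by content hash: build a file dependency graph, detect changed files against a previous snapshot, and walk the reversed graph to find every file affected by a change. All function names are hypothetical.

```python
import hashlib
import re
from collections import defaultdict, deque

# Matches quoted includes such as: #include "uart.h"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def extract_dependencies(sources):
    """sources: dict filename -> file text. Returns dict filename -> set of included files."""
    return {name: set(INCLUDE_RE.findall(text)) for name, text in sources.items()}


def hash_snapshot(sources):
    """Record a SHA-1 digest per file so that a later run can detect changes."""
    return {name: hashlib.sha1(text.encode("utf-8")).hexdigest() for name, text in sources.items()}


def changed_files(sources, previous_hashes):
    """Files that are new or whose content digest differs from the previous snapshot."""
    current = hash_snapshot(sources)
    return {name for name, digest in current.items() if previous_hashes.get(name) != digest}


def impacted_files(deps, changed):
    """Changed files plus every file that depends on them directly or transitively."""
    reverse = defaultdict(set)  # included file -> files that include it
    for src, targets in deps.items():
        for t in targets:
            reverse[t].add(src)
    seen, queue = set(changed), deque(changed)
    while queue:  # breadth-first walk over the reversed dependency graph
        f = queue.popleft()
        for user in reverse[f]:
            if user not in seen:
                seen.add(user)
                queue.append(user)
    return seen
```

A manual dependency, such as documentation that should be revisited whenever a header changes, could be recorded as an extra edge in `deps` (for example, a hypothetical `deps["uart.md"] = {"uart.h"}`), after which it is reported by `impacted_files` like any source file.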
Pub Date: 2013-12-02
DOI: 10.1109/ISSoC.2013.6675262
Syed M. A. H. Jafri, S. Piestrak, A. Hemani, K. Paul, J. Plosila, H. Tenhunen
This paper investigates the overhead imposed by various configuration scrubbing techniques used in fault-tolerant Coarse-Grained Reconfigurable Arrays (CGRAs). Today, reconfigurable architectures host large configuration memories. As technology scales further into the nanometer regime, these configuration memories have become increasingly susceptible to single event upsets caused, e.g., by cosmic radiation. Configuration scrubbing is a frequently used technique to protect configuration memories against single event upsets. Existing work on configuration scrubbing deals only with FPGAs, without any reference to CGRAs (in which configuration memories consume up to 50% of the silicon area). Moreover, the known literature lacks a comprehensive comparison of configuration scrubbing techniques that would guide system designers on the merits and demerits of the scrubbing methods applicable to CGRAs. To address these problems, we classify various configuration scrubbing techniques and quantify their trade-offs when implemented on a CGRA. Synthesis results reveal that scrubbing logic incurs negligible silicon overhead (up to 3% of the area of the computational units). Simulation results for a few algorithms/applications (FFT, FIR, matrix multiplication, and WLAN) show that the choice of configuration scrubbing scheme (external vs. internal) has a significant impact on both the configuration memory size and the number of reconfiguration cycles: the former (external) scheme requires 20-80% more configuration memory and up to 38 times more reconfiguration cycles.
Title: Implementation and evaluation of configuration scrubbing on CGRAs: A case study
Venue: 2013 International Symposium on System on Chip (SoC)
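Two scrubbing styles commonly discussed in the FPGA literature are blind scrubbing, which periodically rewrites every configuration frame from a golden copy, and readback scrubbing, which reads each frame back, checks it, and rewrites only corrupted frames. The sketch below models both in software under stated assumptions: configuration memory is a list of byte frames, the golden copy is assumed to be free of upsets, and a per-frame CRC-32 (`zlib.crc32`) serves as the check. It does not model the paper's CGRA or its external/internal distinction, and the function names are hypothetical.

```python
import zlib


def frame_crcs(frames):
    """Reference CRC-32 of every configuration frame, computed once at configuration time."""
    return [zlib.crc32(f) for f in frames]


def readback_scrub(config_mem, golden, ref_crcs):
    """One readback-scrubbing pass.

    config_mem: list of bytearray frames (live memory, possibly hit by upsets)
    golden: list of bytes frames (protected golden copy)
    ref_crcs: per-frame reference CRCs
    Returns the indices of frames found corrupted and rewritten.
    """
    repaired = []
    for i, frame in enumerate(config_mem):
        if zlib.crc32(frame) != ref_crcs[i]:
            config_mem[i] = bytearray(golden[i])  # rewrite the frame from the golden copy
            repaired.append(i)
    return repaired


def blind_scrub(config_mem, golden):
    """Blind scrubbing: rewrite every frame without checking it. Returns frames written."""
    for i in range(len(config_mem)):
        config_mem[i] = bytearray(golden[i])
    return len(config_mem)
```

The trade-off is visible even in this toy: readback scrubbing writes only the damaged frames but needs checking logic and reference CRCs, whereas blind scrubbing needs no checking but rewrites the whole memory on every pass.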
Pub Date: 2013-12-02
DOI: 10.1109/ISSoC.2013.6675264
Antti Kamppi, Lauri Matilainen, J. Määttä, E. Salminen, T. Hämäläinen
Typical MPSoC FPGA product design is a rigid waterfall process that proceeds one way, from HW to SW design. Any change to the HW forces the SW project to be re-created from the beginning. When several product variants or speculative exploration during development are required, the disk easily bloats with hundreds of Board Support Package (BSP), configuration and SW project files. In this paper, we present an IP-XACT-based design flow that solves these problems through agile reuse of HW and SW components, automation, and a single golden reference source of information. We also present new extensions to IP-XACT, since the standard lacks SW-related features. Three use cases demonstrate how the BSP is changed, how an application is moved to another processor, and how a function is moved from a SW implementation to a HW accelerator. Compared to the conventional FPGA flow, our flow reduces the design time to one third, doubles the number of automated design phases, and completely avoids manual, error-prone data transfer between HW and SW tools.
Title: Extending IP-XACT to embedded system HW/SW integration
Venue: 2013 International Symposium on System on Chip (SoC)
Pub Date: 2013-10-01
DOI: 10.1109/ISSoC.2013.6675281
Che-Chuan Kuo, Kun-Chih Chen, En-Jui Chang, A. Wu
Thermal problems in three-dimensional Network-on-Chip (3D NoC) systems are aggravated by die stacking. Moreover, the minimal adaptive routing algorithms adopted to meet high-performance requirements result in unbalanced traffic load and a worse temperature distribution across the system. Meanwhile, conventional selection strategies determine the routing path based on traffic information, so they are unaware of potential thermal hotspots, which leads to a large performance impact. To solve these problems, this paper first defines a novel thermal-aware routing index, Mean Time To Throttle (MTTT), which represents the remaining active time of a node before its temperature reaches the alarm level. Based on MTTT, we propose Proactive Thermal-Budget-Based Beltway Routing (PTB3R) to balance the temperature distribution of the NoC system. Experimental results show that PTB3R can reduce the number of throttled nodes by 25.56% to 86.95% and improve network throughput by about 15.04% to 19.87%.
Title: Proactive Thermal-Budget-Based Beltway Routing algorithm for thermal-aware 3D NoC systems
Venue: 2013 International Symposium on System on Chip (SoC)
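The abstract defines MTTT as the remaining active time of a node before its temperature reaches the alarm level, but gives no formula. Below is a minimal sketch, assuming the temperature keeps rising at its most recently observed rate (linear extrapolation), together with a hypothetical selection step that prefers the minimal-path neighbour with the largest remaining thermal budget. It is not the PTB3R beltway routing itself, and `mttt` and `select_output` are illustrative names.

```python
import math


def mttt(temp_now, temp_prev, dt, temp_alarm):
    """Estimated time until a node reaches the alarm (throttling) temperature.

    Assumes linear extrapolation of the latest temperature slope. Temperatures share
    one unit; the result is in the unit of dt. Returns 0.0 if the node is already at
    or above the alarm level and math.inf if it is not heating up.
    """
    if temp_now >= temp_alarm:
        return 0.0
    slope = (temp_now - temp_prev) / dt
    if slope <= 0.0:
        return math.inf
    return (temp_alarm - temp_now) / slope


def select_output(candidates, mttt_of):
    """Among minimal-route candidate neighbours, pick the one with the largest MTTT."""
    return max(candidates, key=lambda n: mttt_of[n])
```

For example, a node at 80 °C that rose 2 °C over the last interval and has a 90 °C alarm level gets an MTTT of 5 intervals, so a cooling neighbour (infinite MTTT) would be preferred over it.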