Pub Date: 2015-07-19 | DOI: 10.1109/SAMOS.2015.7363701
C. Sauer, Hans-Peter Löb
Virtual prototypes leverage SystemC/TLM to simulate programmable platforms comprising hundreds of modules. Creating and configuring them efficiently is vital for acceptable turnaround times, e.g., during performance exploration or software development. Our lightweight infrastructure therefore provides a factory that creates designs from abstract descriptions of module instances, properties, and connections. Modules mark each property as either a creation parameter or a runtime parameter. The resulting generic design descriptions can be used by non-experts and enable the development of front-ends. The infrastructure is a small C++ library that can be combined with existing SystemC/TLM models and simulation kernels. An industrial case study of a complex multiprocessor SoC shows a distinct productivity gain.
Title: A lightweight infrastructure for the dynamic creation and configuration of virtual platforms
Venue: 2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)
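The factory idea in this abstract (designs built from an abstract description of instances, properties, and connections) can be illustrated with a small registry-based sketch. Everything here is hypothetical: `Module`, `Factory`, `Design`, and `build` are illustrative names, plain C++17 stands in for `sc_module`, and all properties are applied at creation time, so the paper's split between creation and runtime parameters is not modeled.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical property bag: key -> textual value, parsed by each module.
using Properties = std::map<std::string, std::string>;

// Stand-in for a simulation module (a real flow would derive from sc_module).
struct Module {
    std::string name;
    Properties props;
    std::vector<std::string> targets;  // names of modules this one is bound to
    virtual ~Module() = default;
};
struct Cpu : Module {};
struct Memory : Module {};

// Factory: maps a type name used in the description to a constructor.
class Factory {
public:
    using Creator = std::function<std::unique_ptr<Module>()>;
    void add(const std::string& type, Creator c) { creators_[type] = std::move(c); }

    std::unique_ptr<Module> create(const std::string& type, const std::string& name,
                                   const Properties& props) const {
        auto it = creators_.find(type);
        if (it == creators_.end()) throw std::runtime_error("unknown module type: " + type);
        auto m = it->second();
        m->name = name;
        m->props = props;  // in this sketch every property is applied at creation
        return m;
    }

private:
    std::map<std::string, Creator> creators_;
};

// Abstract design description: instances plus (initiator, target) connections.
struct Instance { std::string name, type; Properties props; };
struct Design {
    std::vector<Instance> instances;
    std::vector<std::pair<std::string, std::string>> connections;
};

// Instantiate every module first, then resolve connections by instance name.
std::map<std::string, std::unique_ptr<Module>> build(const Factory& f, const Design& d) {
    std::map<std::string, std::unique_ptr<Module>> mods;
    for (const auto& i : d.instances) mods[i.name] = f.create(i.type, i.name, i.props);
    for (const auto& [from, to] : d.connections) {
        if (!mods.count(from) || !mods.count(to)) throw std::runtime_error("dangling connection");
        mods.at(from)->targets.push_back(to);
    }
    return mods;
}
```

Separating the description (`Design`) from the registered constructors is what lets a non-expert or a front-end edit the platform without recompiling the module library.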
Pub Date: 2015-07-19 | DOI: 10.1109/SAMOS.2015.7363704
P. Wehner, J. Rettkowski, Tobias Kleinschmidt, D. Göhringer
This paper presents a SystemC simulator for Network-on-Chip (NoC) based Multiprocessor Systems-on-Chip (MPSoCs). The simulator currently supports a mesh topology with wormhole switching and several routing algorithms, including XY, minimal West-First, and adaptive West-First routing. The simulator can be used to analyze the performance impact of the routing algorithm. To simulate a heterogeneous MPSoC, ARM processors and MicroBlazes can be attached to the NoC. The processor and peripheral models used in the test platforms are provided by Imperas/OVP. Traffic generators are also available for analyzing the system. An additional SystemC component allows the application to read the simulation time. To evaluate the simulator, multiple platforms and applications were tested and compared with a hardware implementation. The comparison shows that the simulator facilitates MPSoC development by enabling early estimation of system requirements.
Title: MPSoCSim: An extended OVP simulator for modeling and evaluation of Network-on-Chip based heterogeneous MPSoCs
Venue: 2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)
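The routing algorithms named in this abstract can be sketched as per-hop routing functions on a 2D mesh. This is a generic textbook formulation, not MPSoCSim's code: the coordinate convention (x grows eastward, y grows northward) and the `Port`/`Coord` names are assumptions. XY routing returns one deterministic port; minimal West-First returns the set of permitted minimal ports, and an adaptive router would then pick one of them (for example by buffer occupancy, a policy assumed here rather than taken from the paper).

```cpp
#include <cassert>
#include <vector>

enum class Port { Local, East, West, North, South };

struct Coord { int x, y; };  // assumption: x grows eastward, y grows northward

// Deterministic XY routing: correct the X offset completely, then the Y offset.
Port route_xy(Coord cur, Coord dst) {
    if (dst.x > cur.x) return Port::East;
    if (dst.x < cur.x) return Port::West;
    if (dst.y > cur.y) return Port::North;
    if (dst.y < cur.y) return Port::South;
    return Port::Local;  // packet has arrived
}

// Minimal West-First: while the destination lies to the west, only West is allowed.
// Afterwards every productive (distance-reducing) direction among East/North/South
// is a legal candidate; forbidding turns into West avoids deadlock.
std::vector<Port> route_west_first(Coord cur, Coord dst) {
    if (dst.x < cur.x) return {Port::West};
    std::vector<Port> out;
    if (dst.x > cur.x) out.push_back(Port::East);
    if (dst.y > cur.y) out.push_back(Port::North);
    if (dst.y < cur.y) out.push_back(Port::South);
    if (out.empty()) out.push_back(Port::Local);
    return out;
}
```

For a packet at (0,0) headed to (2,3), XY always goes East first, whereas West-First offers both East and North, which is the freedom an adaptive variant exploits.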
Pub Date: 2015-07-19 | DOI: 10.1109/SAMOS.2015.7363667
D. Souza, A. Ilic, N. Roma, L. Sousa
The improved encoding efficiency and visual quality offered by the latest HEVC standard are mostly attained at the cost of a significant increase in computational complexity at both the encoder and the decoder. This added complexity greatly compromises the implementation of the standard in computationally and energy-constrained devices, including embedded systems and mobile, battery-powered devices. To circumvent this limitation, this paper proposes exploiting the embedded GPUs already present in many state-of-the-art SoCs to accelerate the HEVC in-loop filters (i.e., the deblocking filter and sample adaptive offset). The presented approaches comprehensively exploit both fine- and coarse-grained parallelization opportunities of these filters on an NVIDIA Tegra GPU. According to the experimental evaluation, the proposed approach is an effective strategy for satisfying the real-time requirements of the HEVC decoder, filtering each Ultra HD 4K intra frame in less than 20 ms (about 50 fps).
Title: HEVC in-loop filters GPU parallelization in embedded systems
Venue: 2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)
Pub Date: 2015-07-19 | DOI: 10.1109/SAMOS.2015.7363673
D. Gadioli, G. Palermo, C. Silvano
In this work, we introduce an application autotuning framework that dynamically adapts applications on multicore architectures. In particular, the framework exploits design-time knowledge and multi-objective requirements expressed by the user to drive the autotuning process at runtime. It also uses a monitoring infrastructure to obtain runtime feedback and to adapt to changing external conditions. The intrusiveness of the framework in the application (in terms of refactoring and lines of code to be added) has been kept low, which also minimizes the integration cost. To assess the proposed framework, we carried out an experimental campaign evaluating its overhead, the relevance of the described features, and its efficiency.
Title: Application autotuning to support runtime adaptivity in multicore architectures
Venue: 2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)
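One common way to combine design-time knowledge, user requirements, and runtime feedback, as this abstract describes, is to profile a table of "operating points" offline and pick among them online. The sketch below is a hypothetical illustration of that general idea, not the framework's API: `OperatingPoint`, `select_operating_point`, and the single feedback factor (observed time divided by predicted time, fed by monitoring) are assumptions.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// One design-time profiled configuration; the fields are hypothetical.
struct OperatingPoint {
    int threads;
    double predicted_time_ms;
    double predicted_power_w;
};

// Choose the lowest-power point whose predicted time, corrected by a runtime
// feedback factor (observed / predicted time from monitoring), still meets the
// user's deadline. Returns nullopt if no point satisfies the constraint.
std::optional<OperatingPoint> select_operating_point(const std::vector<OperatingPoint>& knowledge,
                                                     double deadline_ms, double feedback) {
    std::optional<OperatingPoint> best;
    for (const auto& op : knowledge) {
        if (op.predicted_time_ms * feedback > deadline_ms) continue;  // violates constraint
        if (!best || op.predicted_power_w < best->predicted_power_w) best = op;
    }
    return best;
}
```

With points (1 thread, 40 ms, 2.0 W), (2, 22 ms, 3.5 W), and (4, 12 ms, 6.0 W) and a 30 ms deadline, the 2-thread point is chosen; if monitoring reports the system running 1.5× slower than profiled, the selection moves to 4 threads.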
Pub Date: 2015-07-19 | DOI: 10.1109/SAMOS.2015.7363694
J. Hannuksela, M. Niskanen, Markus Turtinen
Noise reduction is one of the most fundamental challenges in digital image processing. On mobile devices, good solutions for this task can significantly increase output image quality, making the camera even more attractive to customers. The main challenge is that processing time and energy efficiency must be optimized, since response time and battery life are critical factors for all mobile applications. To identify the solutions that maximize real-time performance, we compare several implementations in terms of computational performance and energy efficiency. Specifically, we compare an OpenCL-based design with multithreaded and NEON-accelerated implementations and analyze them on a mobile platform. The results of this study show that the OpenCL framework is a viable, energy-efficient alternative for implementing computer vision algorithms.
Title: Performance evaluation of image noise reduction computing on a mobile platform
Venue: 2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)
Pub Date: 2015-07-19 | DOI: 10.1109/SAMOS.2015.7363687
Nikolaos Ilieskou, M. Blom, L. Somers, M. Reniers, T. Basten
This paper presents a proof of concept for a modular SystemC SIL (Software-in-the-Loop) simulation environment based on a blackboard-like architecture. The proposed SIL framework integrates embedded control software with simulators developed in SystemC/SystemC-AMS or in external tools such as MATLAB. The environment has been validated with a heating application for a professional printer, as an example of an MDVP (Multi-Domain Virtual Prototyping) application. Our goal is to evaluate the use of SystemC/SystemC-AMS and to address the challenges of developing multi-domain prototypes and blackboard-like SIL frameworks with this technology.
Title: Multi-Domain Virtual Prototyping in a SystemC SIL framework: A heating system case study
Venue: 2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)
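The blackboard-like architecture mentioned in this abstract decouples participants by letting them communicate only through a shared store of named values. The following is a minimal, hypothetical sketch of that pattern in plain C++ (no SystemC/SystemC-AMS); `Blackboard`, `Participant`, `step`, and the fixed execution order per step are assumptions made for illustration, not the framework's design.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical blackboard: a shared store of named signals that every participant
// (control software, plant simulators, external-tool adapters) reads and writes.
class Blackboard {
public:
    void write(const std::string& key, double v) { data_[key] = v; }
    double read(const std::string& key) const {
        auto it = data_.find(key);
        return it == data_.end() ? 0.0 : it->second;  // unset signals read as 0.0
    }

private:
    std::map<std::string, double> data_;
};

// A participant sees only the blackboard, never the other participants,
// so controllers and simulators can be swapped independently.
using Participant = std::function<void(Blackboard&)>;

// One co-simulation step: participants run in a fixed order.
void step(Blackboard& bb, const std::vector<Participant>& parts) {
    for (const auto& p : parts) p(bb);
}
```

For example, an on/off controller that writes `heater_on` when `temp` is below a setpoint and a toy thermal model that updates `temp` from `heater_on` can be combined by calling `step(bb, {controller, plant})` repeatedly, without either knowing about the other.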
Pub Date: 2015-07-19 | DOI: 10.1109/SAMOS.2015.7363650
O. Mutlu
Summary form only given. The memory system is a fundamental performance and energy bottleneck in almost all computing systems. Recent system design, application, and technology trends that require more capacity, bandwidth, efficiency, and predictability out of the memory system make it an even more important system bottleneck. At the same time, DRAM and flash technologies are experiencing difficult technology scaling challenges that make the maintenance and enhancement of their capacity, energy-efficiency, and reliability significantly more costly with conventional techniques. In this talk, we examine some promising research and design directions to overcome challenges posed by memory scaling. Specifically, we discuss three key solution directions: 1) enabling new memory architectures, functions, interfaces, and better integration of the memory and the rest of the system, 2) designing a memory system that intelligently employs multiple memory technologies and coordinates memory and storage management using non-volatile memory technologies, 3) providing predictable performance and QoS to applications sharing the memory/storage system. If time permits, we may also briefly describe our ongoing related work in combating scaling challenges of NAND flash memory.
Title: Rethinking memory system design for data-intensive computing
Venue: 2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)
Pub Date: 2015-07-19 | DOI: 10.1109/SAMOS.2015.7363665
T. Spink, Harry Wagstaff, Björn Franke, N. Topham
Dynamic Binary Translation (DBT) allows software compiled for one Instruction Set Architecture (ISA) to be executed on a processor supporting a different ISA. Some modern DBT systems decouple their main execution loop from the built-in Just-In-Time (JIT) compiler, i.e., the JIT compiler operates asynchronously in a separate thread without blocking program execution. However, this creates a problem for target architectures with dual-ISA support such as ARM/Thumb: because of the decoupled operation and dynamic mode changes, the ISA of the currently executed instruction stream may differ from the one being processed by the JIT compiler. In this paper we present a new approach to dual-ISA support in such an asynchronous DBT system, which integrates ISA mode tracking and hot-swapping of software instruction decoders. We demonstrate how this can be achieved in a retargetable DBT system, where the target ISA is not hard-coded; instead, a processor-specific module is generated from a high-level architecture description. We have implemented ARMv5T support in our DBT and demonstrate execution rates of up to 1148 MIPS for the SPEC CPU2006 benchmarks compiled for ARM/Thumb, achieving on average 192%, and up to 323%, of the speed of QEMU, which has been subject to intensive manual performance tuning and requires significant low-level effort for retargeting.
Title: Efficient dual-ISA support in a retargetable, asynchronous Dynamic Binary Translator
Venue: 2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)
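The mode-tracking problem in this abstract can be illustrated with two pieces: ARM interworking branches (such as `BX`) select Thumb or ARM state from bit 0 of the target address, and an asynchronous JIT must therefore decode each queued block with the decoder for the mode recorded when the block was queued, not the mode the CPU is in when the JIT gets to it. The sketch keys translations by `(pc, mode)`; `TranslationCache` and its methods are hypothetical, the "decoders" are placeholder strings, and the paper's generated processor modules are not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>

enum class IsaMode { Arm, Thumb };

// Interworking branch (ARM BX semantics): bit 0 of the target selects the new
// mode, and the fetch address is the target with bit 0 cleared.
std::pair<std::uint32_t, IsaMode> interworking_branch(std::uint32_t target) {
    IsaMode mode = (target & 1u) ? IsaMode::Thumb : IsaMode::Arm;
    return {target & ~1u, mode};
}

// Hypothetical translation cache keyed by (pc, mode): the same address can hold
// both an ARM and a Thumb translation, and each is decoded in its own mode.
class TranslationCache {
public:
    using Key = std::pair<std::uint32_t, IsaMode>;

    // Execution thread: record the block together with the mode it was seen in.
    void queue(std::uint32_t pc, IsaMode mode) { pending_.insert({pc, mode}); }

    // Stand-in for the JIT thread: swaps in the decoder matching each block's mode.
    void compile_pending() {
        for (const Key& key : pending_)
            compiled_[key] = key.second == IsaMode::Thumb ? "thumb-decoder" : "arm-decoder";
        pending_.clear();
    }

    const std::string* lookup(std::uint32_t pc, IsaMode mode) const {
        auto it = compiled_.find({pc, mode});
        return it == compiled_.end() ? nullptr : &it->second;
    }

private:
    std::set<Key> pending_;
    std::map<Key, std::string> compiled_;
};
```

Keying on the pair rather than on the address alone is what prevents a block queued in Thumb state from being mistranslated if the CPU has switched back to ARM by the time the JIT runs.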
Pub Date: 2015-07-19 | DOI: 10.1109/SAMOS.2015.7363703
Efstathios Sotiriou-Xanthopoulos, S. Xydis, K. Siozios, G. Economakos
The advent of many-accelerator Systems-on-Chip (SoCs), driven by ever-increasing demands for high performance and energy efficiency, has led to the need for new interconnection schemes among system components that minimize communication overhead. Hierarchical Networks-on-Chip (HNoCs) can provide an efficient communication paradigm for such systems: each node is an autonomous sub-network that includes the hardware accelerators needed by the respective application thread, thus retaining data locality and minimizing congestion. However, HNoC design may lead to an exponential increase in design space size, due to the numerous parameter combinations of the sub-networks and the overall HNoC. In addition, a prototyping framework that supports HNoC simulation with real stimuli is crucial for accurate system evaluation. Therefore, this paper presents (a) a SystemC framework for cycle-accurate simulation of hierarchical NoCs, accompanied by a NoC API for mapping nodes onto the HNoC; and (b) a design space exploration (DSE) flow that aims to reduce the enlarged design space. Using the Rician denoising algorithm for MRI scans as a case study, the proposed DSE flow achieves up to 2× time and 1.48× power improvements compared with a typical DSE flow.
Title: A virtual platform for exploring hierarchical interconnection for many-accelerator systems
Venue: 2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)
Pub Date: 2015-07-19 | DOI: 10.1109/SAMOS.2015.7363661
Efstathios Sotiriou-Xanthopoulos, Shalina Percy Delicia, P. Figuli, K. Siozios, G. Economakos, J. Becker
Due to the ever-increasing complexity of embedded system design and the need for rapid system evaluation in early design stages, simulation models known as Virtual Platforms (VPs) have become extremely important, as they enable system modeling at higher abstraction levels. Since a typical VP features multiple interdependent components, VP libraries are used to provide off-the-shelf models of commonly used hardware components, such as CPUs. However, existing VP libraries do not adequately support CPU power estimation. In addition, existing power characterization techniques require architectural details that are not always available in early design stages. To address this issue, this paper proposes a technique for the power annotation of CPU models in SystemC/TLM libraries, enabling accurate power estimation at higher abstraction levels. Using a set of benchmarks on a power-annotated SystemC/TLM model of the Xilinx MicroBlaze soft processor, the proposed approach is shown to achieve accurate power estimates compared with real-system power measurements, with estimation errors ranging from 0.47% to 6.11% and averaging 2%.
Title: A power estimation technique for cycle-accurate higher-abstraction SystemC-based CPU models
Venue: 2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)
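Power annotation of a CPU model, in its simplest generic form, attaches a cycle count and an energy cost to each instruction class and accumulates them as the model executes; average power is then total energy divided by elapsed time. The sketch below shows that arithmetic only. It is an assumed, instruction-level formulation, not necessarily the characterization method of the paper, and the `Annotation` table values are hypothetical rather than MicroBlaze measurements.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical per-class annotation: cycles and energy (nJ) per executed instruction.
struct Annotation { unsigned cycles; double energy_nj; };

class PowerModel {
public:
    PowerModel(std::map<std::string, Annotation> table, double clock_mhz)
        : table_(std::move(table)), clock_mhz_(clock_mhz) {}

    // Called by the instruction-set model after each executed instruction.
    void on_instruction(const std::string& cls) {
        const Annotation& a = table_.at(cls);  // throws std::out_of_range if unannotated
        cycles_ += a.cycles;
        energy_nj_ += a.energy_nj;
    }

    unsigned long long cycles() const { return cycles_; }

    // Elapsed time in microseconds is cycles / MHz, and nJ / us = mW.
    double average_power_mw() const {
        double time_us = static_cast<double>(cycles_) / clock_mhz_;
        return time_us > 0.0 ? energy_nj_ / time_us : 0.0;
    }

private:
    std::map<std::string, Annotation> table_;
    double clock_mhz_;
    unsigned long long cycles_ = 0;
    double energy_nj_ = 0.0;
};
```

At 100 MHz, executing `alu` (1 cycle, 0.5 nJ), `load` (2 cycles, 1.5 nJ), and `alu` again gives 4 cycles (0.04 us) and 2.5 nJ, i.e., an average of 62.5 mW.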