
2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS): latest publications

A lightweight infrastructure for the dynamic creation and configuration of virtual platforms
C. Sauer, Hans-Peter Löb
Virtual prototypes leverage SystemC/TLM for simulating programmable platforms comprising 100s of modules. Their efficient creation and configuration is vital for acceptable turnaround times, e.g., during performance exploration or software development. Therefore, our lightweight infrastructure provides a factory creating designs from abstract descriptions of module instances, properties, and connections. Modules mark properties as creation or runtime parameters. The resulting generic design descriptions are usable by non-experts and enable front-ends. The infrastructure is a small C++ library that can be combined with existing SystemC/TLM models and simulation kernels. An industrial case study of a complex multiprocessor SoC shows a distinct productivity gain.
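The factory idea in this abstract can be illustrated with a minimal sketch. The infrastructure described is a C++ library for SystemC/TLM; the Python below is not its API. The names `Module`, `REGISTRY`, and `build_design`, and the layout of the description dictionary, are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Stand-in for a simulation module (hypothetical, not a SystemC class)."""
    name: str
    creation_params: dict = field(default_factory=dict)  # fixed at construction
    runtime_params: dict = field(default_factory=dict)   # adjustable afterwards
    peers: list = field(default_factory=list)            # names of connected modules


# Hypothetical registry: module type name -> constructor.
REGISTRY = {"cpu": Module, "memory": Module}


def build_design(description):
    """Create and connect module instances from a plain-data description."""
    modules = {}
    for inst in description["instances"]:
        ctor = REGISTRY[inst["type"]]
        modules[inst["name"]] = ctor(
            name=inst["name"],
            creation_params=dict(inst.get("creation", {})),
            runtime_params=dict(inst.get("runtime", {})),
        )
    for src, dst in description["connections"]:
        modules[src].peers.append(dst)
        modules[dst].peers.append(src)
    return modules


design = build_design({
    "instances": [
        {"name": "cpu0", "type": "cpu", "creation": {"cores": 2}},
        {"name": "ram0", "type": "memory", "runtime": {"latency_ns": 10}},
    ],
    "connections": [("cpu0", "ram0")],
})
```

Because the description is plain data, a front-end or a non-expert user could in principle produce it without touching module code, which is the property the abstract emphasizes.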
DOI: 10.1109/SAMOS.2015.7363701 (https://doi.org/10.1109/SAMOS.2015.7363701); published 2015-07-19
Citations: 4
MPSoCSim: An extended OVP simulator for modeling and evaluation of Network-on-Chip based heterogeneous MPSoCs
P. Wehner, J. Rettkowski, Tobias Kleinschmidt, D. Göhringer
This paper presents a SystemC simulator for Network-on-Chip (NoC) based Multiprocessor Systems-on-Chip (MPSoCs). The simulator currently supports a mesh topology with wormhole switching and several routing algorithms: XY, minimal West-First, and adaptive West-First. The simulator can be used to analyze the impact of the routing algorithm on performance. To simulate a heterogeneous MPSoC, ARM processors and MicroBlazes can be attached to the NoC. Processor and peripheral models used within the test platforms are provided by Imperas/OVP. Moreover, traffic generators are available to analyze the system. An additional SystemC component enables the readout of simulation time from within the application. To evaluate the simulator, multiple platforms and applications were tested and compared with a hardware implementation. The comparison shows that the simulator improves the development of MPSoCs by early estimation of system requirements.
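As background for the routing algorithms named in this abstract, XY routing on a 2D mesh moves a packet along the X dimension until the destination column is reached and only then along Y. The sketch below shows that rule in isolation; it is not taken from MPSoCSim, and the function name and coordinate convention are assumptions.

```python
def xy_route(src, dst):
    """Return the (x, y) routers visited from src to dst, both inclusive."""
    x, y = src
    path = [(x, y)]
    # Phase 1: travel along X until the destination column is reached.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Phase 2: travel along Y until the destination row is reached.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1))
```

Because every packet finishes its X moves before making any Y move, the path between a given pair of routers is fixed, which is what distinguishes XY from the adaptive West-First variant mentioned in the abstract.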
DOI: 10.1109/SAMOS.2015.7363704 (https://doi.org/10.1109/SAMOS.2015.7363704); published 2015-07-19
Citations: 19
HEVC in-loop filters GPU parallelization in embedded systems
D. Souza, A. Ilic, N. Roma, L. Sousa
The added encoding efficiency and visual quality offered by the latest HEVC standard is mostly attained at the cost of a significant increase in computational complexity at both the encoder and decoder. However, this added complexity greatly compromises the implementation of the standard in computationally and energy-constrained devices, including embedded systems and mobile, battery-powered devices. To circumvent this limitation, this paper proposes exploiting the embedded GPUs that already equip many state-of-the-art SoCs to accelerate the HEVC in-loop filters (i.e., the deblocking filter and sample adaptive offset). The presented approaches comprehensively exploit both fine- and coarse-grained parallelization opportunities of these filters on an NVIDIA Tegra GPU. According to the experimental evaluation, the proposed approach proved to be a remarkable strategy for satisfying the real-time requirements of the HEVC decoder, filtering each Ultra HD 4K intra frame in less than 20 ms (about 50 fps).
DOI: 10.1109/SAMOS.2015.7363667 (https://doi.org/10.1109/SAMOS.2015.7363667); published 2015-07-19
Citations: 17
Application autotuning to support runtime adaptivity in multicore architectures
D. Gadioli, G. Palermo, C. Silvano
In this work, we introduce an application autotuning framework to dynamically adapt applications in multicore architectures. In particular, the framework exploits design-time knowledge and multi-objective requirements expressed by the user to drive the autotuning process at runtime. It also exploits a monitoring infrastructure to get runtime feedback and to adapt to changing external conditions. The intrusiveness of the autotuning framework in the application (in terms of refactoring and lines of code to be added) has been kept limited in order to minimize the integration cost. To assess the proposed framework, we carried out an experimental campaign to evaluate the overhead, the relevance of the described features, and the efficiency of the framework.
DOI: 10.1109/SAMOS.2015.7363673 (https://doi.org/10.1109/SAMOS.2015.7363673); published 2015-07-19
Citations: 31
Performance evaluation of image noise reduction computing on a mobile platform
J. Hannuksela, M. Niskanen, Markus Turtinen
Noise reduction is one of the most fundamental digital image processing challenges. On mobile devices, proper solutions for this task can significantly increase the output image quality, making the use of a camera even more attractive for customers. The main challenge is that the processing time and energy efficiency must be optimized, since the response time and the battery life are critical factors for all mobile applications. To identify the solutions that maximize real-time performance, we compare several different implementations in terms of computational performance and energy efficiency. Specifically, we compare an OpenCL-based design with multithreaded and NEON-accelerated implementations and analyze them on the mobile platform. Based on the results of this study, the OpenCL framework provides a viable, energy-efficient alternative for implementing computer vision algorithms.
DOI: 10.1109/SAMOS.2015.7363694 (https://doi.org/10.1109/SAMOS.2015.7363694); published 2015-07-19
Citations: 1
Multi-Domain Virtual Prototyping in a SystemC SIL framework: A heating system case study
Nikolaos Ilieskou, M. Blom, L. Somers, M. Reniers, T. Basten
This paper presents a proof of concept for a modular SystemC SIL (Software-in-the-Loop) simulation environment that uses a blackboard-like architecture. The proposed SIL framework integrates embedded control software with simulators developed in SystemC/SystemC-AMS or in external tools such as MATLAB. The environment has been validated with a heating application for a professional printer, as an example of an MDVP (Multi-Domain Virtual Prototyping) application. Our goal is to evaluate the use of SystemC/SystemC-AMS and to address the challenges in developing multi-domain prototypes and blackboard-like SIL frameworks using this technology.
DOI: 10.1109/SAMOS.2015.7363687 (https://doi.org/10.1109/SAMOS.2015.7363687); published 2015-07-19
Citations: 1
Rethinking memory system design for data-intensive computing
O. Mutlu
Summary form only given. The memory system is a fundamental performance and energy bottleneck in almost all computing systems. Recent system design, application, and technology trends that require more capacity, bandwidth, efficiency, and predictability out of the memory system make it an even more important system bottleneck. At the same time, DRAM and flash technologies are experiencing difficult technology scaling challenges that make the maintenance and enhancement of their capacity, energy-efficiency, and reliability significantly more costly with conventional techniques. In this talk, we examine some promising research and design directions to overcome challenges posed by memory scaling. Specifically, we discuss three key solution directions: 1) enabling new memory architectures, functions, interfaces, and better integration of the memory and the rest of the system, 2) designing a memory system that intelligently employs multiple memory technologies and coordinates memory and storage management using non-volatile memory technologies, 3) providing predictable performance and QoS to applications sharing the memory/storage system. If time permits, we may also briefly describe our ongoing related work in combating scaling challenges of NAND flash memory.
DOI: 10.1109/SAMOS.2015.7363650 (https://doi.org/10.1109/SAMOS.2015.7363650); published 2015-07-19
Citations: 1
Efficient dual-ISA support in a retargetable, asynchronous Dynamic Binary Translator
T. Spink, Harry Wagstaff, Björn Franke, N. Topham
Dynamic Binary Translation (DBT) allows software compiled for one Instruction Set Architecture (ISA) to be executed on a processor supporting a different ISA. Some modern DBT systems decouple their main execution loop from the built-in Just-In-Time (JIT) compiler, i.e. the JIT compiler can operate asynchronously in a different thread without blocking program execution. However, this creates a problem for target architectures with dual-ISA support such as ARM/THUMB, where the ISA of the currently executed instruction stream may be different to the one processed by the JIT compiler due to their decoupled operation and dynamic mode changes. In this paper we present a new approach for dual-ISA support in such an asynchronous DBT system, which integrates ISA mode tracking and hot-swapping of software instruction decoders. We demonstrate how this can be achieved in a retargetable DBT system, where the target ISA is not hard-coded, but a processor-specific module is generated from a high-level architecture description. We have implemented ARM V5T support in our DBT and demonstrate execution rates of up to 1148 MIPS for the SPEC CPU 2006 benchmarks compiled for ARM/THUMB, achieving on average 192%, and up to 323%, of the speed of QEMU, which has been subject to intensive manual performance tuning and requires significant low-level effort for retargeting.
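The mode-mismatch problem described in this abstract can be made concrete with a small sketch. On ARM cores with Thumb support, an interworking branch (BX) selects the next instruction set from bit 0 of the target address. One way to keep asynchronously compiled code consistent with the executing mode is to tag every translated block with the ISA mode it was decoded under. This is an illustration of that idea only, not the authors' implementation; `Isa`, `branch_exchange`, and `TranslationCache` are hypothetical names.

```python
from enum import Enum


class Isa(Enum):
    ARM = "arm"      # 32-bit instruction encoding
    THUMB = "thumb"  # 16-bit instruction encoding


def branch_exchange(target):
    """Interworking branch: bit 0 of the target selects the next ISA mode."""
    mode = Isa.THUMB if target & 1 else Isa.ARM
    return target & ~1, mode


class TranslationCache:
    """Hypothetical cache keyed by (address, mode), so a block translated
    under one ISA is never reused while the other ISA is active."""

    def __init__(self):
        self._blocks = {}

    def insert(self, pc, mode, block):
        self._blocks[(pc, mode)] = block

    def lookup(self, pc, mode):
        return self._blocks.get((pc, mode))


cache = TranslationCache()
cache.insert(0x8000, Isa.ARM, "arm-block")
pc, mode = branch_exchange(0x8001)  # same address, but Thumb mode requested
```

With this keying, a lookup for address 0x8000 in Thumb mode misses even though an ARM translation of the same address exists, so the translator would fall back to decoding with the Thumb decoder rather than run mismatched code.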
DOI: 10.1109/SAMOS.2015.7363665 (https://doi.org/10.1109/SAMOS.2015.7363665); published 2015-07-19
Citations: 5
A virtual platform for exploring hierarchical interconnection for many-accelerator systems
Efstathios Sotiriou-Xanthopoulos, S. Xydis, K. Siozios, G. Economakos
The advent of many-accelerator Systems-on-Chip (SoCs), a result of the ever-increasing demand for high performance and energy efficiency, has led to the need for new interconnection schemes among the system components that minimize the communication overhead. To meet this need, Hierarchical Networks-on-Chip (HNoCs) can provide an efficient communication paradigm for such systems: each node is an autonomous sub-network that includes the hardware accelerators needed by the respective application thread, thus retaining data locality and minimizing congestion. However, HNoC design may lead to an exponential increase in the design space size, due to the numerous parameter combinations of the sub-networks and the overall HNoC. In addition, a prototyping framework supporting HNoC simulation with real stimuli is crucial for accurate system evaluation. Therefore, the goal of this paper is to present (a) a SystemC framework for cycle-accurate simulation of hierarchical NoCs, accompanied by a NoC API for node mapping on the HNoC; and (b) an exploration flow that aims to reduce the increased design space size. Using the Rician denoising algorithm for MRI scans as a case study, the proposed DSE flow achieves time and power improvements of up to 2× and 1.48×, respectively, compared to a typical DSE flow.
DOI: 10.1109/SAMOS.2015.7363703 (https://doi.org/10.1109/SAMOS.2015.7363703); published 2015-07-19
Citations: 1
A power estimation technique for cycle-accurate higher-abstraction SystemC-based CPU models
Efstathios Sotiriou-Xanthopoulos, Shalina Percy Delicia, P. Figuli, K. Siozios, G. Economakos, J. Becker
Due to the ever-increasing complexity of embedded system design and the need for rapid system evaluation in early design stages, simulation models known as Virtual Platforms (VPs) have become of utmost importance, as they enable system modeling at higher abstraction levels. Since a typical VP features multiple interdependent components, VP libraries are used to provide off-the-shelf models of commonly used hardware components, such as CPUs. However, CPU power estimation is not adequately supported by existing VP libraries. In addition, existing power characterization techniques require architectural details that are not always available in early design stages. To address this issue, this paper proposes a technique for power annotation of CPU models targeting SystemC/TLM libraries, in order to enable accurate power estimation at higher abstraction levels. Using a set of benchmarks on a power-annotated SystemC/TLM model of the Xilinx MicroBlaze soft processor, it is shown that the proposed approach achieves accurate power estimation compared to real-system power measurements: the estimation error ranges from 0.47% to 6.11%, with an average of 2%.
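For readers unfamiliar with how such error figures are typically computed, the sketch below assumes the estimation error is the relative difference between estimated and measured power, expressed as a percentage. The abstract does not state the exact metric, so this is an assumption, and the sample values are invented purely to exercise the function; they are not results from the paper.

```python
def relative_error_pct(estimated, measured):
    """Absolute estimation error as a percentage of the measured value."""
    return abs(estimated - measured) / measured * 100.0


# Made-up (benchmark, estimated W, measured W) triples, for illustration only.
samples = [("bench_a", 1.02, 1.00), ("bench_b", 0.97, 1.00)]
errors = [relative_error_pct(est, meas) for _, est, meas in samples]
mean_error = sum(errors) / len(errors)
```

Reporting the minimum, maximum, and mean of `errors` over a benchmark set gives figures of the same shape as the range and average quoted in the abstract.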
DOI: https://doi.org/10.1109/SAMOS.2015.7363661 (published 2015-07-19)
Citations: 7