
International Conference on Hardware/Software Codesign and System Synthesis: Latest Publications

Stack oriented data cache filtering
Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629472
Rodrígo González-Alberquilla, Fernando Castro, L. Piñuel, F. Tirado
The L1 data cache is one of the most frequently accessed structures in the processor. Because of this and its moderate size, it is a major consumer of power. To reduce this power consumption, this paper proposes a small filter structure that exploits the special features of references to the stack region. This filter, which acts as a top, non-inclusive level of the data memory hierarchy, consists of a register set that keeps the data stored in the neighborhood of the top of the stack. Our simulation results show that using a small Stack Filter (SF) of only a few registers, 15% to 30% data cache power savings can be achieved on average, with a negligible performance penalty.
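As a rough illustration of the mechanism, the following is a minimal C++ sketch of how such a stack filter could be modeled in a simulator: a handful of registers shadow the words nearest the top of the stack, and only accesses that miss this window fall through to the L1 data cache. The window size, allocation policy, and interface are assumptions made for illustration, not the authors' implementation.

```cpp
#include <cstdint>
#include <cstdio>
#include <unordered_set>

// Hypothetical simulator model of a small Stack Filter (SF) placed above the
// L1 data cache: a few registers shadow the words nearest the top of the
// stack, and only accesses that miss this window reach the L1 cache.
// Window size and policy are illustrative assumptions, not the paper's design.
class StackFilter {
public:
    explicit StackFilter(unsigned numRegs) : numRegs_(numRegs) {}

    // Model one load/store. Returns true on a filter hit (L1 access avoided).
    bool access(uint32_t addr, uint32_t stackPointer) {
        // Only addresses within numRegs_ words above the stack pointer are kept.
        bool inWindow = addr >= stackPointer &&
                        addr <  stackPointer + numRegs_ * 4u;
        if (!inWindow) return false;           // falls through to the L1 D-cache
        if (held_.count(addr)) return true;    // hit: served by the filter
        held_.insert(addr);                    // miss: allocate in the filter
        return false;
    }

private:
    unsigned numRegs_;
    std::unordered_set<uint32_t> held_;        // addresses currently filtered
};

int main() {
    StackFilter sf(8);                         // filter with 8 registers
    uint32_t sp = 0x7fff0000u;
    printf("%d\n", sf.access(sp + 4, sp));     // 0: first touch misses
    printf("%d\n", sf.access(sp + 4, sp));     // 1: second touch hits
}
```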
Citations: 5
Portable SystemC-on-a-chip
Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629439
Scott Sirowy, Bailey Miller, F. Vahid
SystemC allows description of a digital system using traditional programming features as well as spatial connectivity features common in hardware description languages. We describe an approach for in-system emulation of circuits described in SystemC. SystemC emulation provides a number of benefits over synthesis, including fast compilation, shorter design time, and lower tool cost. The approach involves a new SystemC bytecode format that executes on an emulation engine running on the microprocessor and/or FPGA of a development platform. Portability is enhanced via a USB flash-drive approach to loading the bytecode format onto the platform. Performance is improved using emulation accelerators on an FPGA. We describe our SystemC-to-bytecode compiler, bytecode format, emulation engine, and emulation accelerators. We illustrate use of the approach on a variety of examples, showing easy porting of a single application across various platforms, and showing emulation speed on an FPGA that is comparable to SystemC execution on a PC.
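To give a feel for the execution model, the sketch below shows a bare-bones bytecode interpreter loop of the kind such an emulation engine might run on the platform's processor. The opcode set, encoding, and dispatch scheme are invented for illustration and are not the paper's SystemC bytecode format.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Illustrative only: a toy stack-based bytecode interpreter loop, standing in
// for the kind of emulation engine that executes a portable bytecode on the
// platform's CPU. Opcodes and encoding are invented, not the SystemC bytecode
// format described in the paper.
enum Op : uint8_t { PUSH, ADD, PRINT, HALT };

void run(const std::vector<uint8_t>& code) {
    std::vector<int32_t> stack;
    size_t pc = 0;
    while (pc < code.size()) {
        switch (code[pc++]) {
        case PUSH:  stack.push_back(static_cast<int8_t>(code[pc++])); break;
        case ADD: { int32_t b = stack.back(); stack.pop_back();
                    stack.back() += b; break; }
        case PRINT: printf("%d\n", stack.back()); break;
        case HALT:  return;
        }
    }
}

int main() {
    // Equivalent of evaluating 2 + 3 and printing the result.
    std::vector<uint8_t> program = { PUSH, 2, PUSH, 3, ADD, PRINT, HALT };
    run(program);
}
```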
Citations: 9
SARA: StreAm register allocation
Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629442
P. Raghavan, F. Catthoor
Low power design criteria for embedded systems have led to many innovative architectures. One of the core architectural changes introduced in the recent past is streaming registers. These architectures have been shown to be both power efficient and performance efficient. However, code has to be efficiently mapped onto them to make maximal use of their potential. This paper introduces a novel technique for compiling C code onto streaming registers. The proposed technique uses not only the temporal locality in arrays but also their spatial locality to map code onto streaming registers. The proposed Stream Register Allocation (SARA) technique is shown to provide good mapping efficiency and to scale to realistic applications.
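The kind of access pattern the technique targets can be pictured with the following sketch, in which a streaming register is modeled in software as a small buffer that is refilled with a block of consecutive array elements (spatial locality) and then read element by element inside the loop (temporal locality). The buffer depth and the fill/read interface are assumptions for illustration, not the SARA mapping algorithm itself.

```cpp
#include <cstdio>
#include <vector>

// Illustrative software model of a streaming register: a small buffer that a
// compiler-inserted "fill" refills with a block of consecutive array elements
// (spatial locality), so the inner loop reads from the buffer instead of
// issuing one memory access per element. Buffer depth and the fill/read
// interface are assumptions, not the SARA mapping itself.
struct StreamReg {
    static constexpr int Depth = 8;
    int buf[Depth];
    int pos = Depth;            // force a fill on first read
    const int* src = nullptr;

    void bind(const int* s) { src = s; pos = Depth; }
    int read(int i) {
        if (pos == Depth) {     // refill with the next Depth elements
            for (int k = 0; k < Depth; ++k) buf[k] = src[i + k];
            pos = 0;
        }
        return buf[pos++];
    }
};

int main() {
    std::vector<int> a(64);
    for (int i = 0; i < 64; ++i) a[i] = i;

    StreamReg sr;
    sr.bind(a.data());
    long sum = 0;
    for (int i = 0; i < 64; ++i) sum += sr.read(i);   // unit-stride stream
    printf("%ld\n", sum);       // 2016
}
```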
Citations: 0
Building heterogeneous reconfigurable systems with a hardware microkernel
Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629489
J. Agron, D. Andrews
Field Programmable Gate Arrays (FPGAs) have long held the promise of allowing designers to create systems with performance levels close to custom circuits but with software-like productivity for reconfiguring the gates. Unfortunately, achieving this promise has been elusive. Modern platform FPGAs are now large enough to support complete heterogeneous Multiprocessor Systems-on-Chip (MPSoCs); however, standardized design flows and programming models for such platforms do not yet exist. To achieve truly software-like levels of productivity, the design flow and development environment for heterogeneous MPSoCs must resemble that of standard homogeneous systems. In this paper we present a new design flow and run-time system that enables developers to program a heterogeneous MPSoC using standard POSIX-compatible programming abstractions. The ability to use a standard programming model is achieved by using a hardware-based microkernel to provide OS services to all heterogeneous components. This approach makes programming heterogeneous MPSoCs transparent, and can increase programmer productivity by replacing synthesis of custom components with faster compilation of heterogeneous executables. The use of a hardware microkernel provides OS services in an ISA-neutral manner, which allows for seamless synchronization and communication among heterogeneous threads.
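The key point is that software and hardware threads share one standard programming model. The fragment below is ordinary POSIX-style threading code, shown only to illustrate that model; under the proposed approach, the same create/join/mutex abstractions would also apply to threads realized in the FPGA fabric, with the hardware microkernel supplying the underlying OS services.

```cpp
#include <pthread.h>
#include <cstdio>

// Ordinary POSIX-style threading code, shown to illustrate the programming
// model the paper standardizes across CPU and FPGA threads. In the proposed
// system, a "thread" created this way could just as well be a hardware thread,
// with the hardware microkernel supplying scheduling and synchronization.
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

static void* worker(void*) {
    for (int i = 0; i < 1000; ++i) {
        pthread_mutex_lock(&lock);      // OS service: mutex (here in software;
        ++counter;                      // in the paper, ISA-neutral hardware)
        pthread_mutex_unlock(&lock);
    }
    return nullptr;
}

int main() {
    pthread_t t[2];
    for (auto& th : t) pthread_create(&th, nullptr, worker, nullptr);
    for (auto& th : t) pthread_join(th, nullptr);
    printf("counter = %ld\n", counter); // 2000
}
```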
Citations: 19
On-the-fly hardware acceleration for protocol stack processing in next generation mobile devices
Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629457
David Szczesny, S. Hessel, Felix Bruns, A. Bilgic
In this paper we present a new on-the-fly hardware acceleration approach, based on a smart Direct Memory Access (sDMA) controller, for the layer 2 (L2) downlink protocol stack processing in Long Term Evolution (LTE) and beyond mobile devices. We use virtual prototyping in order to simulate an ARM1176 processor based hardware platform together with the executed software comprising an LTE protocol stack model. The sDMA controller with different hardware accelerator units for the time critical algorithms in the protocol stack is implemented and integrated in the hardware platform. We prove our new hardware/software partitioning concept for the LTE L2 by measuring the average execution time per transport block in the protocol stack at different activated on-the-fly hardware acceleration stages in the sDMA controller. At LTE data rates of 100 Mbit/s, we achieve a speedup of 24% compared to a pure software implementation by enabling the sDMA hardware support for header processing in the protocol stack. Furthermore, an activation of the complete on-the-fly hardware acceleration in the sDMA controller, including on-the-fly deciphering, leads to a speedup of more than 50%. Finally, at transmission conditions with more computational demands and data rates up to 320 Mbit/s, we obtain acceleration ratios of almost 80%. Investigations show that our new sDMA on-the-fly hardware acceleration approach in combination with a single-core processor offers the required computational power for next generation mobile devices.
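To suggest what "on-the-fly" acceleration could look like at the software interface, the sketch below shows a hypothetical descriptor that a driver might hand to such a smart DMA controller, asking it to decipher a transport block and strip protocol headers while copying the data. All structure, field, and flag names are invented for illustration; the abstract does not describe the sDMA programming interface at this level.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical descriptor for a smart DMA (sDMA) transfer with on-the-fly
// processing. Field names and flag values are invented for illustration; the
// paper's actual register/descriptor layout is not specified in the abstract.
struct SdmaDescriptor {
    uint64_t srcAddr;        // ciphered transport block in modem memory
    uint64_t dstAddr;        // destination buffer for the deciphered payload
    uint32_t length;         // bytes to move
    uint32_t flags;          // which accelerator units to enable on the fly
};

enum : uint32_t {
    SDMA_DECIPHER      = 1u << 0,   // decipher while copying
    SDMA_STRIP_HEADERS = 1u << 1,   // parse and remove L2 protocol headers
};

// Stand-in for posting the descriptor to the controller; a real driver would
// write it to a hardware queue and wait for a completion interrupt.
void submit(const SdmaDescriptor& d) {
    printf("sDMA: copy %u bytes, flags=0x%x\n",
           static_cast<unsigned>(d.length), static_cast<unsigned>(d.flags));
}

int main() {
    SdmaDescriptor d{0x80000000ull, 0x90000000ull, 4096,
                     SDMA_DECIPHER | SDMA_STRIP_HEADERS};
    submit(d);   // header processing + deciphering happen during the transfer
}
```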
Citations: 11
TotalProf: a fast and accurate retargetable source code profiler
Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629477
L. Gao, Jia Huang, J. Ceng, R. Leupers, G. Ascheid, H. Meyr
Profilers play an important role in software/hardware design, optimization, and verification. Various approaches have been proposed to implement profilers. The most widespread approach adopted in the embedded domain is Instruction Set Simulation (ISS) based profiling, which provides uncompromised accuracy but limited execution speed. Source code profilers, on the contrary, are fast but less accurate. This paper introduces TotalProf, a fast and accurate source code cross profiler that estimates the performance of an application from three aspects: First, code optimization and a novel virtual compiler backend are employed to mimic the course of target compilation. Second, an optimistic static scheduler is introduced to estimate the behavior of the target processor's datapath. Last but not least, dynamic events, such as cache misses, bus contention and branch prediction failures, are simulated at runtime. With an abstract architecture description, the tool can easily be retargeted, in a performance-characteristics-oriented way, to estimate different processor architectures, including DSPs and VLIW machines. Multiple instances of TotalProf can be integrated with SystemC to support heterogeneous Multi-Processor System-on-Chip (MPSoC) profiling. With only about a 5 to 15% error in major performance metrics, such as cycle count, memory accesses and cache misses, an execution speed of more than one Giga-Instruction Per Second (GIPS) is achieved.
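One way to picture source-level performance estimation is sketched below: the instrumented program charges a statically estimated cycle cost for each basic block to a counter while running natively on the host. The instrumentation macro and the per-block costs are invented; TotalProf's actual machinery (virtual compiler backend, optimistic static scheduler, dynamic cache and branch models) is considerably more elaborate.

```cpp
#include <cstdio>

// Toy illustration of source-level cycle estimation: each basic block charges
// a statically estimated cycle cost to a global counter while the program runs
// natively on the host. The per-block costs here are made up; in TotalProf
// they would come from the virtual compiler backend and static scheduler,
// refined at runtime by cache and branch-prediction models.
static unsigned long long estCycles = 0;
#define CHARGE(cycles) (estCycles += (cycles))

int sumEven(const int* a, int n) {
    int s = 0;
    CHARGE(2);                       // block B0: loop setup (assumed cost)
    for (int i = 0; i < n; ++i) {
        CHARGE(3);                   // block B1: loop test + index update
        if (a[i] % 2 == 0) {
            CHARGE(2);               // block B2: taken path of the branch
            s += a[i];
        }
    }
    return s;
}

int main() {
    int a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int s = sumEven(a, 8);
    printf("sum=%d, estimated cycles=%llu\n", s, estCycles);
}
```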
Citations: 22
Configuration and control of SystemC models using TLM middleware
Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629447
C. Schröder, Wolfgang Klingauf, Robert Günzel, M. Burton, Eric Roesler
With the emergence of ESL design methodologies, frameworks are being developed to enable engineers to easily configure and control models under simulation. Each of these frameworks has proven good for its specific use case, but they are incompatible. ESL engineers must be able to leverage models and tools from different sources in order to be successful. But with today's diversity of configuration mechanisms, engineers spend too much time writing adapters between models that have been developed using different tools. We see a need for making the various existing configuration mechanisms cooperate. We present a solution based on a SystemC middleware. The middleware uses a generic transaction passing mechanism based on TLM-2 concepts and provides interoperability between the different configuration interfaces in a heterogeneous design. The paper analyses configuration in general, explains the technical considerations behind our middleware, and shows how it makes state-of-the-art configuration frameworks interoperable.
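A minimal sketch of configuration-as-transactions is given below: a set/get request travels as a small generic payload through one transport interface, so differently authored models only need to agree on that payload. The classes and field names are invented for illustration and are neither the paper's middleware API nor the OSCI TLM-2 generic payload.

```cpp
#include <cstdio>
#include <map>
#include <string>

// Minimal sketch of configuration-as-transactions: a generic set/get payload
// is routed to whichever model owns the parameter, so tools and models only
// need to agree on this one interface. Class and field names are invented;
// this is not the paper's middleware API nor the OSCI TLM-2 payload.
struct ConfigPayload {
    enum Cmd { SET, GET } cmd;
    std::string param;       // hierarchical parameter name
    std::string value;       // in/out value, transported as a string
    bool ok = false;         // response status
};

class ConfigurableModel {
public:
    void transport(ConfigPayload& p) {          // target side of the interface
        if (p.cmd == ConfigPayload::SET) { params_[p.param] = p.value; p.ok = true; }
        else { auto it = params_.find(p.param); p.ok = it != params_.end();
               if (p.ok) p.value = it->second; }
    }
private:
    std::map<std::string, std::string> params_;
};

int main() {
    ConfigurableModel cache;
    ConfigPayload setReq{ConfigPayload::SET, "cache.size_kb", "32"};
    cache.transport(setReq);                     // configure the model

    ConfigPayload getReq{ConfigPayload::GET, "cache.size_kb", ""};
    cache.transport(getReq);                     // read it back
    printf("cache.size_kb = %s (ok=%d)\n", getReq.value.c_str(), getReq.ok);
}
```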
Citations: 11
Efficient dynamic voltage/frequency scaling through algorithmic loop transformation
Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629464
M. Ghodrat, T. Givargis
We present a novel loop transformation technique, particularly well suited for optimizing embedded compilers, where an increase in compilation time is acceptable in exchange for a significant reduction in energy consumption. Our technique transforms loops containing nested conditional blocks. Specifically, the transformation takes advantage of the fact that the Boolean value of a conditional expression, determining the true/false paths, can be statically analyzed; this information, combined with loop dependency information, can be used to break up the original loop containing conditional expressions into a number of smaller loops without conditional expressions. Subsequently, each of the smaller loops can be executed at the lowest voltage/frequency setting, yielding overall energy reduction. Our experiments with loop kernels from mpeg4, mpeg-decoder, mpeg-encoder, mp3, qsdpcm and gimp show an impressive energy reduction of 26.56% (average) and 66% (best case) when running on a StrongARM embedded processor. The energy reduction was obtained at no additional performance penalty.
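The transformation is easiest to see on a small example. In the sketch below, the conditional inside the loop depends only on the loop index in a statically analyzable way, so the loop can be split at the known boundary into two smaller loops without the conditional, and each resulting loop can then be run at its own (lower) voltage/frequency setting. The split point, the workloads, and the DVFS hook are illustrative placeholders, not the paper's benchmark code.

```cpp
#include <cstdio>

// Illustrative stand-in for a DVFS hook; a real system would program the
// voltage/frequency operating point here.
static void set_vf_level(int level) { printf("switch to V/f level %d\n", level); }

// Original form: one loop with a nested conditional whose outcome depends
// only on the loop index, so it is statically analyzable.
void process_original(float* a, int n) {
    for (int i = 0; i < n; ++i) {
        if (i < n / 2) a[i] *= 2.0f;                  // "light" path
        else           a[i] = a[i] * a[i] + 1.0f;     // "heavy" path
    }
}

// Transformed form: the loop is split at the statically known boundary into
// two conditional-free loops, and each can run at its own (lower) V/f setting.
void process_transformed(float* a, int n) {
    set_vf_level(1);                 // lower setting suffices for the light loop
    for (int i = 0; i < n / 2; ++i) a[i] *= 2.0f;

    set_vf_level(2);                 // slightly higher setting for the heavy loop
    for (int i = n / 2; i < n; ++i) a[i] = a[i] * a[i] + 1.0f;
}

int main() {
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    process_original(a, 8);          // reference behavior
    process_transformed(b, 8);       // same results, split into two V/f regions
    for (int i = 0; i < 8; ++i) printf("%g/%g ", a[i], b[i]);
    printf("\n");
}
```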
Citations: 2
Exploring hybrid photonic networks-on-chip for emerging chip multiprocessors
Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629453
Shirish Bahirat, S. Pasricha
Increasing application complexity and improvements in process technology have today enabled chip multiprocessors (CMPs) with tens to hundreds of cores on a chip. Networks on Chip (NoCs) have emerged as scalable communication fabrics that can support high bandwidths for these massively parallel systems. However, traditional electrical NoC implementations still need to overcome the challenges of high data transfer latencies and large power consumption. On-chip photonic interconnects have recently been proposed as an alternative to address these challenges, with high performance-per-watt characteristics for intra-chip communication. In this paper, we explore using photonic interconnects on a chip to enhance traditional electrical NoCs. Our proposed hybrid photonic NoC utilizes a photonic ring waveguide to enhance a traditional 2D electrical mesh NoC. Experimental results indicate a strong motivation for considering the proposed hybrid photonic NoC for future CMPs -- as much as a 13× reduction in power consumption and improved throughput and access latencies, compared to traditional electrical 2D mesh and torus NoC architectures.
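One simple way to picture the hybrid fabric is as a per-transfer routing decision: short transfers stay on the electrical 2D mesh, while long-distance transfers are diverted onto the photonic ring waveguide. The hop-count threshold and the interfaces in the sketch below are assumptions for illustration, not the paper's router design.

```cpp
#include <cstdio>
#include <cstdlib>

// Illustrative routing policy for a hybrid NoC: an electrical 2D mesh handles
// short transfers, and a photonic ring waveguide is used when the Manhattan
// distance between source and destination tiles exceeds a threshold. The
// threshold and tile layout are assumptions, not the paper's router design.
struct Tile { int x, y; };

enum class Fabric { ElectricalMesh, PhotonicRing };

Fabric route(Tile src, Tile dst, int hopThreshold) {
    int hops = std::abs(src.x - dst.x) + std::abs(src.y - dst.y);
    return hops > hopThreshold ? Fabric::PhotonicRing : Fabric::ElectricalMesh;
}

int main() {
    Tile a{0, 0}, b{1, 1}, c{7, 6};
    printf("a->b: %s\n", route(a, b, 3) == Fabric::PhotonicRing ? "ring" : "mesh");
    printf("a->c: %s\n", route(a, c, 3) == Fabric::PhotonicRing ? "ring" : "mesh");
}
```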
Citations: 29
Memory-efficient distribution of regular expressions for fast deep packet inspection
Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629456
J. Rohrer, K. Atasu, J. V. Lunteren, C. Hagleitner
Current trends in network security force network intrusion detection systems (NIDS) to scan network traffic at wirespeed beyond 10 Gbps against increasingly complex patterns, often specified using regular expressions. As a result, dedicated regular-expression accelerators have recently received considerable attention. The storage efficiency of the compiled patterns is a key factor in the overall performance and critically depends on the distribution of the patterns to a limited number of parallel pattern-matching engines. In this work, we first present a formal definition and complexity analysis of the pattern distribution problem and then introduce optimal and heuristic methods to solve it. Our experiments with five sets of regular expressions from both public and proprietary NIDS result in an up to 8.8x better storage efficiency than the state of the art. The average improvement is 2.3x.
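The flavor of a heuristic solution can be sketched as follows: patterns are ordered by their estimated memory footprint and each is greedily assigned to the engine with the smallest load so far, in the style of longest-processing-time bin packing. The cost model (pattern length as a stand-in for compiled size) and the greedy rule are illustrative; they are not the paper's optimal or heuristic formulations.

```cpp
#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

// Greedy sketch of distributing compiled regex patterns across a fixed number
// of parallel pattern-matching engines: sort by estimated memory cost, then
// assign each pattern to the currently least-loaded engine (LPT-style bin
// packing). The cost model here is a placeholder, not the paper's formulation.
struct Pattern { std::string regex; size_t cost; };

std::vector<std::vector<Pattern>> distribute(std::vector<Pattern> pats, int engines) {
    std::sort(pats.begin(), pats.end(),
              [](const Pattern& a, const Pattern& b) { return a.cost > b.cost; });
    std::vector<std::vector<Pattern>> bins(engines);
    std::vector<size_t> load(engines, 0);
    for (const auto& p : pats) {
        int best = std::min_element(load.begin(), load.end()) - load.begin();
        bins[best].push_back(p);        // place on the least-loaded engine
        load[best] += p.cost;
    }
    return bins;
}

int main() {
    std::vector<Pattern> pats = {
        {"GET /.*\\.php", 40}, {"User-Agent:.*bot", 25},
        {"\\x90{16,}", 60},    {"cmd\\.exe", 10},
    };
    auto bins = distribute(pats, 2);
    for (size_t e = 0; e < bins.size(); ++e) {
        printf("engine %zu:", e);
        for (const auto& p : bins[e]) printf(" [%s]", p.regex.c_str());
        printf("\n");
    }
}
```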
Citations: 30