Latest papers: 2007 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation

Simulative Buffer Analysis of Local Image Processing Algorithms Described by Windowed Synchronous Data Flow
J. Keinert, C. Haubelt, J. Teich
Embedded real-time image processing applications working on large images have to process and store huge amounts of data. Consequently, the organization of the memory buffers and the precise determination of the required buffer sizes are critical steps for efficient system implementation. In this paper, we propose a new method that permits this analysis to be performed automatically for local image processing algorithms. These algorithms are specified using the windowed synchronous data flow (WSDF) model, a multi-dimensional model of computation designed specifically to represent local image processing algorithms. This paper introduces a corresponding buffer organization leading to solutions comparable to hand-built designs in terms of required memory. Special care is taken so that problems with large image sizes can also be analyzed. The applicability of our approach is demonstrated using a JPEG2000 decoder model.
DOI: https://doi.org/10.1109/ICSAMOS.2007.4285747
Published: 2007-07-16
Citations: 7
The Weight-Watcher Service and its Lightweight Implementation
B. Garbinato, R. Guerraoui, J. Hulaas, A. Kounine, Maxime Monod, J. H. Spring
This paper presents the weight-watcher service. This service aims at providing resource consumption measurements and estimations for software executing on resource-constrained devices. By using the weight-watcher, software components can continuously adapt and optimize their quality of service with respect to resource availability. The interface of the service is composed of a profiler and a predictor. We present an implementation that is lightweight in terms of CPU and memory. We also performed various experiments that convey (a) the tradeoff between the memory consumption of the service and the accuracy of the prediction, as well as (b) a maximum overhead of 10% on the execution speed of the VM for the profiler to provide accurate measurements.
DOI: https://doi.org/10.1109/ICSAMOS.2007.4285742
Published: 2007-07-16
Citations: 2
Performance and Power Analysis of Parallelized Implementations on an MPCore Multiprocessor Platform
H. Blume, Jörg von Livonius, Lisa Rotenberg, T. Noll, Harald Bothe, J. Brakensiek
In this contribution, the potential of parallelized software that implements digital signal processing algorithms on a multicore processor platform is analyzed. For this purpose, various digital signal processing tasks have been implemented on a prototyping platform, namely an ARM MPCore featuring four ARM11 processor cores. In order to analyze the effect of parallelization on the resulting performance-power ratio, influencing parameters such as the number of issued program threads have been studied. For parallelization, the OpenMP programming model has been used, which can be applied efficiently at the C level. In order to develop power-efficient code, a functional- and instruction-level power model of the MPCore has also been derived, which features a high estimation accuracy. Using this power model and exploiting the capabilities of OpenMP, a variety of exemplary tasks could be efficiently parallelized. The general efficiency potential of parallelization for multiprocessor architectures can be assembled.
DOI: https://doi.org/10.1109/ICSAMOS.2007.4285736
Published: 2007-07-16
Citations: 7
A Simulation-Based Methodology for Evaluating the DPA-Resistance of Cryptographic Functional Units with Application to CMOS and MCML Technologies
F. Regazzoni, S. Badel, T. Eisenbarth, J. Großschädl, A. Poschmann, Z. Deniz, Marco Macchetti, L. Pozzi, C. Paar, Y. Leblebici, P. Ienne
This paper explores the resistance of MOS current mode logic (MCML) against differential power analysis (DPA) attacks. Circuits implemented in MCML have unique characteristics both in terms of power consumption and in the dependency of the power profile on the input signal pattern. Therefore, MCML is suitable for protecting cryptographic hardware from DPA and similar side-channel attacks. In order to demonstrate the effectiveness of different logic styles against power analysis attacks, the non-linear bijective function of the Kasumi algorithm (known as substitution box S7) was implemented in CMOS and MCML technology, and a set of attacks was performed using power traces derived from SPICE-level simulations. Although all keys were discovered for CMOS, only very few attacks on MCML were successful.
DOI: https://doi.org/10.1109/ICSAMOS.2007.4285753
Published: 2007-07-16
Citations: 46
Energy efficiency of mobile video decoding
Tero Rintaluoma, O. Silvén
In this paper, we consider the energy efficiency of implementations of video codecs for mobile devices in a top-down manner. We start from typical applications and analyse device architectures, codec implementations, and software platforms. The physical size of mobile devices limits their heat dissipation, while the battery capacity needs to be used sparingly to provide a satisfactory untethered active use time. Together with the required versatile capabilities of the devices, these are essential constraints that must be taken into account from hardware to application software design. In video decoding, additional constraints come from the need to support multiple digital video coding standards and from the platform-oriented design regimes of the device manufacturers.
DOI: https://doi.org/10.1109/ICSAMOS.2007.4285740
Published: 2007-07-16
Citations: 3
Maximum and Sorted Cache Occupation Using Array Padding
E. Herruzo, E. Zapata, O. Plata
The paper describes a framework for analyzing the cache content on affine references to arrays in loops. The framework is based on a small set of key cache parameters. We study the relation between these cache parameters and the data memory layout of arrays to demonstrate how to use array padding (static array re-dimensioning) to optimize the use of the cache. Based on the cache model we present a method to carry out intra-array padding for a maximum cache occupation and for a maximum sorted cache occupation, and a simple method to carry out inter-array padding. We also present an experimental evaluation of our techniques using a cache simulator and actual code executions on the MIPS R10K processor.
DOI: https://doi.org/10.1109/ICSAMOS.2007.4285749
Published: 2007-07-16
Citations: 0
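The effect that intra-array padding targets can be illustrated with a small address calculation. The sketch below is not the method from the paper above; it only counts how many sets of a direct-mapped cache are touched when walking down one column of a row-major 2-D array. The function name and the cache geometry (1024 sets, 32-byte lines, 8-byte elements) are assumptions chosen for illustration and do not describe the MIPS R10K.

```python
def distinct_cache_sets(row_len, rows=64, elem_bytes=8,
                        line_bytes=32, num_sets=1024):
    """Count the cache sets touched by a column walk over a row-major array.

    Hypothetical direct-mapped cache: num_sets lines of line_bytes each.
    """
    sets = set()
    for i in range(rows):
        addr = i * row_len * elem_bytes          # byte offset of element [i][0]
        sets.add((addr // line_bytes) % num_sets)
    return len(sets)


unpadded = distinct_cache_sets(1024)       # row stride is a multiple of the cache size / 4
padded = distinct_cache_sets(1024 + 4)     # rows padded by one cache line (4 elements)
print(unpadded, padded)                    # prints "4 64"
```

With a power-of-two row length the 64 accesses collide in only 4 sets, so most of them evict each other. Padding each row by a single cache line spreads the same accesses over 64 distinct sets.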
Instruction Set Encoding Optimization for Code Size Reduction
M. Med, A. Krall
In an embedded system, the cost of storing a program on-chip can be as high as the cost of the microprocessor itself. We examine how much a given application's program size can be reduced when an instruction set is tailored to the application. We provide different algorithms for calculating an optimized instruction set and evaluate their impact on the size of several benchmark programs. Our results show that an average reduction of 11% is possible, and further improvement can be achieved by changing the instruction length of the given architecture. However, compiling other applications with such an optimized instruction set might produce larger code sizes.
DOI: https://doi.org/10.1109/ICSAMOS.2007.4285728
Published: 2007-07-16
Citations: 4
Application Case Studies on HS-Scale, a MP-SOC for Embbeded Systems
N. Saint-Jean, P. Benoit, G. Sassatelli, L. Torres, M. Robert
Scalability of architecture, programming model and task control management will be a major challenge for MP-SOC designs in the coming years. The contribution presented in this paper is HS-Scale, a hardware/software framework to study, define and experiment with scalable solutions for next-generation MP-SOC. The hardware architecture, H-Scale, is a homogeneous MP-SOC based on RISC processors, distributed memories and a globally asynchronous/locally synchronous network on chip. S-Scale is the software support for programming H-Scale. It is a multithreaded sequential programming model with dedicated communication primitives handled at run-time by a simple operating system we developed. The hardware validations on FPGA and CMOS 90 nm technology and the experimental case studies on several applications (FIR, DES and MJPEG) demonstrate the scalability of our approach and open interesting perspectives for automating task placement and duplication.
DOI: https://doi.org/10.1109/ICSAMOS.2007.4285738
Published: 2007-07-16
Citations: 7
Flexibility Inlining into Arithmetic Data-paths Exploiting A Regular Interconnection Scheme
S. Xydis, G. Economakos, K. Pekmestzi
This paper presents a design technique for coarse-grained reconfigurable cores targeting mostly DSP applications. The proposed technique inlines flexibility into custom carry-save-arithmetic (CSA) datapaths by exploiting a stable and canonical interconnection scheme. The canonical interconnection is revealed by a uniformity transformation imposed on the basic architectures of CSA multipliers and CSA chain-adders/subtractors. The design flow for the implementation of the core is analyzed in detail, and a novel reconfigurable architecture prototype is presented. The paper concludes with experimental results showing that our architecture achieves an average latency reduction of 32.63% compared with datapaths of primitive computational resources, with a tolerable overhead in hardware utilization.
DOI: https://doi.org/10.1109/ICSAMOS.2007.4285744
Published: 2007-07-16
Citations: 1
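For readers unfamiliar with carry-save arithmetic, the basic building block can be sketched in a few lines. This is a bit-level software model of a generic 3:2 compressor, not the reconfigurable architecture proposed in the paper above; the function names are hypothetical and the inputs are assumed to be non-negative integers.

```python
def carry_save_add(a, b, c):
    """3:2 compressor: reduce three operands to a sum word and a carry word.

    No carry propagates between bit positions; a + b + c == partial_sum + carry.
    """
    partial_sum = a ^ b ^ c                          # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1       # majority bits, shifted one position left
    return partial_sum, carry


def csa_sum(values):
    """Add many operands with a chain of carry-save stages and one final adder."""
    s, c = 0, 0
    for v in values:
        s, c = carry_save_add(s, c, v)               # invariant: s + c == sum of values seen so far
    return s + c                                     # the only carry-propagating addition


print(carry_save_add(5, 6, 7))    # prints "(4, 14)"
print(csa_sum([3, 9, 27, 81]))    # prints "120"
```

Because each stage avoids carry propagation, its delay does not depend on the operand width, which is why chains of such stages are attractive for multi-operand DSP datapaths.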
A Memory-Efficient Reconfigurable Aho-Corasick FSM Implementation for Intrusion Detection Systems
Vassilis Dimopoulos, I. Papaefstathiou, D. Pnevmatikatos
The Aho-Corasick (AC) algorithm is a very flexible and efficient but memory-hungry pattern matching algorithm that can scan for the existence of a query string among multiple test strings while looking at each character exactly once, making it one of the main options for software-based intrusion detection systems such as SNORT. We present the Split-AC algorithm, a reconfigurable variation of the AC algorithm that exploits domain-specific characteristics of intrusion detection to considerably reduce the FSM memory requirements. Split-AC achieves an overall reduction of 28-75% compared to the best proposed implementation.
DOI: https://doi.org/10.1109/ICSAMOS.2007.4285750
Published: 2007-07-16
Citations: 42
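As background, the classic Aho-Corasick automaton can be sketched as follows. This is the textbook software algorithm with dictionary-based transitions, not the Split-AC variant or the hardware FSM from the paper above; the function names and the example patterns are chosen for illustration only.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure and output tables for the given patterns."""
    goto, fail, out = [{}], [0], [[]]        # state 0 is the root
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)
    queue = deque(goto[0].values())          # depth-1 states fail to the root
    while queue:                             # breadth-first, so shallower states are done first
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]   # inherit matches that end at the failure state
    return goto, fail, out


def search(text, automaton):
    """Return (start_index, pattern) for every match, reading each character once."""
    goto, fail, out = automaton
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        hits.extend((i - len(p) + 1, p) for p in out[s])
    return hits


automaton = build_automaton(["he", "she", "his", "hers"])
print(search("ushers", automaton))   # prints "[(1, 'she'), (2, 'he'), (2, 'hers')]"
```

The per-state transition tables are where the memory cost mentioned in the abstract comes from: a table-driven implementation that stores one entry per state and input symbol grows quickly with the number of patterns.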