
Latest papers from the 2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools

Optimization of Area and Delay at Gate-Level in Multiple Constant Multiplications
L. Aksoy, E. Costa, P. Flores, J. Monteiro
Although many efficient high-level algorithms have been proposed for the realization of Multiple Constant Multiplications (MCM) using the fewest addition and subtraction operations, they do not consider the low-level implementation issues that directly affect the area, delay, and power dissipation of the MCM design. In this paper, we first present area-efficient addition and subtraction architectures used in the design of the MCM operation. Then, we propose an algorithm that searches for an MCM design with the smallest area, taking into account the cost of each operation at gate-level. To address the area and delay tradeoff in MCM design, the proposed algorithm is extended to find the smallest-area solution under a delay constraint. The experimental results show that the proposed algorithms yield low-complexity and high-speed MCM designs with respect to those obtained by the prominent algorithms designed for the optimization of the number of operations and the optimization of area at gate-level.
DOI: 10.1109/DSD.2010.32 (https://doi.org/10.1109/DSD.2010.32), published 2010-09-01
Citations: 16
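As a rough illustration of the MCM problem described in the entry above, the following sketch multiplies one input by several constants using only shifts, additions and subtractions, and reuses an intermediate term across outputs. The constants 7, 11 and 13 and the function name `mcm_shift_add` are hypothetical choices for this example; the decomposition is hand-picked and is not the paper's algorithm, which additionally weighs the gate-level cost of each operation.

```python
def mcm_shift_add(x: int) -> dict[int, int]:
    """Compute 7x, 11x and 13x with shifts, adds and subtracts only.

    Illustrative constants and decomposition (assumptions, not from the paper).
    Four add/subtract operations produce three products because 3x is shared.
    """
    x3 = (x << 1) + x      # 3x  = 2x + x   (shared intermediate term)
    x7 = (x << 3) - x      # 7x  = 8x - x
    x11 = (x << 3) + x3    # 11x = 8x + 3x  (reuses 3x)
    x13 = (x3 << 2) + x    # 13x = 12x + x  (reuses 3x)
    return {7: x7, 11: x11, 13: x13}


if __name__ == "__main__":
    print(mcm_shift_add(5))
```

Counting operations alone treats every adder as equal. The abstract's point is that adders of different widths and types differ in area and delay, so the design with the fewest operations is not necessarily the smallest or fastest.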
Optimising Self-Timed FPGA Circuits
P. Ferguson, A. Efthymiou, T. Arslan, Danny Hume
This paper introduces a novel synchronous-to-asynchronous logic conversion tool targeted specifically at synchronous field programmable gate arrays (FPGAs). The tool augments the synchronous FPGA design flow and removes the clock network, implementing an asynchronous control network in its place. We evaluate the timing performance benefits of the methods used to implement the asynchronous control network on synchronous FPGA fabric. Industrial video processing circuits are used to demonstrate the iterative timing improvements the tool makes to the asynchronous control network in each circuit. The targeted design constraints used in the tool are intended to improve the robustness and predictability of the placed circuits. This makes the timing benefits of asynchronous bundled-data circuits easier to achieve, making asynchronous circuits a viable design option on modern FPGAs.
DOI: 10.1109/DSD.2010.97 (https://doi.org/10.1109/DSD.2010.97), published 2010-09-01
Citations: 8
A Modular Peripheral to Support Self-Reconfiguration in SoCs
A. Otero, Angel Morales-Cas, J. Portilla, E. D. L. Torre, T. Riesgo
In this paper, a solution to support the run-time readback, relocation and replication of cores in embedded systems with dynamic and partial reconfiguration capabilities is presented. The proposal describes a peripheral structure that allows easy integration and communication with the rest of the system, including an API that makes the reconfiguration details more transparent to software applications. Unlike other proposals, all functionality is implemented in hardware, achieving a higher reconfiguration speed. In addition, several design decisions have been taken in order to increase the portability of the solution to existing and, possibly, future FPGAs. Finally, a use case is provided, which shows the features of this module applied to the run-time scaling of a hardware coprocessor.
DOI: 10.1109/DSD.2010.100 (https://doi.org/10.1109/DSD.2010.100), published 2010-09-01
Citations: 23
Area-Efficient Multi-moduli Squarers for RNS
D. Bakalis, H. T. Vergos
Multi-moduli architectures are very useful for reconfigurable digital processors and fault-tolerant systems that are based on the Residue Number System (RNS). In this paper we propose two architectures for multi-moduli squaring that support the most common moduli cases in RNS channels, that is, 2^n-1, 2^n and 2^n+1. The proposed architectures are based on the modified Booth encoding of the input operand for deriving the required partial products and on Dadda adder trees for their addition. Experimental results show that the proposed squarers offer significant savings in area compared to previous proposals while a small improvement in delay is achieved in most cases as well.
DOI: 10.1109/DSD.2010.25 (https://doi.org/10.1109/DSD.2010.25), published 2010-09-01
Citations: 10
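To make the three moduli cases named in the entry above concrete, here is a small behavioral sketch of squaring modulo 2^n-1, 2^n and 2^n+1. It models only the arithmetic: it does not reproduce the modified Booth encoding or the Dadda adder trees of the proposed architectures, and the function name `square_mod` and its `kind` selector are assumptions made for this example.

```python
def square_mod(x: int, n: int, kind: str) -> int:
    """Square a non-negative x modulo 2^n, 2^n - 1 or 2^n + 1 (behavioral model only)."""
    if x < 0 or n < 1:
        raise ValueError("expected x >= 0 and n >= 1")
    mask = (1 << n) - 1
    sq = x * x
    if kind == "2^n":
        # Keep the n low-order bits of the square.
        return sq & mask
    if kind == "2^n-1":
        # 2^n is congruent to 1, so high bits fold back onto the low bits
        # (end-around carry); repeat until the value fits in n bits.
        while sq > mask:
            sq = (sq & mask) + (sq >> n)
        return 0 if sq == mask else sq
    if kind == "2^n+1":
        # 2^n is congruent to -1, so the high part is subtracted from the low part.
        return ((sq & mask) - (sq >> n)) % ((1 << n) + 1)
    raise ValueError(f"unknown modulus kind: {kind}")


if __name__ == "__main__":
    for kind in ("2^n-1", "2^n", "2^n+1"):
        print(kind, square_mod(11, 4, kind))
```

A multi-moduli unit shares most of its hardware across these three cases and selects the modulus at run time, which is what the single function with a selector is meant to suggest.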
Evaluating OpenMP Support Costs on MPSoCs
A. Marongiu, P. Burgio, L. Benini
The ever-increasing complexity of MPSoCs is making the production of software the critical path in embedded system development. Several programming models and tools have been proposed in the recent past that aim at facilitating application development for embedded MPSoCs. OpenMP is a mature and easy-to-use standard for shared memory programming, which has recently been successfully adopted in embedded MPSoC programming as well. To achieve performance, however, it is necessary that the implementation of OpenMP constructs efficiently exploits the many peculiarities of MPSoC hardware. In this paper we present an extensive evaluation of the cost associated with supporting OpenMP on such a machine, investigating several implementation variants that efficiently exploit the memory hierarchy. Experimental results on different benchmarks confirm the effectiveness of the optimizations in terms of performance improvements.
DOI: 10.1109/DSD.2010.99 (https://doi.org/10.1109/DSD.2010.99), published 2010-09-01
Citations: 8
Network-on-Multi-Chip (NoMC) for Multi-FPGA Multimedia Systems
M. Stepniewska, A. Luczak, J. Siast
Some applications, especially in the area of multimedia processing, need to be implemented on a multi-chip platform because of their size. An efficient communication infrastructure for such systems may be designed using Networks-on-Chip (NoCs). However, a network for multi-chip systems requires a scalable architecture. Moreover, for multimedia purposes, such a NoC should support a multicast transmission mode. In order to meet these requirements, we propose the NoMC (Network-on-Multi-Chip), a hierarchical interconnect system designed for multi-chip systems. The performance of the proposed network is assessed using a model of the MVC (Multiview Video Coding) coder. In such a system, the multicast transmission mode may yield an overall bandwidth gain of up to 30%. Moreover, the synthesis results show that the proposed network elements are easily synthesizable for FPGA devices.
DOI: 10.1109/DSD.2010.106 (https://doi.org/10.1109/DSD.2010.106), published 2010-09-01
Citations: 13
Persistence Management Model for Dynamically Reconfigurable Hardware
J. Dondo, Fernando Rincón Calle, Jesús Barba, F. Moya, Francisco Sánchez, J. C. López
This document presents a persistence management model for reconfigurable SoCs. The model provides an efficient persistence mechanism that preserves the data of hardware components swapped out of dynamically reconfigurable areas, so that these components can be reinserted and resume execution from the point at which they were interrupted. The mechanism supports state management not only for components instantiated in reconfigurable areas but also for those instantiated in static areas that can be stopped and replaced by new versions, instantiated in hardware or implemented in software, with their state migrated to the new versions.
DOI: 10.1109/DSD.2010.90 (https://doi.org/10.1109/DSD.2010.90), published 2010-09-01
Citations: 7
Generated Cycle-Accurate Profiler for C Language
Zdenek Prikryl, Karel Masarík, Tomás Hruska, A. Husár
Application-specific instruction set processors used in embedded systems are highly optimized for a given task, and a specific application runs on this type of processor. Therefore, the designer should have a tool that helps with processor and application optimization. One such tool is a profiler. It can discover problematic parts, such as bottlenecks, in the processor and application design. The designer can then easily find which parts of the processor or application should be modified to improve performance or reduce power consumption. In this paper, a way to generate a cycle-accurate profiler for the C language from a processor model described in an architecture description language is proposed.
DOI: 10.1109/DSD.2010.39 (https://doi.org/10.1109/DSD.2010.39), published 2010-09-01
Citations: 7
Arithmetic Units for RNS Moduli {2^n-3} and {2^n+3} Operations
P. M. Matutino, R. Chaves, L. Sousa
A new moduli set {2^n-1, 2^n+3, 2^n+1, 2^n-3} has recently been proposed to represent numbers in Residue Number Systems (RNS), increasing the number of channels. With this, the processing time can be reduced by simultaneously exploiting the carry-free characteristic of modular arithmetic and improving the parallelism. In this paper, hardware structures for addition and multiplication operations in RNS for the moduli {2^n-3} and {2^n+3} are proposed and analyzed. In order to evaluate the performance of the proposed units, they were implemented on an ASIC technology. The obtained experimental results suggest that the performance of the moduli {2^n±3} units is acceptable, but that they demand more area resources and impose a larger delay than the typically used {2^n±1} arithmetic units. Addition units require at least 42% more area for a performance identical to the {2^n+1} modulo adder. The multiplication units require up to 37% more area and impose a delay 25% higher. This paper also suggests that more balanced moduli sets should be developed in order to achieve more efficient RNS.
DOI: 10.1109/DSD.2010.77 (https://doi.org/10.1109/DSD.2010.77), published 2010-09-01
Citations: 35
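The carry-free, channel-wise arithmetic that the entry above relies on can be sketched in a few lines. The example below builds the four-moduli set for a given n, performs addition and multiplication independently in each channel, and converts back with the Chinese Remainder Theorem. All function names are hypothetical, n = 4 is an arbitrary choice, and this is a software model of RNS arithmetic in general, not of the hardware units proposed in the paper.

```python
from math import prod


def moduli_set(n: int) -> tuple[int, int, int, int]:
    """Return the four-moduli set (2^n - 1, 2^n + 3, 2^n + 1, 2^n - 3)."""
    return (2**n - 1, 2**n + 3, 2**n + 1, 2**n - 3)


def to_rns(x: int, moduli: tuple[int, ...]) -> tuple[int, ...]:
    """Represent x by its residue in each channel."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli):
    # Each channel is independent: no carries cross between channels.
    return tuple((ra + rb) % m for ra, rb, m in zip(a, b, moduli))


def rns_mul(a, b, moduli):
    return tuple((ra * rb) % m for ra, rb, m in zip(a, b, moduli))


def from_rns(residues, moduli) -> int:
    """Reconstruct the integer with the Chinese Remainder Theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        total += r * m_i * pow(m_i, -1, m)  # modular inverse, Python 3.8+
    return total % big_m


if __name__ == "__main__":
    mods = moduli_set(4)  # (15, 19, 17, 13), pairwise coprime
    a, b = to_rns(123, mods), to_rns(45, mods)
    print(from_rns(rns_add(a, b, mods), mods))
    print(from_rns(rns_mul(a, b, mods), mods))
```

Results are only meaningful while they stay below the product of the moduli (62985 for n = 4), which is why adding channels extends the dynamic range.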
Adaptive Cache Memories for SMT Processors
S. López, O. Garnica, D. Albonesi, S. Dropsho, J. Lanchares, J. Hidalgo
Resizable caches can trade off capacity for access speed to dynamically match the needs of the workload. In Simultaneous Multi-Threaded (SMT) cores, the caching needs can vary greatly across the number of threads and their characteristics, offering opportunities to dynamically adjust cache resources to the workload. In this paper we propose the use of resizable caches in order to improve the performance of SMT cores, and introduce a new control algorithm that provides good results independent of the number of running threads. In workloads with a single thread, the resizable cache control algorithm should optimize for cache miss behavior because misses typically form the critical path. In contrast, with several independent threads running, we show that optimizing for cache hit behavior has more impact, since large SMT workloads have other threads to run during a cache miss. Moreover, we demonstrate that these seemingly diametrically opposed policies can be simultaneously satisfied by using the harmonic mean of the per-thread speedups as the metric to evaluate the system performance, and to smoothly and naturally adjust to the degree of multithreading.
DOI: 10.1109/DSD.2010.69 (https://doi.org/10.1109/DSD.2010.69), published 2010-09-01
Citations: 4
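The metric named in the entry above, the harmonic mean of per-thread speedups, is easy to state in code. The sketch below assumes each speedup is a thread's performance under some cache configuration divided by its performance under a baseline; that definition, the function name, and the sample values are assumptions for illustration, and the abstract does not describe how the control algorithm consumes the metric.

```python
def harmonic_mean_speedup(speedups: list[float]) -> float:
    """Harmonic mean of per-thread speedups (each must be positive)."""
    if not speedups or any(s <= 0 for s in speedups):
        raise ValueError("need at least one positive speedup")
    return len(speedups) / sum(1.0 / s for s in speedups)


if __name__ == "__main__":
    # One thread: the metric is just that thread's speedup.
    print(harmonic_mean_speedup([1.2]))
    # Two threads: a large gain for one thread does not hide a loss for the other.
    print(harmonic_mean_speedup([2.0, 0.5]))
```

With a single thread the metric reduces to that thread's own speedup, while with several threads it penalizes configurations that slow any one thread badly: speedups of 2.0 and 0.5 have an arithmetic mean of 1.25 but a harmonic mean of 0.8.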