
Latest papers: 2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools

Composable Dynamic Voltage and Frequency Scaling and Power Management for Dataflow Applications
K. Goossens, Dongrui She, Aleksandar Milutinovic, A. Molnos
Composability means that the behaviour of an application, including its timing, is not affected by the absence or presence of other applications. It is required to be able to design, test, and verify applications independently. In this paper we define composable dynamic voltage and frequency scaling (DVFS) hardware, and composable power management. We ensure that the functional and temporal behaviours of an application are not affected by other applications, even when they are power managed. For dataflow applications with worst-case execution times per task, our power management is also predictable, i.e. guarantees end-to-end real-time requirements, even when the application is mapped on multiple processors that are power managed independently. Our method can be used with various DVFS architectures, such as on-chip and off-chip VF regulators. Our FPGA implementation models a system with multiple tiles, each containing a processor with local memory running a real-time operating system (RTOS) and power management. Tiles are interconnected by a network on chip, and communicate using shared memories. Experiments indicate energy savings of 68% w.r.t. no power management, and 40% w.r.t. power gating only. We also demonstrate composability and predictability on the platform in the presence of power management.
DOI: https://doi.org/10.1109/DSD.2010.61
Citations: 20
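The idea of scaling voltage and frequency while still meeting a worst-case deadline, as described in the abstract above, can be illustrated with a minimal sketch. It is not the authors' method: it assumes the textbook CMOS switching-energy model E = C_eff * V^2 * cycles, and the names `OperatingPoint`, `dynamic_energy`, `pick_operating_point` and the constant `c_eff` are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class OperatingPoint(NamedTuple):
    """A hypothetical voltage/frequency pair offered by a DVFS regulator."""
    freq_hz: float
    volt: float


def dynamic_energy(cycles: int, volt: float, c_eff: float = 1e-9) -> float:
    """Textbook switching energy in joules: E = C_eff * V^2 * cycles.

    c_eff is an assumed effective switched capacitance, not a measured value.
    """
    return c_eff * volt ** 2 * cycles


def pick_operating_point(
    wcet_cycles: int,
    deadline_s: float,
    points: Sequence[OperatingPoint],
) -> Optional[OperatingPoint]:
    """Return the lowest-energy point that still finishes wcet_cycles in time."""
    # A point is feasible if the worst-case cycle count fits in the deadline.
    feasible = [p for p in points if wcet_cycles / p.freq_hz <= deadline_s]
    if not feasible:
        return None  # no operating point can guarantee the deadline
    return min(feasible, key=lambda p: dynamic_energy(wcet_cycles, p.volt))


if __name__ == "__main__":
    table = [
        OperatingPoint(100e6, 0.8),
        OperatingPoint(200e6, 1.0),
        OperatingPoint(400e6, 1.2),
    ]
    print(pick_operating_point(1_000_000, 0.006, table))
```

Because the choice depends only on the task's own worst-case cycle count and deadline, the sketch is at least in the spirit of the composability requirement: nothing about other applications enters the decision.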
Hardware-Based Speed Up of Face Recognition Towards Real-Time Performance
I. Sajid, Sotirios G. Ziavras, M. M. Ahmed
Real-time face recognition by computer systems is required in many commercial and security applications since it is the only way to protect privacy and security. On the other hand, face recognition generates huge amounts of data in real time. Filtering out meaningful data from this raw data with high accuracy is a complex task. Most of the existing techniques primarily focus on the accuracy aspect using extensive matrix-oriented computations. Efficient realizations primarily reduce the computational space using eigenvalues. However, an eigenvalue-oriented evaluation has a minimum time complexity of O(n^3), where n is the rank of the covariance matrix, and the computation cost of generating the covariance matrix is extra. Our frequency distribution curve (FDC) technique avoids matrix decomposition and other computationally intensive matrix operations. FDC is formulated with a bias towards efficient hardware realization and high accuracy by using simple vector operations. FDC requires pattern vector (PV) extraction from an image within O(n^2) time. The enhanced FDC-based architecture proposed in this paper further shifts a computationally expensive component of FDC to the offline layer of the system, resulting in very fast online evaluation of the input data. Furthermore, efficient online testing is pursued using an adaptive controller (AC) for PV classification based on the Euclidean vector norm. The pipelined AC architecture adapts to the availability of resources in the target silicon device. Our implementation on an XC5VSX50T FPGA demonstrates a high accuracy of 99% in face recognition for 400 images in the ORL database, generally requiring less than 200 ns per image.
DOI: https://doi.org/10.1109/DSD.2010.45
Citations: 4
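The abstract above mentions classifying pattern vectors with the Euclidean vector norm. A minimal sketch of that step is nearest-neighbour matching against a gallery of stored vectors. How the FDC pattern vector is extracted is not specified in the abstract and is not shown; `euclidean_distance`, `classify_pattern_vector` and the gallery layout are hypothetical names chosen for illustration.

```python
import math
from typing import Dict, Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean norm of the difference between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("pattern vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_pattern_vector(
    pv: Sequence[float],
    gallery: Dict[str, Sequence[float]],
) -> str:
    """Return the label of the stored vector closest to pv."""
    if not gallery:
        raise ValueError("gallery is empty")
    return min(gallery, key=lambda label: euclidean_distance(pv, gallery[label]))


if __name__ == "__main__":
    # Toy three-element vectors; real pattern vectors would be much longer.
    stored = {"subject_a": [0.0, 0.0, 1.0], "subject_b": [5.0, 5.0, 5.0]}
    print(classify_pattern_vector([0.2, 0.1, 0.9], stored))
```

Only subtractions, multiplications and additions appear in the inner loop, which is consistent with the abstract's emphasis on simple vector operations that map well to hardware.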
A New High-Level Methodology for Programming FPGA-Based Smart Camera
Nicolas Roudel, F. Berry, J. Sérot, L. Eck
Because a smart camera system is composed of various devices, the designer has to know several languages (such as an HDL and C/C++). Most vision application designers are software programmers and do not have a good knowledge of HDLs such as VHDL. This paper presents a new high-level methodology for implementing vision applications on smart camera platforms. The methodology is based on a soft-core approach to manage the whole system and a dataflow (actor-oriented) language to design the processing elements. We discuss interfacing constraints in particular.
DOI: https://doi.org/10.1109/DSD.2010.68
Citations: 6
Design of Trace-Based Split Array Caches for Embedded Applications
A. Tokarnia, Marina Tachibana
Since many embedded systems execute a predefined set of programs, tuning system components to application programs and data is the approach chosen by many design techniques to optimize performance and power consumption. In this paper, we propose a method based on the analysis of accesses to vectors, arrays, and other complex data structures to design a size-constrained two-partition array cache. This method reorganizes the ways of set-associative array caches into partitions with different line sizes and defines array-partition mappings so as to minimize the average memory access energy-delay product. Experimental results show that these split array caches have a lower average energy-delay product for memory accesses than unified set-associative array caches of the same size. For an MPEG-2 decoder, even with no parallel accesses to cache partitions, the average memory access energy-delay product of an 8 KB trace-based split array cache is 50% lower than that of the unified set-associative array cache with the lowest energy-delay product. If 25% of the accesses occur in pairs, there is an additional reduction of 9%.
DOI: https://doi.org/10.1109/DSD.2010.33
Citations: 1
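The figure of merit in the abstract above, the average memory access energy-delay product (EDP), can be sketched in a few lines. The abstract does not say exactly how the average is formed, so this sketch assumes mean energy per access multiplied by mean delay per access. The function names and the numbers in the usage example are illustrative only and are not taken from the paper.

```python
from typing import Sequence, Tuple

# One trace entry: (energy in nJ, delay in ns) for a single memory access.
Access = Tuple[float, float]


def average_edp(accesses: Sequence[Access]) -> float:
    """Mean energy per access times mean delay per access (assumed definition)."""
    if not accesses:
        raise ValueError("trace is empty")
    n = len(accesses)
    mean_energy = sum(energy for energy, _ in accesses) / n
    mean_delay = sum(delay for _, delay in accesses) / n
    return mean_energy * mean_delay


def relative_reduction(baseline: float, candidate: float) -> float:
    """Fraction by which candidate improves on baseline (0.5 means 50% lower)."""
    return (baseline - candidate) / baseline


if __name__ == "__main__":
    # Made-up traces: the split cache halves the energy at equal delay.
    unified_trace = [(2.0, 4.0), (2.0, 4.0)]
    split_trace = [(1.0, 4.0), (1.0, 4.0)]
    saving = relative_reduction(average_edp(unified_trace), average_edp(split_trace))
    print(f"EDP reduction: {saving:.0%}")
```

A design-space search such as the one the abstract describes would evaluate a metric like this for each candidate partitioning and array-partition mapping and keep the minimum.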
ALOE-Based Flexible LDPC Decoder
Ismael Gómez Miguelez, Massimo Camatel, J. Bracke, V. Marojevic, A. Gelonch, F. Vacca, G. Masera
Radio communications terminals and infrastructure tend to support an increasing range of algorithms and radio access technologies. Flexible processing platforms are therefore needed to support multi-standard or heterogeneous radios. Channel decoding is one of the most computationally demanding digital signal processing blocks of a radio transceiver. At the same time, it offers a high degree of implementation flexibility and facilitates dynamic parameter adjustments. This paper presents a flexible LDPC decoder implemented on an FPGA device following the ALOE middleware design paradigm. We analyse the middleware efficiency in terms of flexibility versus resource requirements. The results show a relative middleware area overhead of 32%.
DOI: https://doi.org/10.1109/DSD.2010.107
Citations: 0
Design Methodology for a High Performance Robust DVB-S2 Decoder Implementation
F. Berthelot, François Charot, Charles Wagner, C. Wolinski
The new Digital Video Broadcasting Satellite (DVB-S2) standard is able to provide capacity gains of about 30% over the previous standard by using a powerful Forward Error Correction (FEC) scheme based on very large LDPC code words and BCH codes. The implementation of the DVB-S2 FEC decoder is a big challenge. The designer must deal with the overall design complexity and the decoding throughput in order to obtain a high decoding performance in terms of bit error rate (BER). We present in detail a complete design flow allowing a better understanding of the algorithm in terms of complexity, performance and hardware implementation. We focus on complexity-performance trade-offs due to message quantization and compare its effects on several algorithm corrections used at the check nodes for DVB-S2 decoding. The simulation results show that the best compromise between complexity and performance is obtained for the FOMS algorithm approximation.
DOI: https://doi.org/10.1109/DSD.2010.40
Citations: 1
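The abstract above compares corrections applied at the check nodes of an LDPC decoder but does not define them, and "FOMS" is not expanded there. As a generic illustration only, the following sketch shows one widely used correction, the offset min-sum check-node update. It is not claimed to be the variant the paper selects; the function name and the default offset `beta` are assumptions.

```python
from typing import List, Sequence


def offset_min_sum_check_node(msgs: Sequence[float], beta: float = 0.5) -> List[float]:
    """Offset min-sum update for one check node.

    msgs holds the incoming variable-to-check messages (log-likelihood ratios).
    For each edge, the outgoing magnitude is the smallest magnitude among the
    other edges, reduced by beta and clipped at zero; the outgoing sign is the
    product of the signs of the other edges.
    """
    if len(msgs) < 2:
        raise ValueError("a check node needs at least two incoming messages")
    out: List[float] = []
    for i in range(len(msgs)):
        others = [m for j, m in enumerate(msgs) if j != i]
        sign = 1.0
        for m in others:
            if m < 0:
                sign = -sign
        magnitude = max(min(abs(m) for m in others) - beta, 0.0)
        out.append(sign * magnitude)
    return out


if __name__ == "__main__":
    print(offset_min_sum_check_node([2.0, -1.0, 3.0], beta=0.5))
```

Setting `beta` to 0 gives the plain min-sum rule. In a fixed-point implementation the offset and the message word length interact, which is the kind of quantization trade-off the abstract refers to.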
A Class of Recursive Networks on a Chip for Enhancing Intercluster Parallelism
Masaru Takesue
Future VLSI technologies will allow multiple clusters, each comprising a number of processing nodes, to be put on a single chip. Although we may then be able to select a network topology matching the application assigned to each cluster, it may be difficult to decide the topologies of the connections between the (intra)cluster networks so that clusters can cooperate effectively in parallel processing. To alleviate the problem, this paper proposes a class of recursive networks, RNs, whose constituent networks can have different topologies and sizes not only in different recursive levels but also within the same level. In an RN, the last-level networks define the cluster networks, and the level-i network associated with a cluster network defines the i-th intercluster network between the cluster and another cluster. The cluster and intercluster networks can be any kind of standard network, such as a mesh or a bus. The paper presents a partition-based method of generating an RN, together with its routing and layout methods.
DOI: https://doi.org/10.1109/DSD.2010.46
Citations: 0
Performance Analysis of 90nm Look Up Table (LUT) for Low Power Application
Deepak Kumar, Pankaj Kumar, M. Pattanaik
This paper provides a detailed performance analysis of a low-power, high-speed Look Up Table (LUT) built using a circuit technique. All the sleep transistors in the LUT are sized to achieve an optimum power-delay relationship, so that it can be used for fast-growing low-power applications. We have also implemented a benchmark circuit, an (8 × 10) encoder, in a Virtex-4 90 nm FPGA. Compared to the traditional 4-input LUT design, the proposed design saves 12.8% of average power in high-speed mode and 56.7% in low-power mode with a small compromise in speed.
DOI: https://doi.org/10.1109/DSD.2010.72
Citations: 16
System Level Synthesis for Ultra Low-Power Wireless Sensor Nodes
Muhammad Adeel Pasha, Steven Derrien, O. Sentieys
Engineering a hardware platform for a Wireless Sensor Network (WSN) node is known to be a tough challenge, as the design must satisfy many severe constraints, among which energy dissipation is by far the most challenging one. Today, most WSN node platforms are based on low-cost, low-power programmable microcontrollers, even though it is acknowledged that their energy efficiency remains limited and hinders the spread of WSNs to new applications. In this paper, we propose a complete system-level flow for an alternative approach based on the concept of hardware micro-tasks, which relies on hardware specialization and power gating to dramatically improve the energy efficiency of the computational part of the node. Early estimates show power savings of more than one order of magnitude over MCU-based implementations.
DOI: https://doi.org/10.1109/DSD.2010.88
Citations: 12
A Packet Classifier Using a Parallel Branching Program Machine
Hiroki Nakahara, Tsutomu Sasao, M. Matsuura
A branching program machine (BM) is a special-purpose processor that uses only two kinds of instructions: branch and output instructions. Thus, the architecture of the BM is much simpler than that of a general-purpose processor (MPU). Since the BM uses dedicated instructions for a special-purpose application, it is faster than the MPU. This paper presents a packet classifier using a parallel branching program machine (PBM). To reduce computation time and code size, the set of rules for the packet classifier is first partitioned into groups. The groups are then evaluated by the PBM in parallel. This paper also shows a method to estimate the number of BMs necessary to realize the packet classifier. The PBM32, consisting of 32 BMs, has been implemented on an FPGA and compared with an Intel Core2Duo at 1.2 GHz. The PBM32 is 8.1-11.1 times faster than the Core2Duo and requires only 0.2-10.3 percent of the memory needed by the Core2Duo.
DOI: https://doi.org/10.1109/DSD.2010.18
Citations: 11
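The partition-then-evaluate scheme from the abstract above can be sketched in software. This is a loose illustration and not the PBM architecture: the groups are evaluated in a plain loop rather than in parallel hardware, rules match on a single port field, and the `Rule` type, the round-robin partitioning and the convention that a lower number means higher priority are all assumptions.

```python
from typing import List, NamedTuple, Optional, Sequence


class Rule(NamedTuple):
    """A hypothetical single-field rule; lower priority number wins."""
    priority: int
    port_lo: int
    port_hi: int
    action: str


def partition(rules: Sequence[Rule], n_groups: int) -> List[List[Rule]]:
    """Split the rule set into n_groups groups, round-robin."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    return [list(rules[i::n_groups]) for i in range(n_groups)]


def match_group(group: Sequence[Rule], port: int) -> Optional[Rule]:
    """Best matching rule within one group (the work of one machine)."""
    hits = [r for r in group if r.port_lo <= port <= r.port_hi]
    return min(hits, key=lambda r: r.priority) if hits else None


def classify_packet(groups: Sequence[Sequence[Rule]], port: int, default: str = "deny") -> str:
    """Combine the per-group winners; each group could run independently."""
    winners = [w for w in (match_group(g, port) for g in groups) if w is not None]
    return min(winners, key=lambda r: r.priority).action if winners else default


if __name__ == "__main__":
    rules = [
        Rule(0, 80, 80, "allow-http"),
        Rule(1, 0, 1023, "log"),
        Rule(2, 0, 65535, "drop"),
    ]
    groups = partition(rules, 2)
    print(classify_packet(groups, 80), classify_packet(groups, 22))
```

Since each group is matched without reference to the others, the per-group step is the part that lends itself to parallel evaluation, with a small final reduction to pick the overall winner.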