Proceedings of the Twelfth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis Companion: latest papers

An efficient hardware design for cerebellar models using approximate circuits: special session paper
Honglan Jiang, Leibo Liu, Jie Han
The superior controllability of the cerebellum has motivated extensive interest in the development of computational cerebellar models. Many models have been applied to motor control and image stabilization in robots. Because they are often computationally complex, cerebellar models have rarely been implemented in dedicated hardware. Here, we propose an efficient hardware design for cerebellar models using approximate circuits with small area and low power. Leveraging the inherent error tolerance of the cerebellum, approximate adders and multipliers are carefully evaluated for implementation in an adaptive-filter-based cerebellar model to achieve a good tradeoff between accuracy and hardware usage. A saccade system, whose vestibulo-ocular reflex (VOR) is controlled by the cerebellum, is simulated to show the applicability and effectiveness of the proposed design. Simulation results show that the approximate cerebellar circuit achieves accuracy similar to that of an exact implementation while saving 29.7% in area and 37.3% in power.
DOI: https://doi.org/10.1145/3125502.3125537 (published 2017-10-15)
Citations: 2
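The abstract above does not say which approximate adders were evaluated. As an illustration of the general idea only, the following minimal sketch models one well-known design, the lower-part-OR adder, in which the low bits are OR-ed instead of added and a single AND gate on the top approximate bit feeds a carry into the exact upper adder. The function name, the operand width, and the number of approximate bits are assumptions chosen for the example, not values from the paper.

```python
def loa_add(a: int, b: int, width: int = 16, approx_bits: int = 4) -> int:
    """Behavioural model of a lower-part-OR adder on unsigned operands.

    Hypothetical helper for illustration; `width` and `approx_bits`
    are assumed parameters, not taken from the listed paper.
    """
    if not 0 < approx_bits < width:
        raise ValueError("approx_bits must be between 1 and width - 1")
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    low_mask = (1 << approx_bits) - 1
    # Approximate lower part: bitwise OR replaces the carry chain.
    low = (a | b) & low_mask
    # Carry into the exact part: AND of the most significant approximate bits.
    carry_in = (a >> (approx_bits - 1)) & (b >> (approx_bits - 1)) & 1
    # Exact upper part (the carry-out is kept, so the result may use width + 1 bits).
    high = (a >> approx_bits) + (b >> approx_bits) + carry_in
    return (high << approx_bits) | low


def max_abs_error(width: int, approx_bits: int) -> int:
    """Exhaustively measure the worst-case error for small operand widths."""
    worst = 0
    for a in range(1 << width):
        for b in range(1 << width):
            worst = max(worst, abs(loa_add(a, b, width, approx_bits) - (a + b)))
    return worst
```

For this adder the error per addition is bounded by 2^(approx_bits - 1), which is the kind of small, bounded deviation an error-tolerant model might absorb; whether a given cerebellar model tolerates it has to be checked by simulation, as the abstract describes.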
A fast online sequential learning accelerator for IoT network intrusion detection: work-in-progress
Hantao Huang, Suleman Khalid Rai, Wenye Liu, Hao Yu
Deployment of IoT devices in smart buildings and homes will offer a high level of comfort with increased energy efficiency, but it can also introduce potential cyber-attacks such as network intrusions via linked IoT devices. Because securing an IoT network requires low power and low latency, traditional software-based security systems are not applicable. Instead, embedded hardware-accelerator-based data analytics is preferred for network intrusion detection. In this paper, we propose an online sequential machine learning hardware accelerator to perform real-time network intrusion detection. A learning algorithm based on a single-hidden-layer feedforward neural network is developed, with a least-squares solver realized in hardware. Experimental results on a number of benchmarks show that a single FPGA achieves a bandwidth of 409.6 Gbps with fast yet low-power network intrusion detection.
DOI: https://doi.org/10.1145/3125502.3125532 (published 2017-10-15)
Citations: 3
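The abstract above names a single-hidden-layer feedforward network trained online with a least-squares solver, but gives no algorithmic detail. One common way to realize that combination in software is to keep the hidden-layer weights fixed and random and to update only the output weights with recursive least squares, one sample at a time. The sketch below follows that assumption; all function names, the tanh activation, and the ridge value are hypothetical choices for illustration, not the authors' hardware design.

```python
import math
import random


def hidden_features(x, weights, biases):
    """Map an input vector to hidden activations using fixed random weights."""
    return [math.tanh(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(weights, biases)]


def make_state(n, ridge=1e-3):
    """Initial inverse-covariance matrix P = I / ridge and zero output weights."""
    P = [[(1.0 / ridge if i == j else 0.0) for j in range(n)] for i in range(n)]
    return P, [0.0] * n


def rls_update(P, beta, h, target):
    """One recursive least-squares step; P and beta are updated in place."""
    n = len(h)
    Ph = [sum(P[i][j] * h[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + sum(h[i] * Ph[i] for i in range(n))
    gain = [value / denom for value in Ph]
    err = target - sum(h[i] * beta[i] for i in range(n))
    for i in range(n):
        beta[i] += gain[i] * err
        for j in range(n):
            # P stays symmetric, so P h h^T P / denom equals gain[i] * Ph[j].
            P[i][j] -= gain[i] * Ph[j]
    return err  # prediction error before the update


if __name__ == "__main__":
    rng = random.Random(0)
    n_hidden = 8
    weights = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(n_hidden)]
    biases = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    P, beta = make_state(n_hidden, ridge=1e-2)
    # Toy stream of (features, label) pairs; labels are made-up placeholders.
    stream = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
              ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
    for x, label in stream * 5:
        error = rls_update(P, beta, hidden_features(x, weights, biases), label)
    print("last pre-update error:", error)
```

Each update costs a matrix-vector product and a rank-one correction, with no matrix inversion per sample, which is one reason this style of solver is often considered for streaming hardware.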
Heterogeneous redundancy to address performance and cost in multi-core SIMT: work-in-progress
M. Naghashi, S. H. Mozafari, S. Hessabi
As manufacturing processes scale to smaller feature sizes and processors become more complex, it is becoming challenging to have fabricated devices that operate according to their specification in the first place: yield losses are mounting [3].
DOI: https://doi.org/10.1145/3125502.3125547 (published 2017-10-15)
Citations: 0
DOVE: pinpointing firmware security vulnerabilities via symbolic control flow assertion mining (work-in-progress)
Alessandro Danese, G. Pravadelli, V. Bertacco
In the past decade, the number of reported security attacks exploiting unchecked firmware input values has been on the rise. To address this concerning trend, this work proposes a novel detection framework, called DOVE, capable of identifying unlikely firmware execution flows, specifically those that may reveal a security vulnerability. The DOVE framework operates by leveraging a symbolic simulation of the firmware's execution, paired with a probability computation that can identify unlikely execution flows and provide the user with corresponding formal assertions.
DOI: https://doi.org/10.1145/3125502.3125541 (published 2017-10-15)
Citations: 3
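To make "unlikely execution flows" concrete, here is a minimal sketch that is not DOVE's algorithm. It assumes branch probabilities are already known (the abstract says they come from symbolic simulation paired with a probability computation) and that the control-flow graph is acyclic. It multiplies the probabilities along each entry-to-exit path and reports the paths that fall below a threshold. The graph, node names, and threshold are made up for the example.

```python
def path_probabilities(cfg, entry):
    """Enumerate entry-to-exit paths of an acyclic CFG with their probabilities.

    `cfg` maps a node name to a list of (successor, branch_probability) pairs;
    nodes without an entry in `cfg` are treated as exits.
    """
    results = []
    stack = [(entry, [entry], 1.0)]
    while stack:
        node, path, prob = stack.pop()
        successors = cfg.get(node, [])
        if not successors:
            results.append((path, prob))
            continue
        for succ, branch_prob in successors:
            stack.append((succ, path + [succ], prob * branch_prob))
    return results


def unlikely_paths(cfg, entry, threshold):
    """Return paths whose probability is below `threshold`, rarest first."""
    rare = [item for item in path_probabilities(cfg, entry) if item[1] < threshold]
    return sorted(rare, key=lambda item: item[1])


# Hypothetical firmware routine: a length check that almost always passes.
example_cfg = {
    "entry": [("check_len", 1.0)],
    "check_len": [("copy", 0.99), ("overflow", 0.01)],
    "copy": [("exit", 1.0)],
    "overflow": [("exit", 1.0)],
}

for path, prob in unlikely_paths(example_cfg, "entry", threshold=0.05):
    print(" -> ".join(path), prob)
```

A rarely taken path such as the one through the hypothetical `overflow` node is the kind of flow a reviewer would want surfaced, since behaviour that is almost never exercised is also rarely tested.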
IR-level annotation strategy dealing with aggressive loop optimizations for performance estimation in native simulation: work-in-progress
Omayma Matoussi, F. Pétrot
Originally developed for purely functional verification of software, native or host-compiled simulation [6] has gained momentum thanks to its considerable speedup compared to instruction set simulation (ISS). To obtain a performance model of the software, non-functional information is computed from the target binary code using low-level analysis and back-annotated into the high-level code used to generate it. This annotated functional model is then natively compiled and executed on the host machine for fast software timing estimations [8]. Back-annotating at the right place requires a mapping between the binary instructions and the high-level code statements, so it is necessary to decide at which stage of the software compilation process the information is back-annotated. There are three possibilities: in the original source code ([7]), in the host binary code ([3]), or in the compiler intermediate representation (IR) ([8], [2]). As compilers perform many optimizations to enhance software performance, the source code and the binary code structures may be radically different. In this work, we define a mapping approach between the compiler's IR and the binary control flow graph (CFG) when a high level of compiler optimization (e.g., O3 in gcc) is used. Our approach handles aggressive compiler optimizations such as loop unrolling without having to introduce any modification to the compiler.
DOI: https://doi.org/10.1145/3125502.3125550 (published 2017-10-15)
Citations: 2
Hampering fault attacks against lattice-based signature schemes: countermeasures and their efficiency (special session)
Nina Bindel, Juliane Krämer, Johannes Schreiber
Research on physical attacks on lattice-based cryptography has seen some progress in recent years, and the first attacks and countermeasures have been described. In this work, we perform an exhaustive literature review on fault attacks on lattice-based encryption and signature schemes. Based on this, we provide a complete overview of suggested countermeasures and analyze which of the proposed attacks can be prevented by the respective countermeasures. Moreover, we show for selected countermeasures how they affect the runtime of the protected operations.
DOI: https://doi.org/10.1145/3125502.3125546 (published 2017-10-15)
Citations: 5
Trends, challenges and needs for lattice-based cryptography implementations: special session
Hamid Nejatollahi, N. Dutt, Rosario Cammarota
Advances in computing steadily erode computer security at its foundation, calling for fundamental innovations to strengthen the weakening cryptographic primitives and security protocols. At the same time, the emergence of new computing paradigms, such as Cloud Computing and Internet of Everything, demands that innovations in security extend beyond their foundational aspects to the actual design and deployment of these primitives and protocols, while satisfying emerging design constraints such as latency, compactness, energy efficiency, and agility. While many alternatives have been proposed for symmetric key cryptography and related protocols (e.g., lightweight ciphers and authenticated encryption), the alternatives for public key cryptography are limited to post-quantum cryptography primitives and their protocols. In particular, lattice-based cryptography is a promising candidate, both in terms of its foundational properties and its application to traditional security problems such as key exchange, digital signature, and encryption/decryption. We summarize trends in lattice-based cryptographic schemes, some fundamental recent proposals for the use of lattices in computer security, challenges for their implementation in software and hardware, and emerging needs.
DOI: https://doi.org/10.1145/3125502.3125559 (published 2017-10-15)
Citations: 14
Exploring fast and slow memories in HMP core types: work-in-progress
Bryan Donyanavard, Amir Mahdi Hosseini Monazzah, T. Mück, N. Dutt
Studies have shown memory and computational needs vary independently across applications. Recent work has explored using hybrid memory technology (SRAM+NVM) in on-chip memories of multicore processors (CMPs) to support the varied needs of diverse workloads. Such works suggest architectural modifications that require supplemental management in the memory hierarchy. Instead, we propose to deploy hybrid memory in a manner that integrates seamlessly with the existing heterogeneous multicore (HMP) architectural model, and therefore does not require any architectural modification, simply the integration of different memory technologies on-chip. We evaluate platforms with a combination of fast (SRAM cache) and slow (STT-MRAM cache) core-types for mobile workloads.
DOI: https://doi.org/10.1145/3125502.3125545 (published 2017-10-15)
Citations: 1
A power-efficient and high performance FPGA accelerator for convolutional neural networks: work-in-progress
Lei Gong, Chao Wang, Xi Li, Hua-ping Chen, Xuehai Zhou
Recently, FPGAs have been widely used in the implementation of hardware accelerators for Convolutional Neural Networks (CNNs), especially on mobile and embedded devices. However, most of these existing accelerators are designed with the same concept as their ASIC counterparts; that is, all operations from different CNN layers are mapped to the same hardware units and work in a multiplexed way. Although this approach improves the generality of these accelerators, it does not take full advantage of the reconfigurability and customizability of FPGAs, resulting in a certain degree of computational efficiency degradation, which is even worse on embedded platforms. In this paper, we propose an FPGA-based CNN accelerator with all the layers mapped to their own on-chip units and working concurrently as a pipeline. A strategy that finds the optimized parallelization scheme for each layer is proposed to eliminate pipeline stalls and achieve high resource utilization. In addition, a balanced pruning-based method is applied to fully connected (FC) layers to reduce computational redundancy. As a case study, we implement a widely used CNN model, LeNet-5, on an embedded FPGA device, the Xilinx Zedboard. It achieves a peak performance of 39.78 GOP/s and a power efficiency of 19.6 GOP/s/W, which outperforms previous approaches.
DOI: https://doi.org/10.1145/3125502.3125534 (published 2017-10-15)
Citations: 12
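The abstract above mentions a strategy for choosing a per-layer parallelization scheme so that the layer pipeline does not stall, without describing it. The sketch below shows one simple way such a balance could be computed, under stated assumptions: each layer has a known operation count, a layer given p parallel units needs ceil(ops / p) cycles, and the total number of units is capped. It searches for the smallest achievable bottleneck latency. The function name, the cost model, and the numbers are hypothetical and are not the authors' method or LeNet-5 figures.

```python
def balance_pipeline(layer_ops, unit_budget):
    """Pick per-layer parallelism that minimizes the slowest pipeline stage.

    layer_ops   -- positive operation counts per layer (assumed known)
    unit_budget -- total parallel compute units available (assumed model)
    Returns (bottleneck_cycles, units_per_layer).
    """
    if unit_budget < len(layer_ops):
        raise ValueError("need at least one unit per layer")

    def units_needed(target_cycles):
        # Integer ceiling division: units required to finish within the target.
        return [-(-ops // target_cycles) for ops in layer_ops]

    # The required unit count never increases as the target grows,
    # so a binary search over the target latency is valid.
    lo, hi = 1, max(layer_ops)
    while lo < hi:
        mid = (lo + hi) // 2
        if sum(units_needed(mid)) <= unit_budget:
            hi = mid
        else:
            lo = mid + 1
    return lo, units_needed(lo)


# Made-up operation counts for a three-layer network and a budget of 7 units.
cycles, allocation = balance_pipeline([100, 400, 200], unit_budget=7)
print(cycles, allocation)
```

In this toy case every stage ends up with the same latency, so no stage waits on another; with real layer shapes and resource types (DSP slices, block RAM) the cost model would need to be richer than a single unit count.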
Data analytics enables energy-efficiency and robustness: from mobile to manycores, datacenters, and networks (special session paper)
S. Pasricha, J. Doppa, K. Chakrabarty, Saideep Tiku, D. Dauwe, Shi Jin, P. Pande
The amount of data generated and collected across computing platforms every day is not only enormous, but growing at an exponential rate. Advanced data analytics and machine-learning techniques have become increasingly essential to analyze and extract meaning from such "Big Data". These techniques can be very useful to detect patterns and trends to improve the operational behavior of computing platforms, but they also introduce a number of outstanding challenges: (1) How can we design and deploy data analytics and learning mechanisms to improve energy-efficiency in IoT and mobile devices, without introducing significant software overheads? (2) How can we use machine learning and analytics techniques for effective design-space exploration during manycore chip design? (3) How can data analytics and learning improve the reliability and energy-efficiency of large-scale cloud datacenters, to cost-effectively support connected embedded and IoT platforms? (4) How can data analytics detect anomalies and increase robustness in the network backbone of emerging cloud datacenter networks? In this paper, we discuss these outstanding problems and describe far-reaching solutions applicable across the interconnected ecosystem of IoT and mobile devices, manycore chips, datacenters, and networks.
DOI: https://doi.org/10.1145/3125502.3125560 (published 2017-10-15)
Citations: 2