2020 57th ACM/IEEE Design Automation Conference (DAC): Latest Papers

AXI HyperConnect: A Predictable, Hypervisor-level Interconnect for Hardware Accelerators in FPGA SoC
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218652
Francesco Restuccia, Alessandro Biondi, Mauro Marinoni, Giorgiomaria Cicero, G. Buttazzo
FPGA-based systems-on-chip (SoCs) are powerful computing platforms for implementing mixed-criticality systems that require both multiprocessing and hardware acceleration. Virtualization via hypervisor technologies is, de facto, an effective technique for allowing multiple execution domains with different criticality levels to coexist in isolation on the same platform. Implementing such technologies on FPGA-based SoCs poses new challenges: one of these is the isolation of hardware accelerators deployed on the FPGA fabric that belong to different domains but share common resources such as a memory bus. This paper proposes AXI HyperConnect, a hypervisor-level hardware component that allows hardware accelerators to be connected to the same bus while ensuring isolation and predictability. AXI HyperConnect has been implemented on modern FPGA SoCs by Xilinx and tested with real-world accelerators, including one for Deep Neural Network inference.
Citations: 29
Late Breaking Results: Can You Hear Me? Towards an Ultra Low-Cost Hearing Screening Device
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218597
Nils Heitmann, Philipp H. Kindt, S. Chakraborty
Hearing screening devices emit an acoustic signal in the outer ear, which invokes a specific response from a healthy inner ear. However, the high cost of such devices prevents their wide deployment in schools or private homes, especially in developing countries. In this paper, we show for the first time that such tests are also feasible with a device that uses a single speaker both to emit the signal and, acting as a microphone, to record the response. Existing devices rely on a speaker and microphone pair, which makes them significantly more complex and costly. We further outline the embedded systems and signal processing challenges that such a setup entails. If successful, this approach has the potential to make hearing screening available to a much wider population in developing countries.
Citations: 0
Verification for Field-coupled Nanocomputing Circuits
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218641
Marcel Walter, R. Wille, F. Sill, Daniel Große, R. Drechsler
With the decline of Moore’s Law, several post-CMOS technologies are currently under heavy consideration. Promising candidates can be found in the class of Field-coupled Nanocomputing (FCN) devices, as they allow for the highest processing performance with tremendously low energy dissipation. With upcoming design automation in this domain, the need for formal verification approaches arises. Unfortunately, FCN circuits come with certain domain-specific properties that render conventional verification methods inapplicable. In this paper, we investigate this issue and propose a verification approach for FCN circuits that addresses this problem. For the first time, this provides researchers and engineers with an automatic method that allows them to check whether an obtained FCN circuit design indeed implements the given/desired function. A prototype implementation demonstrates the applicability of the proposed approach.
Citations: 10
Runtime Trust Evaluation and Hardware Trojan Detection Using On-Chip EM Sensors
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218514
Jiaji He, Xiaolong Guo, Haocheng Ma, Yanjiang Liu, Yiqiang Zhao, Yier Jin
It has been widely demonstrated that post-deployment trust evaluation approaches, such as side-channel measurements, combined with statistical analysis methods are effective for detecting hardware Trojans in fabricated integrated circuits (ICs). However, more sophisticated Trojans proposed recently defeat these methods with stealthy triggers and very low side-channel signatures. To address these challenges, we propose an electromagnetic (EM) side-channel based post-fabrication trust evaluation framework that monitors EM radiation at runtime. The key component of the runtime trust evaluation framework is an on-chip EM sensor that can constantly measure and collect EM side-channel information from the target circuit. The simulation results validate the capability of the proposed framework in detecting stealthy hardware Trojans. Further, we fabricate an AES circuit protected by the proposed trust evaluation framework along with four different types of hardware Trojans. The measurements on the fabricated chips prove two key findings. First, the on-chip EM sensor can achieve a higher signal-to-noise ratio (SNR) and thus better Trojan detection accuracy. Second, the trust evaluation framework can help detect different hardware Trojans at runtime.
Citations: 8
Access Characteristic Guided Partition for Read Performance Improvement on Solid State Drives
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218540
Yina Lv, Liang Shi, Qiao Li, C. Xue, E. Sha
Solid state drives (SSDs) are now widely deployed thanks to the development of high-density, low-cost NAND flash memories. Previous work has found that the read performance of SSDs has been degrading as the technology develops. One of the most critical reasons is the access interference between reads and writes, as the latest NAND flash memories have a significant latency gap between reads and writes. This paper addresses this issue through access characteristic guided SSD partitioning. First, several server workloads are studied, and it is shown that reads and writes can be separated based on their access characteristics. Second, a set of techniques is proposed to place data judiciously for request separation. Finally, a workload-based SSD partitioning scheme is proposed to improve read performance. The experimental results show that the proposed solution improves read performance by 36% on average compared with state-of-the-art solutions.
Citations: 3
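The abstract of the SSD partitioning paper above says that reads and writes can be separated based on access characteristics, without giving the criterion. The sketch below is only a toy illustration of that idea, not the authors' scheme: the per-page read ratio, the `read_ratio_threshold` value, and the `classify_pages` helper are all assumptions introduced here.

```python
from collections import Counter


def classify_pages(trace, read_ratio_threshold=0.8):
    """Label each logical page "read" or "write" by its share of read accesses.

    `trace` is a list of (op, page) tuples with op in {"R", "W"}.
    The threshold is a hypothetical tuning knob, not a value from the paper.
    """
    reads, writes = Counter(), Counter()
    for op, page in trace:
        (reads if op == "R" else writes)[page] += 1

    partition = {}
    for page in set(reads) | set(writes):
        total = reads[page] + writes[page]
        ratio = reads[page] / total
        partition[page] = "read" if ratio >= read_ratio_threshold else "write"
    return partition


# Tiny synthetic trace: page 1 is read-only, pages 2 and 3 see writes.
trace = [("R", 1), ("R", 1), ("R", 1), ("W", 2), ("W", 2),
         ("R", 2), ("R", 3), ("W", 3), ("R", 1)]
partition = classify_pages(trace)
print(partition)
```

Pages labelled "read" could then be placed in a region that sees little write traffic, which is the kind of interference reduction the abstract describes.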
Time Multiplexing via Circuit Folding
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218552
Po-Chun Chien, J. H. Jiang
Time multiplexing is an important technique to overcome the bandwidth bottleneck of limited input-output pins in FPGAs. Most prior work tackles the problem from a physical design standpoint to minimize the number of cut nets or Time Division Multiplexing (TDM) ratio through circuit partitioning or routing. In this work, we formulate a new orthogonal approach at the logic level to achieve time multiplexing through structural and functional circuit folding. The new formulation provides a smooth trade-off between bandwidth and throughput. Experiments show the effectiveness of the structural method and improved optimality of the functional method on look-up-table and flip-flop usage.
Citations: 0
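To make the bandwidth/throughput trade-off in the circuit folding abstract above concrete, here is a minimal sketch using a hand-folded parity function. It assumes the number of inputs is divisible by the number of time frames, and it is a toy example chosen for illustration, not the paper's structural or functional folding algorithm.

```python
from itertools import product


def parity_combinational(bits):
    """Reference circuit: all inputs arrive in one cycle on len(bits) pins."""
    result = 0
    for b in bits:
        result ^= b
    return result


def parity_folded(bits, frames):
    """Folded circuit: len(bits) // frames pins, one state bit, `frames` cycles."""
    pins = len(bits) // frames
    state = 0  # models a flip-flop carrying the partial result between cycles
    for t in range(frames):
        chunk = bits[t * pins:(t + 1) * pins]  # inputs presented in cycle t
        for b in chunk:
            state ^= b
    return state


n, frames = 8, 4
# Exhaustively check that the folded version matches the original function.
equivalent = all(
    parity_combinational(bits) == parity_folded(bits, frames)
    for bits in product((0, 1), repeat=n)
)
print(n // frames, equivalent)
```

With 8 inputs folded over 4 frames, the pin count drops from 8 to 2 at the cost of one flip-flop and a fourfold reduction in throughput.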
Late Breaking Results: Automatic Adaptive MOM Capacitor Cell Generation for Analog and Mixed-Signal Layout Design
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218609
Tzu-Wei Wang, Po-Chang Wu, Mark Po-Hung Lin
This paper introduces the first problem formulation in the literature for automatic MOM capacitor cell generation with adaptive capacitance. Given an expected capacitance value and the available metal layers, the proposed capacitor cell generation method can produce a compact MOM capacitor cell with minimized area and matched capacitance. Compared with the MOM capacitor cells with non-adaptive capacitance used in previous work, the experimental results show that the proposed adaptive MOM capacitor cell generation method can reduce the chip area by 25% and the power consumption by 20% for the capacitor network in successive-approximation-register analog-to-digital converters (SAR ADCs).
Citations: 2
Late Breaking Results: Building an On-Chip Deep Learning Memory Hierarchy Brick by Brick
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218728
Isak Edo Vivancos, Sayeh Sharify, M. Nikolic, Ciaran Bannon, M. Mahmoud, Alberto Delmas Lascorz, Andreas Moshovos
Data accesses between on- and off-chip memories account for a large fraction of overall energy consumption during inference with deep learning networks. We present Boveda, a lossless on-chip memory compression technique for neural networks operating on fixed-point values. Boveda reduces the data width used per block of values to be only as long as necessary: since most values are of small magnitude, Boveda drastically reduces their footprint. Boveda can be used to increase the effective on-chip capacity, to reduce off-chip traffic, or to reduce the on-chip memory capacity needed to achieve a performance/energy target. Boveda reduces total model footprint to 53%.
Citations: 0
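The Boveda abstract above describes storing each block of fixed-point values with only as many bits as the block needs. The following sketch illustrates that general idea under assumptions of our own: signed two's-complement values, a block of 8 values, and a 4-bit per-block header holding the width. None of these parameters, nor the function names, come from the paper, and the real encoding may differ.

```python
def bits_needed(value: int) -> int:
    """Bits required to hold `value` in signed two's complement."""
    if value >= 0:
        return value.bit_length() + 1  # +1 for the sign bit
    return (-value - 1).bit_length() + 1


def compressed_bits(values, block_size=8, header_bits=4):
    """Total bits when each block stores a width header plus trimmed values."""
    total = 0
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        width = max(bits_needed(v) for v in block)  # widest value sets the width
        total += header_bits + width * len(block)
    return total


# Mostly small magnitudes with one outlier (900) in the second block.
values = [3, -2, 0, 1, 5, -4, 2, 0, 900, -1, 7, 0, 1, 2, -3, 4]
baseline = 16 * len(values)  # every value stored as a 16-bit word
packed = compressed_bits(values)
print(packed, baseline)
```

A single large value widens only its own block, which is why a per-block width can pay off when most values are small.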
Accurate Inference with Inaccurate RRAM Devices: Statistical Data, Model Transfer, and On-line Adaptation
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218605
G. Charan, Jubin Hazra, K. Beckmann, Xiaocong Du, Gokul Krishnan, R. Joshi, N. Cady, Yu Cao
Resistive random-access memory (RRAM) is a promising technology for in-memory computing with high storage density, fast inference, and good compatibility with CMOS. However, the mapping of a pre-trained deep neural network (DNN) model on RRAM suffers from realistic device issues, especially the variation and quantization error, resulting in a significant reduction in inference accuracy. In this work, we first extract these statistical properties from 65 nm RRAM data on 300 mm wafers. The RRAM data present 10 levels in quantization and 50% variance, resulting in an accuracy drop to 31.76% and 10.49% for the MNIST and CIFAR-10 datasets, respectively. Based on the experimental data, we propose a combination of machine learning algorithms and on-line adaptation to recover the accuracy with minimum overhead. The recipe first applies Knowledge Distillation (KD) to transfer an ideal model into a student model with statistical variations and 10 levels. Furthermore, an on-line sparse adaptation (OSA) method is applied to the DNN model mapped onto the RRAM array. Using importance sampling, OSA adds a small SRAM array that is sparsely connected to the main RRAM array; only this SRAM array is updated to recover the accuracy. As demonstrated on MNIST and CIFAR-10 datasets, a 7.86% area cost is sufficient to achieve baseline accuracy for the 65 nm RRAM devices.
Citations: 18
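The RRAM abstract above attributes the accuracy drop to two device effects: quantization to 10 levels and device variation. The sketch below shows one simple way to emulate both on a list of weights. It assumes evenly spaced levels and multiplicative Gaussian noise, and it reads the reported 50% figure as a relative standard deviation; all three are assumptions made here, since the abstract does not specify the statistical model.

```python
import random


def quantize(weights, levels=10):
    """Snap each weight to the nearest of `levels` evenly spaced values."""
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (levels - 1)
    return [lo + round((w - lo) / step) * step for w in weights]


def perturb(weights, rel_sigma, rng):
    """Apply multiplicative Gaussian noise as a stand-in for device variation."""
    return [w * (1.0 + rng.gauss(0.0, rel_sigma)) for w in weights]


def mean_abs_error(a, b):
    """Average absolute difference between two equally long weight lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


rng = random.Random(0)  # fixed seed so the run is repeatable
ideal = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
quantized = quantize(ideal)
noisy = perturb(quantized, rel_sigma=0.5, rng=rng)

print("distinct levels:", len(set(quantized)))
print("error after quantization:", mean_abs_error(ideal, quantized))
print("error after variation:", mean_abs_error(ideal, noisy))
```

Feeding weights perturbed this way into a trained network is a common way to estimate how much accuracy a device model costs before any recovery technique is applied.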
PIM-Prune: Fine-Grain DCNN Pruning for Crossbar-Based Process-In-Memory Architecture
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218523
Chaoqun Chu, Yanzhi Wang, Yilong Zhao, Xiaolong Ma, Shaokai Ye, Yunyan Hong, Xiaoyao Liang, Yinhe Han, Li Jiang
Deep Convolution Neural Network (DCNN) pruning is an efficient way to reduce the resource and power consumption in a DCNN accelerator. Exploiting the sparsity in the weight matrices of DCNNs, however, is nontrivial if we deploy these DCNNs in a crossbar-based Process-In-Memory (PIM) architecture, because of the crossbar structure. Structural pruning, which exploits coarse-grained sparsity such as filter/channel-level pruning, can result in a compressed weight matrix that fits the crossbar structure. However, this pruning method inevitably degrades the model accuracy. To solve this problem, we propose PIM-PRUNE to exploit the finer-grained sparsity in PIM architectures; the resulting compressed weight matrices can significantly reduce the demand for crossbars with negligible accuracy loss. Further, we explore the design space of the crossbar, such as the crossbar size and aspect ratio, from the new point of view of resource-oriented pruning. We find a trade-off between the pruning algorithm and the hardware overhead: a PIM with smaller crossbars is friendlier to pruning methods; however, the resulting peripheral circuits cause higher power consumption. Given a specific DCNN, we can suggest a sweet spot in crossbar design for optimal overall energy efficiency. Experimental results show that the proposed pruning method, applied to ResNet18, can achieve up to 24.85× and 3.56× higher compression rates of occupied crossbars on CIFAR-10 and ImageNet, respectively, with negligible accuracy loss; this is 4.56× and 1.99× better than state-of-the-art methods.
深度卷积神经网络(DCNN)剪枝是减少DCNN加速器资源和功耗的有效方法。然而,如果我们将这些dc - nn部署在基于交叉栏的内存进程(PIM)体系结构中,由于交叉栏结构的原因,利用DCNNs权重矩阵中的稀疏性是非常重要的。结构性修剪——利用粗粒度的稀疏性,例如过滤器/通道级修剪——可以产生适合横杆结构的压缩权重矩阵。然而,这种修剪方法不可避免地降低了模型的精度。为了解决这一问题,本文提出了PIM-PRUNE,利用pim -架构中的细粒度稀疏性,得到的压缩权矩阵可以显著减少对横条的需求,且精度损失可以忽略不计。在此基础上,从资源导向剪枝的新视角出发,探讨了横梁的尺寸、纵横比等设计空间。我们发现剪枝算法和硬件开销之间存在权衡:具有较小交叉条的PIM对剪枝方法更友好;然而,由此产生的外围电路导致更高的功耗。给定一个特定的DCNN,我们可以建议一个最佳的横杆设计点,以达到最佳的整体能源效率。实验结果表明,该方法在Resnet18上的压缩率分别比CifarlO和Imagenet上的压缩率提高了24.85倍和3.56倍;而精度损失可以忽略不计,分别比现有方法提高了4.56倍和1.99倍。
{"title":"PIM-Prune: Fine-Grain DCNN Pruning for Crossbar-Based Process-In-Memory Architecture","authors":"Chaoqun Chu, Yanzhi Wang, Yilong Zhao, Xiaolong Ma, Shaokai Ye, Yunyan Hong, Xiaoyao Liang, Yinhe Han, Li Jiang","doi":"10.1109/DAC18072.2020.9218523","DOIUrl":"https://doi.org/10.1109/DAC18072.2020.9218523","journal":{"name":"2020 57th ACM/IEEE Design Automation Conference (DAC)"},"publicationDate":"2020-07-01"}
Citations: 33
Journal
2020 57th ACM/IEEE Design Automation Conference (DAC)