
Latest Publications: 2020 57th ACM/IEEE Design Automation Conference (DAC)

Late Breaking Results: Automatic Adaptive MOM Capacitor Cell Generation for Analog and Mixed-Signal Layout Design
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218609
Tzu-Wei Wang, Po-Chang Wu, Mark Po-Hung Lin
This paper introduces the first problem formulation in the literature for automatic MOM capacitor cell generation with adaptive capacitance. Given an expected capacitance value and the available metal layers, the proposed capacitor cell generation method can produce a compact MOM capacitor cell with minimized area and matched capacitance. Compared with the MOM capacitor cells with non-adaptive capacitance in previous work, the experimental results show that the proposed adaptive MOM capacitor cell generation method can reduce the chip area by 25% and the power consumption by 20% for the capacitor network in successive-approximation-register analog-to-digital converters (SAR ADCs).
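To make the adaptive-capacitance idea concrete, the following is a minimal sketch of a cell-sizing search: given a target capacitance and the number of available metal layers, pick the finger count and layer count that hit the target with the smallest footprint. The unit-capacitance and area constants are hypothetical placeholders, and the brute-force search is only illustrative of the problem, not the authors' generation method.

```python
# Illustrative sketch (not the paper's algorithm): choose the number of
# fingers and metal layers of a MOM capacitor so that the estimated
# capacitance matches a target while the cell footprint stays minimal.
# C_UNIT (capacitance per finger per layer) and FINGER_AREA are
# hypothetical placeholder values.

C_UNIT = 0.05e-15   # assumed capacitance contributed by one finger on one layer (F)
FINGER_AREA = 0.2   # assumed footprint of one finger (um^2)

def generate_mom_cell(c_target, available_layers, tol=0.02):
    """Return (fingers, layers, capacitance, area) of the smallest matching cell."""
    best = None
    for layers in range(1, available_layers + 1):
        # fingers needed on this many stacked layers to reach the target
        fingers = max(1, round(c_target / (C_UNIT * layers)))
        cap = fingers * layers * C_UNIT
        if abs(cap - c_target) / c_target > tol:
            continue
        area = fingers * FINGER_AREA  # stacking more layers does not add footprint
        if best is None or area < best[3]:
            best = (fingers, layers, cap, area)
    return best

print(generate_mom_cell(c_target=10e-15, available_layers=4))
```

In a real flow the generated geometry would still have to pass extraction and matching checks before being placed in the capacitor network.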
Citations: 2
Late Breaking Results: Building an On-Chip Deep Learning Memory Hierarchy Brick by Brick
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218728
Isak Edo Vivancos, Sayeh Sharify, M. Nikolic, Ciaran Bannon, M. Mahmoud, Alberto Delmas Lascorz, Andreas Moshovos
Data accesses between on- and off-chip memories account for a large fraction of overall energy consumption during inference with deep learning networks. We present Boveda, a lossless on-chip memory compression technique for neural networks operating on fixed-point values. Boveda reduces the data width used per block of values to only what is necessary: since most values are of small magnitude, Boveda drastically reduces their footprint. Boveda can be used to increase the effective on-chip capacity, to reduce off-chip traffic, or to reduce the on-chip memory capacity needed to achieve a performance/energy target. Boveda reduces the total model footprint to 53%.
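A minimal sketch of the per-block width idea follows: each block stores a short width field plus only as many bits per value as its largest magnitude requires. The block size, header width, and encoding are assumptions for illustration, not Boveda's actual format.

```python
# Illustrative per-block width reduction (not Boveda's actual format):
# each block stores a 4-bit width field plus `width` bits per value,
# where `width` is set by the largest magnitude in the block.

def block_width(values, full_width=16):
    """Bits needed to hold every value in the block (sign + magnitude)."""
    max_mag = max(abs(v) for v in values)
    width = max_mag.bit_length() + 1          # +1 for the sign bit
    return min(width, full_width)

def compressed_bits(tensor, block=16, full_width=16):
    total = 0
    for i in range(0, len(tensor), block):
        blk = tensor[i:i + block]
        total += 4 + block_width(blk, full_width) * len(blk)  # 4-bit header per block
    return total

activations = [3, -1, 0, 7, 120, 2, -5, 0] * 64          # mostly small magnitudes
print(compressed_bits(activations), "bits vs", len(activations) * 16, "uncompressed")
```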
Citations: 0
Time Multiplexing via Circuit Folding
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218552
Po-Chun Chien, J. H. Jiang
Time multiplexing is an important technique for overcoming the bandwidth bottleneck of limited input-output pins in FPGAs. Most prior work tackles the problem from a physical-design standpoint, minimizing the number of cut nets or the Time Division Multiplexing (TDM) ratio through circuit partitioning or routing. In this work, we formulate a new, orthogonal approach at the logic level that achieves time multiplexing through structural and functional circuit folding. The new formulation provides a smooth trade-off between bandwidth and throughput. Experiments show the effectiveness of the structural method and the improved optimality of the functional method in look-up-table and flip-flop usage.
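For intuition about the bandwidth/throughput trade-off, the sketch below serializes a wide logical bus over a small number of physical pins across multiple cycles (the TDM ratio); the circuit-folding transformation itself operates on the netlist and is not reproduced here.

```python
# Minimal illustration of time-division multiplexing a wide bus over few pins.
# This only shows the bandwidth/throughput trade-off, not the logic-level
# circuit-folding transformation the paper performs.

def tdm_send(bus_bits, pins):
    """Split one wide word into per-cycle chunks that fit the physical pins."""
    cycles = -(-len(bus_bits) // pins)               # ceiling division = TDM ratio
    return [bus_bits[i * pins:(i + 1) * pins] for i in range(cycles)]

def tdm_receive(chunks):
    return [b for chunk in chunks for b in chunk]

word = [1, 0, 1, 1, 0, 0, 1, 0]           # 8 logical signals
chunks = tdm_send(word, pins=2)            # only 2 physical pins -> TDM ratio 4
assert tdm_receive(chunks) == word
print("cycles needed:", len(chunks))
```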
Citations: 0
Symbolic Computer Algebra and SAT Based Information Forwarding for Fully Automatic Divider Verification
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218721
Christoph Scholl, Alexander Konrad
During the last few years, Symbolic Computer Algebra (SCA) has delivered excellent results in the verification of large integer and finite-field multipliers at the gate level. In contrast to those encouraging advances, SCA-based divider verification has remained in its infancy, awaiting a major breakthrough. In this paper we analyze the fundamental reasons that have so far prevented the success of SCA-based divider verification and present SAT Based Information Forwarding (SBIF). SBIF enhances SCA-based backward rewriting with information propagation in the opposite direction. We successfully apply the method to the fully automatic formal verification of large non-restoring dividers.
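Independent of the SCA/SAT machinery, the word-level property a divider proof establishes is that the outputs satisfy Z = Q·D + R with 0 ≤ R < D. Below is an illustrative simulation-style check of that specification against a simple shift-and-subtract (restoring) divider; it is not the paper's verification method.

```python
# Behavioural shift-and-subtract division plus the word-level correctness
# property a formal divider proof establishes: Z == Q*D + R and 0 <= R < D.
# Only an illustration of the specification, not SCA/SAT-based verification.

def restoring_divide(z, d, n):
    """Shift-and-subtract division of an n-bit dividend z by divisor d."""
    q, r = 0, 0
    for i in reversed(range(n)):
        r = (r << 1) | ((z >> i) & 1)
        if r >= d:
            r -= d
            q |= 1 << i
    return q, r

for z in range(256):
    for d in range(1, 16):
        q, r = restoring_divide(z, d, 8)
        assert z == q * d + r and 0 <= r < d
print("specification holds on all 8-bit dividends tested")
```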
Citations: 5
BitPruner: Network Pruning for Bit-serial Accelerators
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218534
Xiandong Zhao, Ying Wang, Cheng Liu, Cong Shi, Kaijie Tu, Lei Zhang
Bit-serial architectures (BSAs) are becoming increasingly popular in low-power neural network processor (NNP) design. However, the performance and efficiency of state-of-the-art BSA NNPs depend heavily on the distribution of ineffectual weight bits in the running neural network. To boost the efficiency of third-party BSA accelerators, this work presents Bit-Pruner, a software approach to learning BSA-favored neural networks without resorting to hardware modifications. The techniques proposed in this work not only progressively prune but also structure the non-zero bits in the weights, so that the number of zero bits in the model can be increased and load-balanced to suit the architecture of the target BSA accelerators. According to our experiments on a set of representative neural networks, Bit-Pruner increases the bit-sparsity up to 94.4% with negligible accuracy degradation. When the bit-pruned models are deployed onto typical BSA accelerators, the average performance is 2.1X and 1.5X higher than the baselines running non-pruned and weight-pruned networks, respectively.
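On a bit-serial accelerator the cycle count per weight tracks its number of set bits, so dropping least-significant set bits raises bit-sparsity at a small value error. The magnitude-based bit-dropping rule below is an illustrative stand-in for Bit-Pruner's learned, load-balanced scheme.

```python
# Illustrative bit-level pruning for a bit-serial accelerator: drop the
# least-significant set bits of each fixed-point weight until at most
# `max_set_bits` remain. A stand-in for Bit-Pruner's structured scheme.

def prune_bits(weight, max_set_bits=2):
    sign = -1 if weight < 0 else 1
    mag = abs(weight)
    kept, remaining = 0, mag
    for _ in range(max_set_bits):
        if remaining == 0:
            break
        msb = 1 << (remaining.bit_length() - 1)   # keep the most significant set bit
        kept |= msb
        remaining -= msb
    return sign * kept

weights = [37, -21, 6, 127, -3]
pruned = [prune_bits(w) for w in weights]
print(pruned)                       # e.g. 37 (0b100101) -> 36 (0b100100)
```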
Citations: 13
INVITED: Computational Methods of Biological Exploration
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218671
Louis K. Scheffer
Our technical ability to collect data about biological systems far outpaces our ability to understand them. Historically, for example, we have had complete and explicit genomes for almost two decades, but we still have no idea what many genes do. More recently a similar situation has arisen, where we can reconstruct huge neural circuits and/or watch them operate in the brain, but still do not know how they work. This talk covers this second, newer problem: understanding neural circuits. We introduce a variety of computational tools currently being used to attack these data-rich, understanding-poor problems. Examples include dimensionality reduction for nonlinear systems, searching for known and proposed circuits, and using machine learning for parameter estimation. One general theme is the use of biological priors to help fill in unknowns, to check whether proposed solutions are feasible, and more generally to aid understanding.
Citations: 0
A Simple Cache Coherence Scheme for Integrated CPU-GPU Systems
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218664
Ardhi Wiratama Baskara Yudha, Reza Pulungan, H. Hoffmann, Yan Solihin
This paper presents a novel approach to accelerate applications running on integrated CPU-GPU systems. Many integrated CPU-GPU systems use cache-coherent shared memory to communicate. For example, after the CPU produces data for the GPU, the GPU may pull the data into its cache when it accesses the data. In such a pull-based approach, data resides in a shared cache until the GPU accesses it, resulting in long load latency on the first GPU access to a cache line. In this work, we propose a new, push-based coherence mechanism that explicitly exploits the CPU-GPU producer-consumer relationship by automatically moving data from the CPU to the GPU last-level cache. The proposed mechanism results in a dramatic reduction of the GPU L2 cache miss rate in general, and a consequent increase in overall performance. Our experiments show that the proposed scheme can increase performance by up to 37%, with typical improvements in the 5–7% range. We find that even when tested applications do not benefit from the proposed approach, their performance does not decrease with our technique. While we demonstrate how the proposed scheme can co-exist with traditional cache coherence mechanisms, we argue that it could also be used as a simpler replacement for existing protocols.
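The pull-versus-push difference can be shown with a toy event model: in the pull baseline, the GPU's first access to a producer-written line pays a shared-cache round trip, while the push scheme forwards the line into the GPU last-level cache when the CPU writes it. The cycle counts below are arbitrary placeholders used only to show the mechanism.

```python
# Toy model of pull- vs push-based coherence for a CPU producer / GPU consumer.
# Latencies are arbitrary placeholder cycle counts, not measured values.

GPU_L2_HIT, SHARED_CACHE = 20, 120

def run(push_enabled, lines):
    gpu_l2 = set()
    total = 0
    for line in lines:                 # CPU produces each cache line
        if push_enabled:
            gpu_l2.add(line)           # push: forward the line to the GPU L2 eagerly
    for line in lines:                 # GPU later consumes each line
        total += GPU_L2_HIT if line in gpu_l2 else SHARED_CACHE
        gpu_l2.add(line)
    return total

lines = range(1000)
print("pull:", run(False, lines), "cycles   push:", run(True, lines), "cycles")
```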
Citations: 1
Stealing Your Data from Compressed Machine Learning Models
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218633
Nuo Xu, Qi Liu, Tao Liu, Zihao Liu, Xiaochen Guo, Wujie Wen
Machine learning models have been widely deployed in many real-world tasks. When a non-expert data holder wants to use a third-party machine learning service for model training, it is critical to preserve the confidentiality of the training data. In this paper, we explore for the first time the potential privacy leakage in a scenario where a malicious ML provider offers the data holder customized training code that includes model compression, which is essential in practical deployment. The provider is unable to access the training process hosted by the secured third party, but can query the models once they are released in public. As a result, the adversary can extract sensitive training data with high quality, even from the deeply compressed models that are tailored for resource-limited devices. Our investigation shows that existing compression techniques, such as quantization, can serve as a defense against such an attack by degrading the model accuracy and the memorized data quality simultaneously. To overcome this defense, we make an initial attempt to design a simple but stealthy quantized correlation encoding attack flow from an adversary's perspective. Three integrated components are developed accordingly: data pre-processing, layer-wise data-weight correlation regularization, and data-aware quantization. Extensive experimental results show that our framework preserves the evasiveness and effectiveness of stealing data from compressed models.
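As a rough picture of the second component, layer-wise data-weight correlation regularization could be imagined as an extra loss term that pulls a flattened weight slice toward an encoding of the sensitive inputs so they survive compression inside the weights. The loss form below is a heavily simplified guess at that idea, not the paper's actual objective.

```python
# Heavily simplified sketch of a correlation-encoding regularizer: the
# provider's training code adds a term that correlates a weight slice with
# an encoding of the secret training data. Not the paper's actual objective.
import numpy as np

def correlation_penalty(weight_slice, secret_encoding):
    w = (weight_slice - weight_slice.mean()) / (weight_slice.std() + 1e-8)
    s = (secret_encoding - secret_encoding.mean()) / (secret_encoding.std() + 1e-8)
    return 1.0 - float(np.mean(w * s))          # 0 when perfectly correlated

rng = np.random.default_rng(0)
weights = rng.normal(size=256)                   # a flattened weight slice
secret = rng.normal(size=256)                    # encoding of sensitive inputs
task_loss = 0.7                                  # placeholder task-loss value
total_loss = task_loss + 0.1 * correlation_penalty(weights, secret)
print(total_loss)
```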
Citations: 2
PIM-Prune: Fine-Grain DCNN Pruning for Crossbar-Based Process-In-Memory Architecture
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218523
Chaoqun Chu, Yanzhi Wang, Yilong Zhao, Xiaolong Ma, Shaokai Ye, Yunyan Hong, Xiaoyao Liang, Yinhe Han, Li Jiang
Deep Convolutional Neural Network (DCNN) pruning is an efficient way to reduce the resource and power consumption of a DCNN accelerator. Exploiting the sparsity in the weight matrices of DCNNs, however, is nontrivial if we deploy these DCNNs in a crossbar-based Process-In-Memory (PIM) architecture, because of the crossbar structure. Structural pruning, which exploits coarse-grained sparsity such as filter/channel-level pruning, can result in a compressed weight matrix that fits the crossbar structure. However, this pruning method inevitably degrades the model accuracy. To solve this problem, we propose PIM-Prune to exploit finer-grained sparsity in the PIM architecture; the resulting compressed weight matrices can significantly reduce the demand for crossbars with negligible accuracy loss. Further, we explore the design space of the crossbar, such as the crossbar size and aspect ratio, from a new, resource-oriented pruning point of view. We find a trade-off between the pruning algorithm and the hardware overhead: a PIM with smaller crossbars is friendlier to pruning methods; however, the resulting peripheral circuits cause higher power consumption. Given a specific DCNN, we can suggest a sweet spot of crossbar design for optimal overall energy efficiency. Experimental results show that the proposed pruning method applied to ResNet18 can achieve up to 24.85× and 3.56× higher compression rates of occupied crossbars on CIFAR-10 and ImageNet, respectively, while the accuracy loss is negligible; this is 4.56× and 1.99× better than state-of-the-art methods.
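Crossbar-oriented structured pruning can be pictured as tile-level pruning: partition the weight matrix into crossbar-sized tiles and zero the tiles with the smallest L1 norm, so the surviving weights still map onto whole crossbars. The tile-norm criterion and shapes below are illustrative assumptions, not PIM-Prune itself.

```python
# Illustrative crossbar-granular pruning: split a weight matrix into
# crossbar-sized tiles and zero out the tiles with the smallest L1 norm,
# so the remaining weights occupy fewer whole crossbars.
import numpy as np

def prune_to_crossbars(w, xbar=(4, 4), keep_ratio=0.5):
    rows, cols = w.shape
    tr, tc = xbar
    tiles = [(i, j) for i in range(0, rows, tr) for j in range(0, cols, tc)]
    norms = {t: np.abs(w[t[0]:t[0]+tr, t[1]:t[1]+tc]).sum() for t in tiles}
    keep = set(sorted(norms, key=norms.get, reverse=True)[:int(len(tiles) * keep_ratio)])
    pruned = np.zeros_like(w)
    for (i, j) in keep:
        pruned[i:i+tr, j:j+tc] = w[i:i+tr, j:j+tc]
    return pruned, len(keep)

w = np.random.default_rng(1).normal(size=(16, 16))
pruned, crossbars_used = prune_to_crossbars(w)
print("crossbars used:", crossbars_used, "of", (16 // 4) ** 2)
```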
Citations: 33
Accurate Inference with Inaccurate RRAM Devices: Statistical Data, Model Transfer, and On-line Adaptation
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218605
G. Charan, Jubin Hazra, K. Beckmann, Xiaocong Du, Gokul Krishnan, R. Joshi, N. Cady, Yu Cao
Resistive random-access memory (RRAM) is a promising technology for in-memory computing with high storage density, fast inference, and good compatibility with CMOS. However, the mapping of a pre-trained deep neural network (DNN) model onto RRAM suffers from realistic device issues, especially variation and quantization error, resulting in a significant reduction in inference accuracy. In this work, we first extract these statistical properties from 65 nm RRAM data on 300 mm wafers. The RRAM data present 10 levels of quantization and 50% variance, resulting in an accuracy drop to 31.76% and 10.49% for the MNIST and CIFAR-10 datasets, respectively. Based on the experimental data, we propose a combination of machine learning algorithms and on-line adaptation to recover the accuracy with minimum overhead. The recipe first applies Knowledge Distillation (KD) to transfer an ideal model into a student model with statistical variations and 10 levels. Furthermore, an on-line sparse adaptation (OSA) method is applied to the DNN model mapped onto the RRAM array. Using importance sampling, OSA adds a small SRAM array that is sparsely connected to the main RRAM array; only this SRAM array is updated to recover the accuracy. As demonstrated on the MNIST and CIFAR-10 datasets, a 7.86% area cost is sufficient to achieve baseline accuracy for the 65 nm RRAM devices.
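The on-line sparse adaptation step can be pictured as follows: quantize the weights to the RRAM levels, perturb them with device variation, and keep a small set of high-importance weights exact in an SRAM side array, which is the only part updated afterwards. Plain weight magnitude is used as the importance score below as a simplification of the importance-sampling criterion.

```python
# Illustrative sketch of quantization + variation on RRAM plus a sparse
# SRAM "patch" holding exact values for the most important weights.
# Importance here is weight magnitude, a simplification of the paper's
# importance-sampling criterion.
import numpy as np

def deploy_on_rram(w, levels=10, variance=0.5, sram_fraction=0.02, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = w.min(), w.max()
    step = (hi - lo) / (levels - 1)
    quantized = lo + np.round((w - lo) / step) * step        # 10-level quantization
    noisy = quantized * (1 + variance * rng.standard_normal(w.shape))
    k = max(1, int(sram_fraction * w.size))
    idx = np.argsort(np.abs(w).ravel())[-k:]                 # most important weights
    patched = noisy.ravel().copy()
    patched[idx] = w.ravel()[idx]                            # exact values kept in SRAM
    return patched.reshape(w.shape)

w = np.random.default_rng(2).normal(size=(64, 64))
print(np.abs(deploy_on_rram(w) - w).mean())                  # mean deployment error
```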
Citations: 18