
Proceedings of the 59th ACM/IEEE Design Automation Conference — Latest Publications

Efficient access scheme for multi-bank based NTT architecture through conflict graph
Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530656
Xiangren Chen, Bohan Yang, Yong Lu, S. Yin, Shaojun Wei, Leibo Liu
The Number Theoretic Transform (NTT) hardware accelerator has become a crucial building block in many cryptosystems, such as post-quantum cryptography. In this paper, we provide new insights into the construction of conflict-free memory mapping schemes (CFMMS) for multi-bank NTT architectures. Firstly, we present a parallel loop structure for arbitrary-radix NTT and propose two point-fetching modes. Afterwards, we transform the conflict-free mapping problem into a conflict graph and develop a novel heuristic to explore the design space of CFMMS, which yields more efficient access schemes than classic works. To further verify the methodology, we design high-performance NTT/INTT kernels for Dilithium, whose area-time efficiency significantly outperforms state-of-the-art works on a similar FPGA platform.
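As an illustration of the general idea only (not the paper's CFMMS heuristic), a bank mapping can be found by coloring a conflict graph whose vertices are data addresses and whose edges join addresses fetched in the same cycle; a coloring with no more colors than banks is conflict-free. A minimal Python sketch for a radix-2, 8-point NTT fetching one butterfly per cycle:

```python
from collections import defaultdict

def build_conflict_graph(cycle_accesses):
    """Add an edge between any two addresses fetched in the same
    cycle: mapped to one bank, they would collide."""
    adj = defaultdict(set)
    for group in cycle_accesses:
        for a in group:
            for b in group:
                if a != b:
                    adj[a].add(b)
    return adj

def greedy_bank_mapping(adj, num_banks):
    """Greedy graph coloring; a color is a bank index.  Returns None
    when some address cannot avoid all of its neighbors' banks."""
    mapping = {}
    for addr in sorted(adj, key=lambda a: len(adj[a]), reverse=True):
        used = {mapping[n] for n in adj[addr] if n in mapping}
        bank = next((b for b in range(num_banks) if b not in used), None)
        if bank is None:
            return None
        mapping[addr] = bank
    return mapping

# Radix-2 NTT on 8 points: stage s pairs address i with i + 2**s.
N = 8
butterflies = [(i, i + 2**s) for s in range(3) for i in range(N)
               if not (i >> s) & 1]
mapping = greedy_bank_mapping(build_conflict_graph(butterflies), num_banks=2)
```

With two banks the greedy coloring succeeds here because the radix-2 access pattern forms a bipartite (hypercube) conflict graph; real designs fetch several butterflies per cycle, which is where heuristic design-space exploration becomes necessary.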
Citations: 1
ASTERS: adaptable threshold spike-timing neuromorphic design with twin-column ReRAM synapses
Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530591
Ziru Li, Qilin Zheng, Bonan Yan, Ru Huang, Bing Li, Yiran Chen
Complex event-driven neuron dynamics have been an obstacle to implementing efficient brain-inspired computing architectures with VLSI circuits. To solve this problem and harness the event-driven advantage, we propose ASTERS, a resistive random-access memory (ReRAM) based neuromorphic design that performs time-to-first-spike SNN inference. In addition to the fundamental novel axon and neuron circuits, we also propose two hardware-software co-design techniques: "Multi-Level Firing Threshold Adjustment" to mitigate the impact of ReRAM device process variations, and "Timing Threshold Adjustment" to further speed up the computation. Experimental results show that our cross-layer solution ASTERS achieves more than 34.7% energy savings compared to existing spiking neuromorphic designs, while maintaining 90.1% accuracy under process variations with a 20% standard deviation.
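Time-to-first-spike (TTFS) coding, which ASTERS' inference relies on, can be illustrated in a few lines: an integrate-and-fire neuron with a stronger input crosses its threshold sooner, so information is carried by spike timing rather than spike counts. This toy model is purely illustrative and says nothing about the ReRAM circuits or the threshold-adjustment techniques:

```python
def time_to_first_spike(input_current, threshold=1.0, t_max=100):
    """Integrate-and-fire neuron with constant input: the membrane
    potential accumulates each step, and the neuron's output is the
    first step at which the threshold is crossed (None: no spike)."""
    v = 0.0
    for t in range(1, t_max + 1):
        v += input_current
        if v >= threshold:
            return t
    return None

# TTFS coding: a stronger stimulus fires earlier, so the *timing*
# of the first spike encodes the input magnitude.
t_strong = time_to_first_spike(0.5)   # crosses 1.0 at step 2
t_weak = time_to_first_spike(0.25)    # crosses 1.0 at step 4
```

Lowering the threshold shifts all spikes earlier, which hints at why threshold adjustment is a useful knob both for variation tolerance and for speed.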
Citations: 2
CDB: critical data backup design for consumer devices with high-density flash based hybrid storage
Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530468
Longfei Luo, Dingcui Yu, Liang Shi, Chuanming Ding, Changlong Li, E. Sha
Hybrid flash-based storage built from high-density, low-cost flash memory has become increasingly popular in consumer devices over the last decade. However, existing methods for protecting critical data are designed to improve the reliability of consumer devices with non-hybrid flash storage. Based on evaluations and analysis, these methods result in significant performance and lifetime degradation on consumer devices with hybrid storage. The reason is that the different kinds of memory in hybrid storage have different characteristics, such as performance and access granularity. To address these problems, a critical data backup (CDB) method is proposed to back up designated critical data while making full use of the different kinds of memory in hybrid storage. Experimental results show that, compared with the state of the art, CDB achieves encouraging performance and lifetime improvements.
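The core idea — keep an extra copy of designated critical data on the denser medium so it survives problems in the faster one — can be sketched as below. The class and method names are hypothetical illustrations, not CDB's actual design:

```python
class HybridStore:
    """Toy hybrid storage: a small fast tier plus a dense tier.
    Critical writes are duplicated to the dense tier so the data
    survives loss of the fast tier (e.g., a sudden power-off)."""

    def __init__(self):
        self.fast = {}    # e.g., low-density, high-endurance flash
        self.dense = {}   # e.g., high-density, low-cost flash

    def write(self, key, value, critical=False):
        self.fast[key] = value
        if critical:
            self.dense[key] = value   # backup copy of critical data

    def read(self, key):
        return self.fast.get(key, self.dense.get(key))

    def recover_after_fast_tier_loss(self):
        # Only data that was backed up on the dense tier survives.
        self.fast = dict(self.dense)

store = HybridStore()
store.write("user_db", "v1", critical=True)
store.write("thumbnail_cache", "img", critical=False)
store.recover_after_fast_tier_loss()
```

The interesting engineering in CDB lies in doing this duplication without losing the performance and lifetime benefits of the hybrid layout, which this sketch deliberately omits.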
Citations: 1
MC-CIM: a reconfigurable computation-in-memory for efficient stereo matching cost computation
Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530477
Zhiheng Yue, Yabing Wang, Leibo Liu, Shaojun Wei, S. Yin
This paper proposes a computation-in-memory design for stereo matching cost computation. Matching cost computation incurs large energy and latency overheads because of frequent memory access. To overcome previous design limitations, this work, named MC-CIM, performs matching cost computation without incurring memory access and introduces several key features. (1) A lightweight balanced computing unit is integrated within the cell array to reduce memory access and improve system throughput. (2) A self-optimized circuit design makes it possible to alter the arithmetic operation of the matching algorithm in various scenarios. (3) A flexible data mapping method and a reconfigurable digital peripheral exploit maximum parallelism across different algorithms and bit precisions. The proposed design is implemented in 28nm technology and achieves an average efficiency of 277 TOPS/W.
Citations: 1
A fast parameter tuning framework via transfer learning and multi-objective bayesian optimization
Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530430
Zheng Zhang, Tinghuan Chen, Jiaxin Huang, Meng Zhang
Design space exploration (DSE) can automatically and effectively determine design parameters to achieve the optimal performance, power, and area (PPA) in very-large-scale integration (VLSI) design. A lack of prior knowledge, however, makes exploration inefficient. In this paper, a fast parameter tuning framework based on transfer learning and multi-objective Bayesian optimization is proposed to quickly find the optimal design parameters. A Gaussian copula is utilized to establish the correlation with the implemented technology. Prior knowledge is integrated into the multi-objective Bayesian optimization by transforming the PPA data into residual observations. An uncertainty-aware search acquisition function is employed to explore the design space efficiently. Experiments on a CPU design show that this framework can achieve a higher-quality Pareto frontier with fewer design-flow runs than state-of-the-art methodologies.
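The optimization target throughout is the Pareto frontier over PPA objectives. As a hedged sketch of that one ingredient (leaving out the transfer learning and the Bayesian search themselves), the non-dominated points of a set of sampled design evaluations can be extracted like this; the sample values are hypothetical:

```python
def pareto_front(points):
    """Keep the points not dominated by any other; with all objectives
    minimized, p dominates q if p <= q everywhere and < somewhere."""
    def dominates(p, q):
        return (all(a <= b for a, b in zip(p, q))
                and any(a < b for a, b in zip(p, q)))
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (power, area, delay) samples from a design space.
samples = [(1.0, 4.0, 3.0), (2.0, 3.0, 2.0), (1.5, 5.0, 4.0), (3.0, 3.5, 2.5)]
front = pareto_front(samples)
```

A multi-objective acquisition function scores candidate parameter settings by how much they are expected to push this frontier outward, which is why frontier quality per design-flow run is the natural figure of merit.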
Citations: 3
GLite
Pub Date : 2022-07-10 DOI: 10.1201/9781420074079.axb
Jiaqi Li, Min Peng, Qingan Li, Meizheng Peng, Mengting Yuan
Citations: 0
PatterNet
Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530422
Behnam Khaleghi, U. Mallappa, Duygu Yaldiz, Haichao Yang, Monil Shah, Jaeyoung Kang, Tajana Rosing
Weight clustering is an effective technique for compressing deep neural network (DNN) memory: a limited number of unique weights and low-bit weight indexes store the clustering information. In this paper, we propose PatterNet, which enforces shared clustering topologies on filters. Cluster sharing leads to a greater degree of memory reduction by reusing the index information. PatterNet effectively factorizes input activations and post-processes the unique weights, which saves multiplications by several orders of magnitude. Furthermore, PatterNet reduces the add operations by harnessing the fact that filters sharing a clustering pattern have the same factorized terms. We introduce techniques for determining and assigning clustering patterns and for training a network to fulfill the target patterns. We also propose and implement an efficient accelerator that builds upon the patterned filters. Experimental results show that PatterNet shrinks the memory and operation count by up to 80.2% and 73.1%, respectively, with accuracy similar to the baseline models. The PatterNet accelerator improves energy efficiency by 107x over an Nvidia GTX 1080 and by 2.2x over the state of the art.
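Plain weight clustering — the baseline technique PatterNet builds on — is easy to sketch: quantize the weights to a small codebook with 1-D k-means, then store only low-bit indexes into it. This illustrates the storage saving only, not PatterNet's shared-pattern filters or its accelerator; the weight values are made up:

```python
def cluster_weights(weights, num_clusters, iters=20):
    """1-D k-means: quantize weights to a small codebook; each weight
    is then stored as a low-bit index into the codebook."""
    lo, hi = min(weights), max(weights)
    centroids = [lo + (hi - lo) * k / (num_clusters - 1)
                 for k in range(num_clusters)]
    idx = []
    for _ in range(iters):
        idx = [min(range(num_clusters), key=lambda k: abs(w - centroids[k]))
               for w in weights]
        for k in range(num_clusters):
            members = [w for w, i in zip(weights, idx) if i == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return centroids, idx

# Eight fp32 weights collapse to a 3-entry codebook + 2-bit indexes.
weights = [0.11, 0.09, -0.42, 0.52, 0.48, -0.38, 0.12, -0.40]
codebook, indexes = cluster_weights(weights, num_clusters=3)
```

PatterNet's extra step is to force many filters to share one index pattern, so the per-filter index storage — usually the dominant cost after clustering — is amortized as well.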
Citations: 1
Hierarchical memory-constrained operator scheduling of neural architecture search networks
Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530472
Zihan Wang, Chengcheng Wan, Yuting Chen, Ziyi Lin, He Jiang, Lei Qiao
Neural Architecture Search (NAS) is widely used in industry to search for neural networks that meet task requirements. Meanwhile, it faces the challenge of scheduling networks so that they satisfy memory constraints. This paper proposes HMCOS, which performs hierarchical memory-constrained operator scheduling of NAS networks: given a network, HMCOS constructs a hierarchical computation graph and employs an iterative scheduling algorithm to progressively reduce peak memory footprints. We evaluate HMCOS against RPO and Serenity (two popular scheduling techniques). The results show that HMCOS outperforms existing techniques, supporting more NAS networks, reducing peak memory footprints by 8.7-42.4%, and achieving 137-283x speedups in scheduling.
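Why operator order changes peak memory can be shown with a toy sketch (hypothetical tensor sizes, not HMCOS's hierarchical algorithm): an op's output stays live until its last consumer has run, so finishing one producer-reducer chain before starting the next keeps fewer large tensors alive at once.

```python
def peak_memory(order, size, deps):
    """Peak sum of live tensor sizes when ops run in `order`:
    an op's output stays live until all its consumers have run."""
    consumers = {op: [o for o in deps if op in deps[o]] for op in deps}
    live, done, cur, peak = {}, set(), 0, 0
    for op in order:
        cur += size[op]
        live[op] = size[op]
        done.add(op)
        peak = max(peak, cur)
        for t in list(live):        # free tensors with no pending consumer
            if all(c in done for c in consumers[t]):
                cur -= live.pop(t)
    return peak

# Two 10-unit producers (a, c) feeding small reducers (b, d), joined by e.
size = {"a": 10, "b": 1, "c": 10, "d": 1, "e": 1}
deps = {"a": [], "b": ["a"], "c": [], "d": ["c"], "e": ["b", "d"]}
good = ["a", "b", "c", "d", "e"]  # finish one chain before the next
bad = ["a", "c", "b", "d", "e"]   # both 10-unit tensors live at once
```

Both orders are valid topological schedules of the same graph, yet their peak footprints differ (12 vs. 21 units here); searching this ordering space under memory constraints is the problem HMCOS addresses at scale.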
Citations: 2
A cost-efficient fully synthesizable stochastic time-to-digital converter design based on integral nonlinearity scrambling
Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530502
Qiaochu Zhang, Shiyu Su, M. Chen
Stochastic time-to-digital converters (STDCs) are gaining increasing interest in submicron CMOS analog/mixed-signal design for their superior tolerance of nonlinear quantization levels. However, the large number of delay units and time comparators required for conventional STDC operation incurs excessive implementation costs. This paper presents a fully synthesizable STDC architecture based on an integral nonlinearity (INL) scrambling technique, allowing an order-of-magnitude cost reduction. The proposed technique randomizes and averages the STDC INL using a digital-to-time converter. Moreover, we propose an associated design automation flow and demonstrate an STDC design in a 12nm FinFET process. Post-layout simulations show significant linearity and area/power efficiency improvements over prior art.
Citations: 0
LPCA: learned MRC profiling based cache allocation for file storage systems
Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530662
Yibin Gu, Yifan Li, Hua Wang, L. Liu, Ke Zhou, Wei Fang, Gang Hu, Jinhu Liu, Zhuo Cheng
A file storage system (FSS) uses multiple caches to accelerate data accesses. Unfortunately, efficient FSS cache allocation remains extremely difficult. First, miss ratio curve (MRC) construction, the key to cache allocation, is limited to LRU in existing work. Second, existing techniques are suitable for same-layer caches but not for hierarchical ones. We present a Learned MRC Profiling based Cache Allocation (LPCA) scheme for FSS. To the best of our knowledge, LPCA is the first to apply machine learning to model MRCs under non-LRU policies. LPCA also explores optimization targets for hierarchical caches, so it can provide universal and efficient cache allocation for FSSs.
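For background, the classic LRU-only MRC construction that LPCA generalizes beyond is Mattson stack-distance profiling: one pass over an access trace yields the miss ratio at every cache size simultaneously. A minimal sketch (LPCA's contribution is precisely that it is not limited to this LRU-only construction):

```python
def miss_ratio_curve(trace, max_size):
    """Mattson stack profiling for LRU: an access's stack distance is
    the number of distinct blocks touched since the previous access to
    the same block; it hits in any LRU cache bigger than that."""
    stack, dists = [], []              # stack[0] is the most recent block
    for block in trace:
        if block in stack:
            i = stack.index(block)     # i more-recent distinct blocks
            dists.append(i)
            stack.pop(i)
        else:
            dists.append(float("inf"))  # cold miss at every size
        stack.insert(0, block)
    # Miss ratio at size c: fraction of accesses with distance >= c.
    return [sum(d >= c for d in dists) / len(dists)
            for c in range(1, max_size + 1)]

# 'abc' repeated: every reuse has stack distance 2, so a cache of
# size 3 captures all reuses while sizes 1-2 miss everything.
mrc = miss_ratio_curve(list("abcabcd"), max_size=4)
```

Because this bookkeeping assumes an LRU stack, non-LRU policies have no equivalent single-pass curve, which motivates learning the MRC from observed behavior instead.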
Citations: 0