首页 > 最新文献

Proceedings of the 59th ACM/IEEE Design Automation Conference最新文献

英文 中文
A near-storage framework for boosted data preprocessing of mass spectrum clustering 质谱聚类增强数据预处理的近存储框架
Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530449
Weihong Xu, Jaeyoung Kang, T. Simunic
Mass spectrometry (MS) has been a key to proteomics and metabolomics due to its unique ability to identify and analyze protein structures. Modern MS equipment generates massive amount of tandem mass spectra with high redundancy, making spectral analysis the major bottleneck in design of new medicines. Mass spectrum clustering is one promising solution as it greatly reduces data redundancy and boosts protein identification. However, state-of-the-art MS tools take many hours to run spectrum clustering. Spectra loading and preprocessing consumes average 82% execution time and energy during clustering. We propose a near-storage framework, MSAS, to speed up spectrum preprocessing. Instead of loading data into host memory and CPU, MSAS processes spectra near storage, thus reducing the expensive cost of data movement. We present two types of accelerators that leverage internal bandwidth at two storage levels: SSD and channel. The accelerators are optimized to match the data rate at each storage level with negligible overhead. Our results demonstrate that the channel-level design yields the best performance improvement for preprocessing - it is up to 187X and 1.8X faster than the CPU and the state-of-the-art in-storage computing solution, INSIDER, respectively. After integrating channel-level MSAS into existing MS clustering tools, we measure system level improvements in speed of 3.5X to 9.8X with 2.8X to 11.9X better energy efficiency.
质谱(MS)由于其独特的鉴定和分析蛋白质结构的能力,已成为蛋白质组学和代谢组学的关键。现代质谱分析设备产生大量高冗余的串联质谱,使谱分析成为新药设计的主要瓶颈。质谱聚类是一种很有前途的解决方案,因为它大大减少了数据冗余并提高了蛋白质识别。然而,最先进的MS工具需要花费许多小时来运行频谱聚类。在集群过程中,谱加载和预处理平均消耗82%的执行时间和能量。我们提出了一个近存储框架,MSAS,以加快频谱预处理。MSAS不是将数据加载到主机内存和CPU中,而是在存储附近处理频谱,从而降低了昂贵的数据移动成本。我们提出了两种利用两个存储级别的内部带宽的加速器:SSD和channel。加速器经过优化以匹配每个存储级别的数据速率,而开销可以忽略不计。我们的结果表明,通道级设计在预处理方面产生了最佳的性能改进-它分别比CPU和最先进的存储计算解决方案INSIDER快187X和1.8X。在将通道级MSAS集成到现有的MS集群工具中后,我们测量了系统级速度的3.5到9.8倍的提高,以及2.8到11.9倍的能效提高。
{"title":"A near-storage framework for boosted data preprocessing of mass spectrum clustering","authors":"Weihong Xu, Jaeyoung Kang, T. Simunic","doi":"10.1145/3489517.3530449","DOIUrl":"https://doi.org/10.1145/3489517.3530449","url":null,"abstract":"Mass spectrometry (MS) has been a key to proteomics and metabolomics due to its unique ability to identify and analyze protein structures. Modern MS equipment generates massive amount of tandem mass spectra with high redundancy, making spectral analysis the major bottleneck in design of new medicines. Mass spectrum clustering is one promising solution as it greatly reduces data redundancy and boosts protein identification. However, state-of-the-art MS tools take many hours to run spectrum clustering. Spectra loading and preprocessing consumes average 82% execution time and energy during clustering. We propose a near-storage framework, MSAS, to speed up spectrum preprocessing. Instead of loading data into host memory and CPU, MSAS processes spectra near storage, thus reducing the expensive cost of data movement. We present two types of accelerators that leverage internal bandwidth at two storage levels: SSD and channel. The accelerators are optimized to match the data rate at each storage level with negligible overhead. Our results demonstrate that the channel-level design yields the best performance improvement for preprocessing - it is up to 187X and 1.8X faster than the CPU and the state-of-the-art in-storage computing solution, INSIDER, respectively. After integrating channel-level MSAS into existing MS clustering tools, we measure system level improvements in speed of 3.5X to 9.8X with 2.8X to 11.9X better energy efficiency.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123274153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Architecting DDR5 DRAM caches for non-volatile memory systems 为非易失性存储器系统设计DDR5 DRAM缓存
Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530570
Xin Xin, Wanyi Zhu, Li Zhao
With the release of Intel's Optane DIMM, Non-Volatile Memories (NVMs) are emerging as viable alternatives to DRAM memories because of the advantage of higher capacity. However, the higher latency and lower bandwidth of Optane prevent it from outright replacing DRAM. A prevailing strategy is to employ existing DRAM as a data cache for Optane, thereby achieving overall benefit in capacity, bandwidth, and latency. In this paper, we inspect new features in DDR5 to better support the DRAM cache design for Optane. Specifically, we leverage the two-level ECC scheme, i.e., DIMM ECC and on-die ECC, in DDR5 to construct a narrower channel for tag probing and propose a new operation for fast cache replacement. Experimental results show that our proposed strategy can achieve, on average, 26% performance improvement.
随着英特尔Optane DIMM的发布,非易失性存储器(NVMs)因其更高容量的优势而成为DRAM存储器的可行替代品。然而,更高的延迟和更低的带宽使Optane无法完全取代DRAM。一种流行的策略是使用现有的DRAM作为Optane的数据缓存,从而在容量、带宽和延迟方面获得总体优势。在本文中,我们考察了DDR5的新特性,以更好地支持Optane的DRAM缓存设计。具体来说,我们利用DDR5中的两级ECC方案,即DIMM ECC和片上ECC,构建了一个更窄的标签探测通道,并提出了一种快速缓存替换的新操作。实验结果表明,我们提出的策略可以实现平均26%的性能提升。
{"title":"Architecting DDR5 DRAM caches for non-volatile memory systems","authors":"Xin Xin, Wanyi Zhu, Li Zhao","doi":"10.1145/3489517.3530570","DOIUrl":"https://doi.org/10.1145/3489517.3530570","url":null,"abstract":"With the release of Intel's Optane DIMM, Non-Volatile Memories (NVMs) are emerging as viable alternatives to DRAM memories because of the advantage of higher capacity. However, the higher latency and lower bandwidth of Optane prevent it from outright replacing DRAM. A prevailing strategy is to employ existing DRAM as a data cache for Optane, thereby achieving overall benefit in capacity, bandwidth, and latency. In this paper, we inspect new features in DDR5 to better support the DRAM cache design for Optane. Specifically, we leverage the two-level ECC scheme, i.e., DIMM ECC and on-die ECC, in DDR5 to construct a narrower channel for tag probing and propose a new operation for fast cache replacement. Experimental results show that our proposed strategy can achieve, on average, 26% performance improvement.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123568898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
In-situ self-powered intelligent vision system with inference-adaptive energy scheduling for BNN-based always-on perception 基于bnn的实时感知推理自适应能量调度的原位自供电智能视觉系统
Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530554
Maimaiti Nazhamaiti, Haijin Su, Han Xu, Zheyu Liu, F. Qiao, Qi Wei, Zidong Du, Xinghua Yang, Li Luo
This paper proposes an in-situ self-powered BNN-based intelligent visual perception system that harvests light energy utilizing the indispensable image sensor itself. The harvested energy is allocated to the low-power BNN computation modules layer by layer, adopting a light-weighted duty-cycling-based energy scheduler. A software-hardware co-design method, which exploits the layer-wise error tolerance of BNN as well as the computing-error and energy consumption characteristics of the computation circuit, is proposed to determine the parameters of the energy scheduler, achieving high energy efficiency for self-powered BNN inference. Simulation results show that with the proposed inference-adaptive energy scheduling method, self-powered MNIST classification task can be performed at a frame rate of 4 fps if the harvesting power is 1μW, while guaranteeing at least 90% inference accuracy using binary LeNet-5 network.
本文提出了一种基于原位自供电bnn的智能视觉感知系统,该系统利用必不可少的图像传感器本身收集光能。采用轻量级的基于占空比的能量调度方法,将收集到的能量逐层分配到低功耗的BNN计算模块中。提出了一种软硬件协同设计方法,利用BNN的分层容错性以及计算电路的计算误差和能耗特性来确定能量调度器的参数,实现了自供电BNN推理的高能效。仿真结果表明,采用本文提出的推理自适应能量调度方法,当收获功率为1μW时,自供电的MNIST分类任务可以以4 fps的帧率执行,同时在二进制LeNet-5网络上保证了至少90%的推理准确率。
{"title":"In-situ self-powered intelligent vision system with inference-adaptive energy scheduling for BNN-based always-on perception","authors":"Maimaiti Nazhamaiti, Haijin Su, Han Xu, Zheyu Liu, F. Qiao, Qi Wei, Zidong Du, Xinghua Yang, Li Luo","doi":"10.1145/3489517.3530554","DOIUrl":"https://doi.org/10.1145/3489517.3530554","url":null,"abstract":"This paper proposes an in-situ self-powered BNN-based intelligent visual perception system that harvests light energy utilizing the indispensable image sensor itself. The harvested energy is allocated to the low-power BNN computation modules layer by layer, adopting a light-weighted duty-cycling-based energy scheduler. A software-hardware co-design method, which exploits the layer-wise error tolerance of BNN as well as the computing-error and energy consumption characteristics of the computation circuit, is proposed to determine the parameters of the energy scheduler, achieving high energy efficiency for self-powered BNN inference. Simulation results show that with the proposed inference-adaptive energy scheduling method, self-powered MNIST classification task can be performed at a frame rate of 4 fps if the harvesting power is 1μW, while guaranteeing at least 90% inference accuracy using binary LeNet-5 network.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129696012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
HIMap HIMap
Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530460
Xing Li, Lei Chen, Fan Yang, Mingxuan Yuan, Hongli Yan, Yupeng Wan
Recently, many models show their superiority in sequence and parameter tuning. However, they usually generate non-deterministic flows and require lots of training data. We thus propose a heuristic and iterative flow, namely HIMap, for deterministic logic synthesis. In which, domain knowledge of the functionality and parameters of synthesis operators and their correlations to netlist PPA is fully utilized to design synthesis templates for various objetives. We also introduce deterministic and effective heuristics to tune the templates with relatively fixed operator combinations and iteratively improve netlist PPA. Two nested iterations with local searching and early stopping can thus generate dynamic sequence for various circuits and reduce runtime. HIMap improves 13 best results of the EPFL combinational benchmarks for delay (5 for area). Especially, for several arithmetic benchmarks, HIMap significantly reduces LUT-6 levels by 11.6 ~ 21.2% and delay after P&R by 5.0 ~ 12.9%.
{"title":"HIMap","authors":"Xing Li, Lei Chen, Fan Yang, Mingxuan Yuan, Hongli Yan, Yupeng Wan","doi":"10.1145/3489517.3530460","DOIUrl":"https://doi.org/10.1145/3489517.3530460","url":null,"abstract":"Recently, many models show their superiority in sequence and parameter tuning. However, they usually generate non-deterministic flows and require lots of training data. We thus propose a heuristic and iterative flow, namely HIMap, for deterministic logic synthesis. In which, domain knowledge of the functionality and parameters of synthesis operators and their correlations to netlist PPA is fully utilized to design synthesis templates for various objetives. We also introduce deterministic and effective heuristics to tune the templates with relatively fixed operator combinations and iteratively improve netlist PPA. Two nested iterations with local searching and early stopping can thus generate dynamic sequence for various circuits and reduce runtime. HIMap improves 13 best results of the EPFL combinational benchmarks for delay (5 for area). Especially, for several arithmetic benchmarks, HIMap significantly reduces LUT-6 levels by 11.6 ~ 21.2% and delay after P&R by 5.0 ~ 12.9%.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128592693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
GraphRing GraphRing
Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530571
Zerun Li, Xiaoming Chen, Yinhe Han
Due to the irregular memory access and high bandwidth demanding, graph processing is usually inefficient on conventional computer architectures. The recent development of the processing-in-memory (PIM) technique such as hybrid memory cube (HMC) has provided a feasible design direction for graph processing accelerators. Although PIM provides high internal bandwidth, inter-node memory access is inevitable in large-scale graph processing, which greatly affects the performance. In this paper, we propose an HMC-based graph processing framework, GraphRing. GraphRing is a software-hardware codesign framework that optimizes inter-HMC communication. It contains a regularity- and locality-aware graph execution model and a ring-based multi-HMC architecture. The evaluation results based on 5 graph datasets and 4 graph algorithms show that GraphRing achieves on average 2.14× speedup and 3.07× inter-HMC communication energy saving, compared with GraphQ, a state-of-the-art graph processing architecture.
{"title":"GraphRing","authors":"Zerun Li, Xiaoming Chen, Yinhe Han","doi":"10.1145/3489517.3530571","DOIUrl":"https://doi.org/10.1145/3489517.3530571","url":null,"abstract":"Due to the irregular memory access and high bandwidth demanding, graph processing is usually inefficient on conventional computer architectures. The recent development of the processing-in-memory (PIM) technique such as hybrid memory cube (HMC) has provided a feasible design direction for graph processing accelerators. Although PIM provides high internal bandwidth, inter-node memory access is inevitable in large-scale graph processing, which greatly affects the performance. In this paper, we propose an HMC-based graph processing framework, GraphRing. GraphRing is a software-hardware codesign framework that optimizes inter-HMC communication. It contains a regularity- and locality-aware graph execution model and a ring-based multi-HMC architecture. The evaluation results based on 5 graph datasets and 4 graph algorithms show that GraphRing achieves on average 2.14× speedup and 3.07× inter-HMC communication energy saving, compared with GraphQ, a state-of-the-art graph processing architecture.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"27 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127920828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Hardware-efficient stochastic rounding unit design for DNN training: late breaking results 用于深度神经网络训练的硬件高效随机舍入单元设计:迟破结果
Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530619
Sung-En Chang, Geng Yuan, Alec Lu, Mengshu Sun, Yanyu Li, Xiaolong Ma, Z. Li, Yanyue Xie, Minghai Qin, Xue Lin, Zhenman Fang, Yanzhi Wang, Zhenman, Fang
Stochastic rounding is crucial in the training of low-bit deep neural networks (DNNs) to achieve high accuracy. Unfortunately, prior studies require a large number of high-precision stochastic rounding units (SRUs) to guarantee the low-bit DNN accuracy, which involves considerable hardware overhead. In this paper, we propose an automated framework to explore hardware-efficient low-bit SRUs (ESRUs) that can still generate high-quality random numbers to guarantee the accuracy of low-bit DNN training. Experimental results using state-of-the-art DNN models demonstrate that, compared to the prior 24-bit SRU with 24-bit pseudo random number generator (PRNG), our 8-bit with 3-bit PRNG reduces the SRU resource usage by 9.75× while achieving a higher accuracy.
随机舍入是低比特深度神经网络(dnn)训练中实现高精度的关键。不幸的是,之前的研究需要大量的高精度随机舍入单元(sru)来保证低位DNN的精度,这涉及到相当大的硬件开销。在本文中,我们提出了一个自动化框架来探索硬件高效的低比特sru (esru),它仍然可以生成高质量的随机数,以保证低比特DNN训练的准确性。使用最先进的DNN模型的实验结果表明,与之前使用24位伪随机数生成器(PRNG)的24位SRU相比,我们的8位与3位PRNG将SRU资源使用减少了9.75倍,同时实现了更高的精度。
{"title":"Hardware-efficient stochastic rounding unit design for DNN training: late breaking results","authors":"Sung-En Chang, Geng Yuan, Alec Lu, Mengshu Sun, Yanyu Li, Xiaolong Ma, Z. Li, Yanyue Xie, Minghai Qin, Xue Lin, Zhenman Fang, Yanzhi Wang, Zhenman, Fang","doi":"10.1145/3489517.3530619","DOIUrl":"https://doi.org/10.1145/3489517.3530619","url":null,"abstract":"Stochastic rounding is crucial in the training of low-bit deep neural networks (DNNs) to achieve high accuracy. Unfortunately, prior studies require a large number of high-precision stochastic rounding units (SRUs) to guarantee the low-bit DNN accuracy, which involves considerable hardware overhead. In this paper, we propose an automated framework to explore hardware-efficient low-bit SRUs (ESRUs) that can still generate high-quality random numbers to guarantee the accuracy of low-bit DNN training. Experimental results using state-of-the-art DNN models demonstrate that, compared to the prior 24-bit SRU with 24-bit pseudo random number generator (PRNG), our 8-bit with 3-bit PRNG reduces the SRU resource usage by 9.75× while achieving a higher accuracy.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121177999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
HDPG
Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530668
Yang Ni, Mariam Issa, Danny Abraham, M. Imani, Xunzhao Yin, M. Imani
Traditional robot control or more general continuous control tasks often rely on carefully hand-crafted classic control methods. These models often lack the self-learning adaptability and intelligence to achieve human-level control. On the other hand, recent advancements in Reinforcement Learning (RL) present algorithms that have the capability of human-like learning. The integration of Deep Neural Networks (DNN) and RL thereby enables autonomous learning in robot control tasks. However, DNN-based RL brings both high-quality learning and high computation cost, which is no longer ideal for currently fast-growing edge computing scenarios. In this paper, we introduce HDPG, a highly-efficient policy-based RL algorithm using Hyperdimensional Computing. Hyperdimensional computing is a lightweight brain-inspired learning methodology; its holistic representation of information leads to a well-defined set of hardware-friendly high-dimensional operations. Our HDPG fully exploits the efficient HDC for high-quality state value approximation and policy gradient update. In our experiments, we use HDPG for robotics tasks with continuous action space and achieve significantly higher rewards than DNN-based RL. Our evaluation also shows that HDPG achieves 4.7× faster and 5.3× higher energy efficiency than DNN-based RL running on embedded FPGA.
{"title":"HDPG","authors":"Yang Ni, Mariam Issa, Danny Abraham, M. Imani, Xunzhao Yin, M. Imani","doi":"10.1145/3489517.3530668","DOIUrl":"https://doi.org/10.1145/3489517.3530668","url":null,"abstract":"Traditional robot control or more general continuous control tasks often rely on carefully hand-crafted classic control methods. These models often lack the self-learning adaptability and intelligence to achieve human-level control. On the other hand, recent advancements in Reinforcement Learning (RL) present algorithms that have the capability of human-like learning. The integration of Deep Neural Networks (DNN) and RL thereby enables autonomous learning in robot control tasks. However, DNN-based RL brings both high-quality learning and high computation cost, which is no longer ideal for currently fast-growing edge computing scenarios. In this paper, we introduce HDPG, a highly-efficient policy-based RL algorithm using Hyperdimensional Computing. Hyperdimensional computing is a lightweight brain-inspired learning methodology; its holistic representation of information leads to a well-defined set of hardware-friendly high-dimensional operations. Our HDPG fully exploits the efficient HDC for high-quality state value approximation and policy gradient update. In our experiments, we use HDPG for robotics tasks with continuous action space and achieve significantly higher rewards than DNN-based RL. Our evaluation also shows that HDPG achieves 4.7× faster and 5.3× higher energy efficiency than DNN-based RL running on embedded FPGA.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121990820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
Hexagons are the bestagons: design automation for silicon dangling bond logic 六边形是标杆:设计自动化的硅悬垂键逻辑
Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530525
Marcel Walter, S. S. H. Ng, K. Waluś, R. Wille
Field-coupled Nanocomputing (FCN) defines a class of post-CMOS nanotechnologies that promises compact layouts, low power operation, and high clock rates. Recent breakthroughs in the fabrication of Silicon Dangling Bonds (SiDBs) acting as quantum dots enabled the demonstration of a sub-30 nm2 OR gate and wire segments. This motivated the research community to invest manual labor in the design of additional gates and whole circuits which, however, is currently severely limited by scalability issues. In this work, these limitations are overcome by the introduction of a design automation framework that establishes a flexible topology based on hexagons as well as a corresponding Bestagon gate library for this technology and, additionally, provides automatic methods for physical design. By this, the first design automation solution for the promising SiDB platform is proposed. In an effort to support open research and open data, the resulting framework and all design files will be made available.
场耦合纳米计算(FCN)定义了一类后cmos纳米技术,承诺紧凑的布局,低功耗运行和高时钟速率。作为量子点的硅悬空键(sidb)制造的最新突破使低于30 nm2的OR门和线段的演示成为可能。这促使研究团体投入体力劳动来设计额外的门和整个电路,然而,目前受到可扩展性问题的严重限制。在这项工作中,通过引入设计自动化框架来克服这些限制,该框架建立了基于六边形的灵活拓扑以及相应的Bestagon门库,此外,还为物理设计提供了自动化方法。在此基础上,提出了第一个具有发展前景的SiDB平台设计自动化解决方案。为了支持开放研究和开放数据,最终的框架和所有设计文件都将提供。
{"title":"Hexagons are the bestagons: design automation for silicon dangling bond logic","authors":"Marcel Walter, S. S. H. Ng, K. Waluś, R. Wille","doi":"10.1145/3489517.3530525","DOIUrl":"https://doi.org/10.1145/3489517.3530525","url":null,"abstract":"Field-coupled Nanocomputing (FCN) defines a class of post-CMOS nanotechnologies that promises compact layouts, low power operation, and high clock rates. Recent breakthroughs in the fabrication of Silicon Dangling Bonds (SiDBs) acting as quantum dots enabled the demonstration of a sub-30 nm2 OR gate and wire segments. This motivated the research community to invest manual labor in the design of additional gates and whole circuits which, however, is currently severely limited by scalability issues. In this work, these limitations are overcome by the introduction of a design automation framework that establishes a flexible topology based on hexagons as well as a corresponding Bestagon gate library for this technology and, additionally, provides automatic methods for physical design. By this, the first design automation solution for the promising SiDB platform is proposed. In an effort to support open research and open data, the resulting framework and all design files will be made available.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"355 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122717508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
HDLock
Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530515
Shijin Duan, Shaolei Ren, Xiaolin Xu
Hyperdimensional Computing (HDC) is facing infringement issues due to straightforward computations. This work, for the first time, raises a critical vulnerability of HDC --- an attacker can reverse engineer the entire model, only requiring the unindexed hypervector memory. To mitigate this attack, we propose a defense strategy, namely HDLock, which significantly increases the reasoning cost of encoding. Specifically, HDLock adds extra feature hypervector combination and permutation in the encoding module. Compared to the standard HDC model, a two-layer-key HDLock can increase the adversarial reasoning complexity by 10 order of magnitudes without inference accuracy loss, with only 21% latency overhead.
{"title":"HDLock","authors":"Shijin Duan, Shaolei Ren, Xiaolin Xu","doi":"10.1145/3489517.3530515","DOIUrl":"https://doi.org/10.1145/3489517.3530515","url":null,"abstract":"Hyperdimensional Computing (HDC) is facing infringement issues due to straightforward computations. This work, for the first time, raises a critical vulnerability of HDC --- an attacker can reverse engineer the entire model, only requiring the unindexed hypervector memory. To mitigate this attack, we propose a defense strategy, namely HDLock, which significantly increases the reasoning cost of encoding. Specifically, HDLock adds extra feature hypervector combination and permutation in the encoding module. Compared to the standard HDC model, a two-layer-key HDLock can increase the adversarial reasoning complexity by 10 order of magnitudes without inference accuracy loss, with only 21% latency overhead.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"353 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115983631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
Multi-electrostatic FPGA placement considering SLICEL-SLICEM heterogeneity and clock feasibility 考虑SLICEL-SLICEM异构性和时钟可行性的多静电FPGA布局
Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530568
Jing Mai, Yibai Meng, Zhixiong Di, Yibo Lin
Modern field-programmable gate arrays (FPGAs) contain heterogeneous resources, including CLB, DSP, BRAM, IO, etc. A Configurable Logic Block (CLB) slice is further categorized to SLICEL and SLICEM, which can be configured as specific combinations of instances in {LUT, FF, distributed RAM, SHIFT, CARRY}. Such kind of heterogeneity challenges the existing FPGA placement algorithms. Meanwhile, limited clock routing resources also lead to complicated clock constraints, causing difficulties in achieving clock feasible placement solutions. In this work, we propose a heterogeneous FPGA placement framework considering SLICEL-SLICEM heterogeneity and clock feasibility based on a multi-electrostatic formulation. We support a comprehensive set of the aforementioned instance types with a uniform algorithm for wirelength, routability, and clock optimization. Experimental results on both academic and industrial benchmarks demonstrate that we outperform the state-of-the-art placers in both quality and efficiency.
现代现场可编程门阵列(fpga)包含异构资源,包括CLB、DSP、BRAM、IO等。可配置逻辑块(CLB)片进一步分为SLICEL和SLICEM,它们可以配置为{LUT、FF、分布式RAM、SHIFT、CARRY}中实例的特定组合。这种异构性对现有的FPGA布局算法提出了挑战。同时,有限的时钟路由资源也导致了复杂的时钟约束,难以获得时钟可行的放置方案。在这项工作中,我们提出了一个基于多静电配方的异构FPGA放置框架,考虑了SLICEL-SLICEM的异构性和时钟可行性。我们支持上述实例类型的全面集合,并使用统一的算法进行无线、可达性和时钟优化。学术和工业基准的实验结果表明,我们在质量和效率方面都优于最先进的研磨机。
{"title":"Multi-electrostatic FPGA placement considering SLICEL-SLICEM heterogeneity and clock feasibility","authors":"Jing Mai, Yibai Meng, Zhixiong Di, Yibo Lin","doi":"10.1145/3489517.3530568","DOIUrl":"https://doi.org/10.1145/3489517.3530568","url":null,"abstract":"Modern field-programmable gate arrays (FPGAs) contain heterogeneous resources, including CLB, DSP, BRAM, IO, etc. A Configurable Logic Block (CLB) slice is further categorized to SLICEL and SLICEM, which can be configured as specific combinations of instances in {LUT, FF, distributed RAM, SHIFT, CARRY}. Such kind of heterogeneity challenges the existing FPGA placement algorithms. Meanwhile, limited clock routing resources also lead to complicated clock constraints, causing difficulties in achieving clock feasible placement solutions. In this work, we propose a heterogeneous FPGA placement framework considering SLICEL-SLICEM heterogeneity and clock feasibility based on a multi-electrostatic formulation. We support a comprehensive set of the aforementioned instance types with a uniform algorithm for wirelength, routability, and clock optimization. Experimental results on both academic and industrial benchmarks demonstrate that we outperform the state-of-the-art placers in both quality and efficiency.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126167278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
期刊
Proceedings of the 59th ACM/IEEE Design Automation Conference
全部 Acc. Chem. Res. ACS Applied Bio Materials ACS Appl. Electron. Mater. ACS Appl. Energy Mater. ACS Appl. Mater. Interfaces ACS Appl. Nano Mater. ACS Appl. Polym. Mater. ACS BIOMATER-SCI ENG ACS Catal. ACS Cent. Sci. ACS Chem. Biol. ACS Chemical Health & Safety ACS Chem. Neurosci. ACS Comb. Sci. ACS Earth Space Chem. ACS Energy Lett. ACS Infect. Dis. ACS Macro Lett. ACS Mater. Lett. ACS Med. Chem. Lett. ACS Nano ACS Omega ACS Photonics ACS Sens. ACS Sustainable Chem. Eng. ACS Synth. Biol. Anal. Chem. BIOCHEMISTRY-US Bioconjugate Chem. BIOMACROMOLECULES Chem. Res. Toxicol. Chem. Rev. Chem. Mater. CRYST GROWTH DES ENERG FUEL Environ. Sci. Technol. Environ. Sci. Technol. Lett. Eur. J. Inorg. Chem. IND ENG CHEM RES Inorg. Chem. J. Agric. Food. Chem. J. Chem. Eng. Data J. Chem. Educ. J. Chem. Inf. Model. J. Chem. Theory Comput. J. Med. Chem. J. Nat. Prod. J PROTEOME RES J. Am. Chem. Soc. LANGMUIR MACROMOLECULES Mol. Pharmaceutics Nano Lett. Org. Lett. ORG PROCESS RES DEV ORGANOMETALLICS J. Org. Chem. J. Phys. Chem. J. Phys. Chem. A J. Phys. Chem. B J. Phys. Chem. C J. Phys. Chem. Lett. Analyst Anal. Methods Biomater. Sci. Catal. Sci. Technol. Chem. Commun. Chem. Soc. Rev. CHEM EDUC RES PRACT CRYSTENGCOMM Dalton Trans. Energy Environ. Sci. ENVIRON SCI-NANO ENVIRON SCI-PROC IMP ENVIRON SCI-WAT RES Faraday Discuss. Food Funct. Green Chem. Inorg. Chem. Front. Integr. Biol. J. Anal. At. Spectrom. J. Mater. Chem. A J. Mater. Chem. B J. Mater. Chem. C Lab Chip Mater. Chem. Front. Mater. Horiz. MEDCHEMCOMM Metallomics Mol. Biosyst. Mol. Syst. Des. Eng. Nanoscale Nanoscale Horiz. Nat. Prod. Rep. New J. Chem. Org. Biomol. Chem. Org. Chem. Front. PHOTOCH PHOTOBIO SCI PCCP Polym. Chem.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1