Amir Hossein Jalilvand, Seyedeh Newsha Estiri, S. Naderi, M. Najafi, M. Imani
Hardware-efficient implementation of the sorting operation is crucial for numerous applications, particularly when fast and energy-efficient sorting of data is desired. Unary computing has been used for low-cost hardware sorting. This work proposes a comparison-free unary sorting engine that iteratively finds maximum values. Synthesis results show up to an 81% reduction in hardware area compared to the state-of-the-art unary sorting design. By processing right-aligned unary bit-streams, our unary sorter is able to sort many inputs in fewer clock cycles.
{"title":"A fast and low-cost comparison-free sorting engine with unary computing: late breaking results","authors":"Amir Hossein Jalilvand, Seyedeh Newsha Estiri, S. Naderi, M. Najafi, M. Imani","doi":"10.1145/3489517.3530615","DOIUrl":"https://doi.org/10.1145/3489517.3530615","url":null,"abstract":"Hardware-efficient implementation of sorting operation is crucial for numerous applications, particularly when fast and energy-efficient sorting of data is desired. Unary computing has been used for low-cost hardware sorting. This work proposes a comparison-free unary sorting engine by iteratively finding maximum values. Synthesis results show up to 81% reduction in hardware area compared to the state-of-the-art unary sorting design. By processing right-aligned unary bit-streams, our unary sorter is able to sort many inputs in fewer clock cycles.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133472485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Huimin Wang, Xingyu Tong, Chenyue Ma, Runming Shi, Jianli Chen, Kun Wang, Jun Yu, Yao-Wen Chang
The fast-growing capacity and complexity of FPGAs pose challenges for global placement. Moreover, while many recent studies have adopted eDensity-based placement for its efficiency and quality, these methods suffer from redundant frequency translation. This paper presents a CNN-inspired analytical placement algorithm that effectively handles the redundant frequency-translation problem for large-scale FPGAs. Specifically, we compute the density penalty through a fully-connected forward propagation and back-propagate its gradient through a discrete differential convolution. To handle FPGA heterogeneity, vectorization plays a vital role in self-adjusting the density penalty factor and the learning rate. In addition, a pseudo-net model further optimizes the site constraints by establishing connections between blocks and their nearest available regions. Finally, we formulate a refined objective function and a degree-specific gradient preconditioning to achieve a robust, high-quality solution. Experimental results show that our algorithm achieves an 8% reduction in HPWL and 15% less global-placement runtime on average over leading commercial tools.
{"title":"CNN-inspired analytical global placement for large-scale heterogeneous FPGAs","authors":"Huimin Wang, Xingyu Tong, Chenyue Ma, Runming Shi, Jianli Chen, Kun Wang, Jun Yu, Yao-Wen Chang","doi":"10.1145/3489517.3530566","DOIUrl":"https://doi.org/10.1145/3489517.3530566","url":null,"abstract":"The fast-growing capacity and complexity are challenging for FPGA global placement. Besides, while many recent studies have focused on the eDensity-based placement as its great efficiency and quality, they suffer from redundant frequency translation. This paper presents a CNN-inspired analytical placement algorithm to effectively handle the redundant frequency translation problem for large-scale FPGAs. Specifically, we compute the density penalty by a fully-connected propagation and gradient to a discrete differential convolution backward. With the FPGA heterogeneity, vectorization plays a vital role in self-adjusting the density penalty factor and the learning rate. In addition, a pseudo net model is used to further optimize the site constraints by establishing connections between blocks and their nearest available regions. Finally, we formulate a refined objective function and a degree-specific gradient preconditioning to achieve a robust, high-quality solution. Experimental results show that our algorithm achieves an 8% reduction on HPWL and 15% less global placement runtime on average over leading commercial tools.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"173 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130172005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jianqiang Wang, Pouya Mahmoody, Ferdinand Brasser, Patrick Jauernig, A. Sadeghi, D.Y. Yu, Dahan Pan, Yuanyuan Zhang
Modern security architectures provide Trusted Execution Environments (TEEs) to protect critical data and applications against malicious privileged software in so-called enclaves. However, the seamless integration of existing TEEs into the cloud is hindered, as they require substantial adaptation of both the software executing inside an enclave and the cloud management software that handles enclaved workloads. We tackle these challenges by presenting VirTEE, the first TEE architecture that allows strongly isolated execution of unmodified virtual machines (VMs) in enclaves, as well as secure live migration of VM enclaves between VirTEE-enabled servers. Combined with its secure I/O capabilities, VirTEE enables the integration of enclaved computing into today's complex cloud infrastructure. We thoroughly evaluate our RISC-V-based prototype and show its effectiveness and efficiency.
{"title":"VirTEE","authors":"Jianqiang Wang, Pouya Mahmoody, Ferdinand Brasser, Patrick Jauernig, A. Sadeghi, D.Y. Yu, Dahan Pan, Yuanyuan Zhang","doi":"10.1145/3489517.3530436","DOIUrl":"https://doi.org/10.1145/3489517.3530436","url":null,"abstract":"Modern security architectures provide Trusted Execution Environments (TEEs) to protect critical data and applications against malicious privileged software in so-called enclaves. However, the seamless integration of existing TEEs into the cloud is hindered, as they require substantial adaptation of the software executing inside an enclave as well as the cloud management software to handle enclaved workloads. We tackle these challenges by presenting VirTEE, the first TEE architecture that allows strongly isolated execution of unmodified virtual machines (VMs) in enclaves, as well as secure live migration of VM enclaves between VirTEE-enabled servers. Combined with its secure I/O capabilities, VirTEE enables the integration of enclaved computing in today's complex cloud infrastructure. We thoroughly evaluate our RISC-V-based prototype, and show its effectiveness and efficiency.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114967060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qi Sun, Xinyun Zhang, Hao Geng, Yuxuan Zhao, Yang Bai, Haisheng Zheng, Bei Yu
Compiling DNN models for GPUs and optimizing their performance remains an open problem. We propose GTuner, a novel framework that jointly learns from the structures of computational graphs and the statistical features of the generated code to find optimal code implementations. A Graph ATtention network (GAT) is designed as the performance estimator in GTuner. In GAT, graph neural layers propagate information across the graph, and a multi-head self-attention module learns the complicated relationships among the features. Under the guidance of GAT, the GPU codes are generated through auto-tuning. Experimental results demonstrate that our method remarkably outperforms previous approaches.
{"title":"GTuner","authors":"Qi Sun, Xinyun Zhang, Hao Geng, Yuxuan Zhao, Yang Bai, Haisheng Zheng, Bei Yu","doi":"10.1145/3489517.3530584","DOIUrl":"https://doi.org/10.1145/3489517.3530584","url":null,"abstract":"It is an open problem to compile DNN models on GPU and improve the performance. A novel framework, GTuner, is proposed to jointly learn from the structures of computational graphs and the statistical features of codes to find the optimal code implementations. A Graph ATtention network (GAT) is designed as the performance estimator in GTuner. In GAT, graph neural layers are used to propagate the information in the graph and a multi-head self-attention module is designed to learn the complicated relationships between the features. Under the guidance of GAT, the GPU codes are generated through auto-tuning. Experimental results demonstrate that our method outperforms the previous arts remarkably.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116518521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pengwen Chen, Chung-Kuan Cheng, Albert Chern, Chester Holtz, Aoxi Li, Yucheng Wang
Canonical methods for analytical placement of VLSI designs rely on solving nonlinear programs to minimize wirelength and cell overlap. We focus on producing initial layouts such that a global analytical placer performs better than with existing initialization heuristics. We reduce the initialization problem to a quadratically constrained quadratic program, and our formulation is aware of fixed macros. We propose an efficient algorithm that can quickly generate initializations for test cases with millions of cells. We show that our method for parameter initialization results in superior post-detailed-placement wirelength.
{"title":"Placement initialization via a projected eigenvector algorithm: late breaking results","authors":"Pengwen Chen, Chung-Kuan Cheng, Albert Chern, Chester Holtz, Aoxi Li, Yucheng Wang","doi":"10.1145/3489517.3530620","DOIUrl":"https://doi.org/10.1145/3489517.3530620","url":null,"abstract":"Canonical methods for analytical placement of VLSI designs rely on solving nonlinear programs to minimize wirelength and cell overlap. We focus on producing initial layouts such that a global analytical placer performs better compared to existing heuristics for initialization. We reduce the problem of initialization to a quadratically constrained quadratic program. Our formulation is aware of fixed macros. We propose an efficient algorithm which can quickly generate initializations for testcases with millions of cells. We show that the our method for parameter initialization results in superior performance with respect to post-detailed placement wirelength.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114646082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jaekang Shin, Seungkyu Choi, Jongwoo Ra, L. Kim
Real-world AI applications, such as augmented reality or autonomous driving, require processing multiple computer-vision (CV) tasks simultaneously. However, the enormous data size and memory footprint have been a crucial hurdle to deploying deep neural networks on resource-constrained devices. To solve this problem, we propose an algorithm/architecture co-design. The proposed algorithmic scheme, named SqueeD, reduces per-task weight and activation size by 21.9x and 2.1x, respectively, by sharing those data between tasks. Moreover, we design an architecture and dataflow that minimize DRAM access by fully exploiting the benefits of SqueeD. As a result, the proposed architecture reduces the per-task increments in DRAM access and energy consumption by 2.2x and 1.3x, respectively.
{"title":"Algorithm/architecture co-design for energy-efficient acceleration of multi-task DNN","authors":"Jaekang Shin, Seungkyu Choi, Jongwoo Ra, L. Kim","doi":"10.1145/3489517.3530455","DOIUrl":"https://doi.org/10.1145/3489517.3530455","url":null,"abstract":"Real-world AI applications, such as augmented reality or autonomous driving, require processing multiple CV tasks simultaneously. However, the enormous data size and the memory footprint have been a crucial hurdle for deep neural networks to be applied in resource-constrained devices. To solve the problem, we propose an algorithm/architecture co-design. The proposed algorithmic scheme, named SqueeD, reduces per-task weight and activation size by 21.9x and 2.1x, respectively, by sharing those data between tasks. Moreover, we design architecture and dataflow to minimize DRAM access by fully utilizing benefits from SqueeD. As a result, the proposed architecture reduces the DRAM access increment and energy consumption increment per task by 2.2x and 1.3x, respectively.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"311 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131857077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gobinda Saha, Cheng Wang, A. Raghunathan, K. Roy
Remarkable advances in machine learning and artificial intelligence have been made in various domains, achieving near-human performance in a plethora of cognitive tasks including vision, speech, and natural language processing. However, implementations of such cognitive algorithms on conventional "von Neumann" architectures are orders of magnitude more area- and power-expensive than the biological brain. It is therefore imperative to search for fundamentally new approaches so that improvements in computing performance and efficiency can keep up with the exponential growth of AI computational demand. In this article, we present a cross-layer approach to the exploration of new paradigms in cognitive computing. This effort spans new learning algorithms inspired by biological information-processing principles, network architectures best suited for such algorithms, and neuromorphic hardware substrates such as computing-in-memory fabrics, in order to build intelligent machines that achieve orders-of-magnitude improvements in energy efficiency for cognitive processing. We argue that such cross-layer innovations in cognitive computing are well poised to enable a new wave of autonomous intelligence across the computing spectrum, from resource-constrained IoT devices to the cloud.
{"title":"A cross-layer approach to cognitive computing: invited","authors":"Gobinda Saha, Cheng Wang, A. Raghunathan, K. Roy","doi":"10.1145/3489517.3530642","DOIUrl":"https://doi.org/10.1145/3489517.3530642","url":null,"abstract":"Remarkable advances in machine learning and artificial intelligence have been made in various domains, achieving near-human performance in a plethora of cognitive tasks including vision, speech and natural language processing. However, implementations of such cognitive algorithms in conventional \"von-Neumann\" architectures are orders of magnitude more area and power expensive than the biological brain. Therefore, it is imperative to search for fundamentally new approaches so that the improvement in computing performance and efficiency can keep up with the exponential growth of the AI computational demand. In this article, we present a cross-layer approach to the exploration of new paradigms in cognitive computing. This effort spans new learning algorithms inspired from biological information processing principles, network architectures best suited for such algorithms, and neuromorphic hardware substrates such as computing-in-memory fabrics in order to build intelligent machines that can achieve orders of improvement in energy efficiency at cognitive processing. We argue that such cross-layer innovations in cognitive computing are well-poised to enable a new wave of autonomous intelligence across the computing spectrum, from resource-constrained IoT devices to the cloud.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133665838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pierpaolo Morì, M. Vemparala, Nael Fasfous, Saptarshi Mitra, Sreetama Sarkar, Alexander Frickenstein, Lukas Frickenstein, D. Helms, N. Nagaraja, W. Stechele, C. Passerone
Semantic segmentation is one of the most popular tasks in computer vision, providing pixel-wise annotations for scene understanding. However, convolutional neural networks for segmentation require tremendous computational power. In this work, a fully pipelined hardware accelerator with support for dilated convolution is introduced, which eliminates the redundant zero multiplications. Furthermore, we propose a genetic-algorithm-based automated channel-pruning technique to jointly optimize computational complexity and model accuracy. Finally, hardware heuristics and an accurate model of the custom accelerator design enable a hardware-aware pruning framework. We achieve 2.44x lower latency with minimal degradation in semantic prediction quality (−1.98 pp lower mean intersection over union) compared to the baseline DeepLabV3+ model, evaluated on an Arria-10 FPGA. The binary files of the FPGA design and the baseline and pruned models can be found at github.com/pierpaolomori/SemanticSegmentationFPGA.
{"title":"Accelerating and pruning CNNs for semantic segmentation on FPGA","authors":"Pierpaolo Morì, M. Vemparala, Nael Fasfous, Saptarshi Mitra, Sreetama Sarkar, Alexander Frickenstein, Lukas Frickenstein, D. Helms, N. Nagaraja, W. Stechele, C. Passerone","doi":"10.1145/3489517.3530424","DOIUrl":"https://doi.org/10.1145/3489517.3530424","url":null,"abstract":"Semantic segmentation is one of the popular tasks in computer vision, providing pixel-wise annotations for scene understanding. However, segmentation-based convolutional neural networks require tremendous computational power. In this work, a fully-pipelined hardware accelerator with support for dilated convolution is introduced, which cuts down the redundant zero multiplications. Furthermore, we propose a genetic algorithm based automated channel pruning technique to jointly optimize computational complexity and model accuracy. Finally, hardware heuristics and an accurate model of the custom accelerator design enable a hardware-aware pruning framework. We achieve 2.44X lower latency with minimal degradation in semantic prediction quality (−1.98 pp lower mean intersection over union) compared to the baseline DeepLabV3+ model, evaluated on an Arria-10 FPGA. The binary files of the FPGA design, baseline and pruned models can be found in github.com/pierpaolomori/SemanticSegmentationFPGA","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"192 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116145892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Valentin Poirot, Laura Harms, Hendrica Martens, O. Landsiedel
IoT devices rely on environment detection to trigger specific actions, e.g., for headphones to adapt noise cancellation to the surroundings. While phones feature many sensors, from GNSS to cameras, small wearables must rely on the few energy-efficient components they already incorporate. In this paper, we demonstrate that a Bluetooth radio is the only component required to accurately classify environments and present BlueSeer, an environment-detection system that relies solely on received BLE packets and an embedded neural network. BlueSeer achieves an accuracy of up to 84% when differentiating between 7 environments on resource-constrained devices, and requires only ~12 ms for inference on a 64 MHz microcontroller unit.
{"title":"BlueSeer","authors":"Valentin Poirot, Laura Harms, Hendrica Martens, O. Landsiedel","doi":"10.1145/3489517.3530519","DOIUrl":"https://doi.org/10.1145/3489517.3530519","url":null,"abstract":"IoT devices rely on environment detection to trigger specific actions, e.g., for headphones to adapt noise cancellation to the surroundings. While phones feature many sensors, from GNSS to cameras, small wearables must rely on the few energy-efficient components they already incorporate. In this paper, we demonstrate that a Bluetooth radio is the only component required to accurately classify environments and present BlueSeer, an environment-detection system that solely relies on received BLE packets and an embedded neural network. BlueSeer achieves an accuracy of up to 84% differentiating between 7 environments on resource-constrained devices, and requires only ~ 12 ms for inference on a 64 MHz microcontroller-unit.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"325 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116610964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}