
Latest Publications from the 2020 57th ACM/IEEE Design Automation Conference (DAC)

Centaur: Hybrid Processing in On/Off-chip Memory Architecture for Graph Analytics
Pub Date: 2020-07-01 DOI: 10.1109/DAC18072.2020.9218624
Abraham Addisie, V. Bertacco
The increased use of graph algorithms in diverse fields has highlighted their inefficiency on current chip-multiprocessor (CMP) architectures, primarily due to their seemingly random access patterns to off-chip memory. Recently, two families of solutions have been proposed: 1) solutions that offload operations generated by all vertices from the processor cores to off-chip memory; and 2) solutions that offload only operations generated by high-degree vertices to dedicated on-chip memory, while the cores continue to process the work related to the remaining vertices. Neither approach is optimal over the full range of vertex degrees. Thus, in this work, we propose Centaur, a novel architecture that processes operations on vertex data in both on- and off-chip memory. Centaur uses a vertex's degree as a proxy to determine whether to process related operations in on- or off-chip memory. Centaur provides up to a 4.0× performance improvement and 3.8× energy savings over a baseline CMP, and up to a 2.0× performance boost over state-of-the-art specialized solutions.
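The central idea, routing each vertex's update operations by degree, can be pictured with a short Python sketch. This is a behavioral model under assumed names, not the Centaur hardware; the cutoff `DEGREE_THRESHOLD` and the two handler lists are hypothetical stand-ins for the on-chip unit and the near-memory unit.

```python
# Behavioral sketch of degree-based dispatch (illustrative, not the RTL):
# updates to high-degree "hot" vertices are handled in on-chip memory,
# the long tail of low-degree vertices is offloaded to off-chip memory.

DEGREE_THRESHOLD = 64  # hypothetical cutoff; a real design would tune this

def dispatch_updates(graph, updates):
    """Route each (vertex, value) update to on-chip or off-chip processing."""
    on_chip, off_chip = [], []
    for vertex, value in updates:
        if len(graph[vertex]) >= DEGREE_THRESHOLD:
            on_chip.append((vertex, value))   # hot vertex: on-chip atomic update
        else:
            off_chip.append((vertex, value))  # cold vertex: near-memory update
    return on_chip, off_chip

# Tiny example graph as an adjacency dict.
graph = {0: list(range(100)), 1: [0], 2: [0, 1]}
print(dispatch_updates(graph, [(0, 1.0), (1, 0.5), (2, 0.25)]))
```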
Citations: 6
CryptoPIM: In-memory Acceleration for Lattice-based Cryptographic Hardware
Pub Date: 2020-07-01 DOI: 10.1109/DAC18072.2020.9218730
Hamid Nejatollahi, Saransh Gupta, M. Imani, T. Simunic, Rosario Cammarota, N. Dutt
Quantum computers promise to solve hard mathematical problems such as integer factorization and discrete logarithms in polynomial time, making standardized public-key cryptosystems insecure. Lattice-Based Cryptography (LBC) is a promising post-quantum public-key cryptographic protocol that could replace standardized public-key cryptography, thanks to its inherent post-quantum resistance, efficiency, and versatility. A key mathematical tool in LBC is the Number Theoretic Transform (NTT), a common method to compute polynomial multiplication. It is the most compute-intensive routine and requires acceleration for practical deployment of LBC protocols. In this paper, we propose CryptoPIM, a high-throughput Processing In-Memory (PIM) accelerator for NTT-based polynomial multiplication that supports polynomials of degree up to 32k. Compared to the fastest FPGA implementation of an NTT-based multiplier, CryptoPIM achieves on average a 31x throughput improvement at the same energy with only a 28% performance reduction, thereby showing promise for practical deployment of LBC.
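For readers unfamiliar with the kernel being accelerated, the following Python sketch shows a textbook radix-2 NTT polynomial multiplier. The modulus and primitive root are standard textbook parameters, not taken from the paper, and the code illustrates only the algorithm, not CryptoPIM's in-memory mapping.

```python
# Textbook radix-2 NTT over Z_P with P = 998244353 (P - 1 = 119 * 2^23),
# shown to illustrate the polynomial-multiplication kernel CryptoPIM targets.
P = 998244353
G = 3  # primitive root modulo P

def ntt(a, invert=False):
    n = len(a)
    j = 0
    for i in range(1, n):              # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:                 # butterfly stages
        w = pow(G, (P - 1) // length, P)
        if invert:
            w = pow(w, P - 2, P)       # inverse twiddle factor via Fermat
        for start in range(0, n, length):
            wn = 1
            for k in range(start, start + length // 2):
                u, v = a[k], a[k + length // 2] * wn % P
                a[k] = (u + v) % P
                a[k + length // 2] = (u - v) % P
                wn = wn * w % P
        length <<= 1
    if invert:
        n_inv = pow(n, P - 2, P)       # divide by n on the inverse transform
        for i in range(n):
            a[i] = a[i] * n_inv % P
    return a

def poly_mul(f, g):
    n = 1
    while n < len(f) + len(g) - 1:
        n <<= 1
    fa = f + [0] * (n - len(f))
    ga = g + [0] * (n - len(g))
    ntt(fa)
    ntt(ga)
    prod = [x * y % P for x, y in zip(fa, ga)]  # pointwise product
    return ntt(prod, invert=True)[:len(f) + len(g) - 1]

print(poly_mul([1, 2, 3], [4, 5]))  # [4, 13, 22, 15]
```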
Citations: 28
Developing Privacy-preserving AI Systems: The Lessons Learned
Pub Date: 2020-07-01 DOI: 10.1109/DAC18072.2020.9218662
Huili Chen, S. Hussain, Fabian Boemer, Emmanuel Stapf, A. Sadeghi, F. Koushanfar, Rosario Cammarota
Advances in customers' data privacy laws create pressures and pain points across the entire lifecycle of AI products. Practitioners such as data scientists and data engineers need to account for the correct use of privacy-enhancing technologies, such as homomorphic encryption, secure multi-party computation, and trusted execution environments, when they develop, test, and deploy products embedding AI models while providing data protection guarantees. In this work, we share the lessons learned during the development of frameworks that help data scientists and data engineers map their optimized workloads onto privacy-enhancing technologies seamlessly and correctly.
Citations: 8
Scalable Multi-FPGA Acceleration for Large RNNs with Full Parallelism Levels
Pub Date: 2020-07-01 DOI: 10.1109/DAC18072.2020.9218528
Dongup Kwon, Suyeon Hur, Hamin Jang, E. Nurvitadhi, Jangwoo Kim
The increasing size of recurrent neural networks (RNNs) makes it hard to meet the growing demand for real-time AI services. For low-latency RNN serving, FPGA-based accelerators can leverage specialized architectures with optimized dataflow. However, they also suffer from severe hardware under-utilization when partitioning RNNs, and thus fail to achieve scalable performance. In this paper, we identify the performance bottlenecks of existing RNN partitioning strategies. Then, we propose a novel RNN partitioning strategy to achieve scalable multi-FPGA acceleration for large RNNs. First, we introduce three parallelism levels and exploit them by partitioning weight matrices, matrix/vector operations, and layers. Second, we examine the performance impact of collective communications and software pipelining to derive more accurate and optimal distribution results. We prototyped an FPGA-based acceleration system using multiple high-end Intel FPGAs, and our partitioning scheme allows up to 2.4x faster inference of modern RNN workloads than conventional partitioning methods.
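The first parallelism level, partitioning a layer's weight matrix across devices, can be pictured with a small NumPy sketch: each device holds a row shard, computes its slice of the matrix-vector product, and the slices are gathered. The device count and matrix sizes here are illustrative assumptions, not the paper's configuration.

```python
# Toy model of row-wise weight-matrix partitioning across FPGAs (illustrative).
import numpy as np

def partitioned_matvec(W, x, num_devices):
    """Each device computes its shard of W @ x; results are all-gathered."""
    shards = np.array_split(W, num_devices, axis=0)  # one row shard per device
    partials = [shard @ x for shard in shards]       # runs in parallel in hardware
    return np.concatenate(partials)                  # gather the output slices

rng = np.random.default_rng(0)
W, x = rng.standard_normal((8, 4)), rng.standard_normal(4)
assert np.allclose(partitioned_matvec(W, x, num_devices=4), W @ x)
```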
Citations: 7
CAP’NN: Class-Aware Personalized Neural Network Inference
Pub Date: 2020-07-01 DOI: 10.1109/DAC18072.2020.9218741
Maedeh Hemmat, Joshua San Miguel, A. Davoodi
We propose CAP’NN, a framework for Class-Aware Personalized Neural Network Inference. CAP’NN prunes an already-trained neural network model based on the preferences of individual users. Specifically, by adapting to the subset of output classes that each user is expected to encounter, CAP’NN is able to prune not only ineffectual neurons but also miseffectual neurons that confuse classification, without the need to retrain the network. CAP’NN achieves up to a 50% model size reduction while actually improving the top-1 (top-5) classification accuracy by up to 2.3% (3.2%) when the user only encounters a subset of VGG-16 classes.
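A minimal NumPy sketch of the class-aware idea on a final layer: keep only the output rows for the user's class subset, then drop hidden units whose outgoing weights to the kept classes are uniformly negligible. The magnitude threshold `tau` is a stand-in assumption for the paper's neuron-scoring criterion, not its actual metric.

```python
# Sketch of class-aware pruning on a network's last layer (illustrative).
import numpy as np

def class_aware_prune(W_out, keep_classes, tau=1e-2):
    W = W_out[keep_classes, :]             # restrict outputs to the user's classes
    useful = np.abs(W).max(axis=0) > tau   # hidden units still contributing
    return W[:, useful], useful

rng = np.random.default_rng(1)
W_out = rng.standard_normal((10, 16))      # 10 classes, 16 hidden units
W_out[:, 3] = 0.0                          # unit 3 is ineffectual for every class
W_small, mask = class_aware_prune(W_out, keep_classes=[0, 2, 5])
print(W_small.shape, int(mask.sum()))      # (3, 15) 15
```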
Citations: 5
Romeo: Conversion and Evaluation of HDL Designs in the Encrypted Domain
Pub Date: 2020-07-01 DOI: 10.1109/DAC18072.2020.9218579
Charles Gouert, N. G. Tsoutsos
As cloud computing becomes increasingly ubiquitous, protecting the confidentiality of data outsourced to third parties becomes a priority. While encryption is a natural solution to this problem, traditional algorithms may only protect data at rest and in transit, but do not support encrypted processing. In this work we introduce ROMEO, which enables easy-to-use privacy-preserving processing of data in the cloud using homomorphic encryption. ROMEO automatically converts arbitrary programs expressed in Verilog HDL into equivalent homomorphic circuits that are evaluated using encrypted inputs. For our experiments, we employ cryptographic circuits, such as AES, and benchmarks from the ISCAS’85 and ISCAS’89 suites.
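The conversion-and-evaluation flow can be pictured as walking a gate-level netlist and applying one homomorphic operator per gate. In the sketch below, the Ciphertext class is a transparent stand-in that holds the bit in the clear, so the control flow runs without an FHE library; a real backend would substitute encrypted gate evaluation (for example, TFHE-style gate bootstrapping). The netlist tuple format is invented for illustration.

```python
# Conceptual gate-by-gate evaluation of a netlist in the "encrypted" domain.
class Ciphertext:
    def __init__(self, bit):
        self.bit = bit & 1                       # stand-in: bit kept in the clear
    def AND(self, other):
        return Ciphertext(self.bit & other.bit)  # would be a homomorphic AND
    def XOR(self, other):
        return Ciphertext(self.bit ^ other.bit)  # would be a homomorphic XOR
    def NOT(self):
        return Ciphertext(self.bit ^ 1)          # would be a homomorphic NOT

def eval_netlist(netlist, inputs):
    """netlist: [(gate, out_wire, in_wires)] in topological order."""
    wires = dict(inputs)
    for gate, out, ins in netlist:
        a = wires[ins[0]]
        if gate == "NOT":
            wires[out] = a.NOT()
        elif gate == "AND":
            wires[out] = a.AND(wires[ins[1]])
        elif gate == "XOR":
            wires[out] = a.XOR(wires[ins[1]])
    return wires

# Half adder: sum = a XOR b, carry = a AND b.
netlist = [("XOR", "s", ("a", "b")), ("AND", "c", ("a", "b"))]
out = eval_netlist(netlist, {"a": Ciphertext(1), "b": Ciphertext(1)})
print(out["s"].bit, out["c"].bit)  # 0 1
```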
Citations: 8
ATUNs: Modular and Scalable Support for Atomic Operations in a Shared Memory Multiprocessor
Pub Date: 2020-07-01 DOI: 10.1109/DAC18072.2020.9218661
Andreas Kurth, Samuel Riedel, Florian Zaruba, T. Hoefler, L. Benini
Atomic operations are crucial for most modern parallel and concurrent algorithms, which necessitates their optimized implementation in highly scalable manycore processors. We propose a modular, efficient, open-source ATomic UNit (ATUN) architecture that can be placed flexibly at different levels of the memory hierarchy. ATUN demonstrates near-optimal linear scaling for various synthetic and real-world workloads on an FPGA prototype with 32 RISC-V cores. We characterize the hardware complexity of our ATUN design in 22 nm FDSOI and find that it scales linearly in area (only 0.5 kGE per core) and logarithmically in the critical path.
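The semantics such a unit enforces, serializing read-modify-write operations at a shared location, can be modeled behaviorally in a few lines. This is a software analogy only; the lock below stands in for the hardware's serialization of requests, and all names are hypothetical.

```python
# Behavioral model of a near-memory atomic unit serializing fetch-and-add.
import threading

class AtomicUnit:
    def __init__(self):
        self._mem = {}
        self._lock = threading.Lock()  # stands in for hardware serialization

    def fetch_and_add(self, addr, value):
        with self._lock:               # one request at a time, like the unit
            old = self._mem.get(addr, 0)
            self._mem[addr] = old + value
            return old

unit = AtomicUnit()
workers = [threading.Thread(target=lambda: [unit.fetch_and_add(0x10, 1)
                                            for _ in range(1000)])
           for _ in range(4)]
for t in workers: t.start()
for t in workers: t.join()
print(unit.fetch_and_add(0x10, 0))  # 4000: no increments were lost
```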
Citations: 3
CL(R)Early: An Early-stage DSE Methodology for Cross-Layer Reliability-aware Heterogeneous Embedded Systems
Pub Date: 2020-07-01 DOI: 10.1109/DAC18072.2020.9218747
Siva Satyendra Sahoo, B. Veeravalli, Akash Kumar
Cross-layer reliability (CLR) presents a cost-effective alternative to traditional single-layer design in resource-constrained embedded systems. CLR provides the scope for leveraging the inherent fault-masking of multiple layers and exploiting application-specific tolerances to degradation in some Quality of Service (QoS) metrics. However, it can also lead to an explosion in design complexity. State-of-the-art approaches to such joint optimization across multiple degrees of freedom can lead to degradation in the system-level Design Space Exploration (DSE) results. To this end, we propose a DSE methodology for enabling CLR-aware task-mapping in heterogeneous embedded systems. Specifically, we present novel approaches to both task- and system-level analysis for performing an early-stage exploration of various design decisions. The proposed methodology results in considerable improvements over other state-of-the-art approaches and shows significant scaling with application size.
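At its simplest, early-stage DSE for reliability-aware task mapping enumerates mappings of tasks to heterogeneous cores and retains the Pareto front over competing objectives. The sketch below is a generic illustration of that loop, not the paper's methodology; the per-core energy and failure-probability numbers are invented.

```python
# Generic early-stage DSE loop: exhaustive mapping enumeration + Pareto filter.
import itertools

tasks = ["t0", "t1"]
cores = {"big":    {"energy": 3.0, "pfail": 1e-6},   # invented numbers
         "little": {"energy": 1.0, "pfail": 5e-6}}

def evaluate(mapping):
    energy = sum(cores[c]["energy"] for c in mapping)
    surv = 1.0
    for c in mapping:                 # independent per-task fault model assumed
        surv *= 1.0 - cores[c]["pfail"]
    return energy, 1.0 - surv         # (energy, system failure probability)

designs = [(m, evaluate(m)) for m in itertools.product(cores, repeat=len(tasks))]

def dominated(p, q):                  # True when q strictly dominates p
    return q[0] <= p[0] and q[1] <= p[1] and q != p

front = [(m, obj) for m, obj in designs
         if not any(dominated(obj, other) for _, other in designs)]
for m, (e, p) in front:
    print(m, e, p)
```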
Citations: 5
A Cross-Layer Power and Timing Evaluation Method for Wide Voltage Scaling
Pub Date: 2020-07-01 DOI: 10.1109/DAC18072.2020.9218682
Wenjie Fu, Leilei Jin, Ming Ling, Yu Zheng, Longxing Shi
Wide supply-voltage scaling is critical for worthwhile dynamic adjustment of processor efficiency under varying workloads. In this paper, a cross-layer power and timing evaluation method is proposed to estimate processor energy efficiency using both circuit and architectural information over a wide voltage range. Process variations are considered through statistical static timing analysis, while the voltage effect is modeled through iterated second-order fittings. The error in estimating processor energy efficiency decreases to 8.29% when the supply voltage is scaled from 1.1V to 0.6V, whereas traditional architectural evaluations exhibit errors of more than 40%.
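As one concrete picture of voltage-dependent timing modeling, the sketch below fits a second-order polynomial to delays sampled at a few voltage corners and queries an intermediate point. The alpha-power-law ground truth and the fit order are assumptions for illustration, not the paper's exact model.

```python
# Illustrative second-order fit of gate delay versus supply voltage.
import numpy as np

def alpha_power_delay(vdd, vth=0.3, alpha=1.3):
    """Toy ground truth: delay ~ Vdd / (Vdd - Vth)^alpha (alpha-power law)."""
    return vdd / (vdd - vth) ** alpha

v_corners = np.array([0.6, 0.8, 1.0, 1.1])        # characterized corners
d_corners = alpha_power_delay(v_corners)
coeffs = np.polyfit(v_corners, d_corners, deg=2)  # second-order fitting

v_query = 0.7                                     # uncharacterized voltage
print(np.polyval(coeffs, v_query), alpha_power_delay(v_query))
```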
Citations: 3
ALSRAC: Approximate Logic Synthesis by Resubstitution with Approximate Care Set
Pub Date: 2020-07-01 DOI: 10.1109/DAC18072.2020.9218627
Chang Meng, Weikang Qian, A. Mishchenko
Approximate computing is an emerging design technique for error-resilient applications. It improves circuit area, power, and delay at the cost of introducing some errors. Approximate logic synthesis (ALS) is an automatic process for producing approximate circuits. This paper proposes approximate resubstitution with an approximate care set and uses it to build a simulation-based ALS flow. The experimental results demonstrate that the proposed method saves 7%–18% area compared to state-of-the-art methods. The code of ALSRAC is open source.
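The flavor of simulation-based resubstitution with an approximate care set can be conveyed in a few lines: estimate the care set from random simulation patterns, then accept a simpler replacement function if it disagrees with the target on at most an allowed fraction of those patterns. The node functions and the error budget below are invented for illustration.

```python
# Toy simulation-based resubstitution check with an approximate care set.
import random

def approx_care_set(n_inputs, n_patterns, seed=0):
    """Estimate the care set by random simulation (approximate, not exact)."""
    rng = random.Random(seed)
    return [tuple(rng.randint(0, 1) for _ in range(n_inputs))
            for _ in range(n_patterns)]

def try_resub(target, candidate, care, max_err_rate=0.05):
    """Accept the candidate if it matches the target on enough care patterns."""
    mismatches = sum(target(*p) != candidate(*p) for p in care)
    return mismatches / len(care) <= max_err_rate

target    = lambda a, b, c: (a and b) or (a and c)  # node to simplify
candidate = lambda a, b, c: a and (b or c)          # cheaper equivalent divisor
care = approx_care_set(n_inputs=3, n_patterns=64)
print(try_resub(target, candidate, care))           # True: resubstitution accepted
```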
Citations: 15