
2020 57th ACM/IEEE Design Automation Conference (DAC): Latest Papers

FTDL: A Tailored FPGA-Overlay for Deep Learning with High Scalability
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218581
Runbin Shi, Yuhao Ding, Xuechao Wei, He Li, Hang Liu, Hayden Kwok-Hay So, Caiwen Ding
Fast inference is of paramount value to a wide range of deep learning applications. This work presents FTDL, a highly scalable FPGA overlay framework for deep learning applications that addresses the architecture and hardware mismatch faced by traditional efforts. The FTDL overlay is specifically optimized for the tiled structure of FPGAs, achieving post-place-and-route operating frequencies exceeding 88% of the theoretical maximum across different devices and design scales. A flexible compilation framework efficiently schedules the matrix-multiply and convolution operations of large neural network inference on the overlay and achieves over 80% hardware efficiency on average. Taking advantage of both high operating frequency and hardware efficiency, FTDL achieves 402.6 and 151.2 FPS with GoogLeNet and ResNet50 on ImageNet, respectively, while operating at a power efficiency of 27.6 GOPS/W, making it up to 7.7× higher-performing and 1.9× more power-efficient than the state of the art.
Citations: 9
ReTriple: Reduction of Redundant Rendering on Android Devices for Performance and Energy Optimizations
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218517
Xianfeng Li, Gengchao Li, Xiaole Cui
Graphics rendering is compute-intensive work and a major source of energy consumption on battery-driven mobile devices. Unlike existing works that degrade user experience or reuse rendering results coarsely, we propose ReTriple, a fine-grained scheme that reduces rendering workload by reusing past rendering results at the UI element level. This fine-grained reuse mechanism exposes more opportunities to reduce the workload of the rendering process and save energy. Experiments with popular apps show that ReTriple achieves an average speedup of 2.6× and a per-frame energy saving of 32.3% for the rendering process while improving user experience.
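As a rough illustration of reuse at the UI element level, the sketch below keeps the last rendered output per element and re-renders an element only when its properties change. The `render_element` stand-in, the dictionary-based element description, and the fingerprint cache are all assumptions made for this example; the abstract does not describe ReTriple's actual mechanism at this level of detail.

```python
import hashlib
import json


def element_key(props):
    """Stable fingerprint of an element's drawable properties."""
    encoded = json.dumps(props, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def render_element(props):
    """Stand-in for an expensive draw call (hypothetical)."""
    return "<{}:{}>".format(props["type"], props["text"])


class ElementRenderCache:
    def __init__(self):
        self._cache = {}   # element id -> (fingerprint, rendered output)
        self.renders = 0   # number of real render calls issued

    def draw_frame(self, elements):
        """Render a frame, reusing the outputs of unchanged elements."""
        frame = []
        for elem_id, props in elements.items():
            key = element_key(props)
            cached = self._cache.get(elem_id)
            if cached is None or cached[0] != key:
                self.renders += 1
                cached = (key, render_element(props))
                self._cache[elem_id] = cached
            frame.append(cached[1])
        return frame


cache = ElementRenderCache()
frame1 = cache.draw_frame({
    "title": {"type": "label", "text": "Inbox"},
    "badge": {"type": "label", "text": "3"},
})
# Only the badge changes in the second frame, so only it is re-rendered.
frame2 = cache.draw_frame({
    "title": {"type": "label", "text": "Inbox"},
    "badge": {"type": "label", "text": "4"},
})
```

In this toy run, two frames of two elements each cost three render calls instead of four, because the unchanged title is served from the cache.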
Citations: 3
Wafer Map Defect Patterns Classification using Deep Selective Learning
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218580
M. Alawieh, D. Boning, D. Pan
With the continuous drive toward integrated circuit scaling, efficient yield analysis is becoming more crucial yet more challenging. In this paper, we propose a novel methodology for wafer map defect pattern classification using deep selective learning. Our proposed approach features an integrated reject option, where the model chooses to abstain from predicting a class label when the misclassification risk is high, thus providing a trade-off between prediction coverage and misclassification risk. This selective learning scheme allows for new defect class detection, concept shift detection, and resource allocation. In addition, to address the class imbalance problem in wafer map classification, we propose a data augmentation framework built around a convolutional auto-encoder model for synthetic sample generation. The efficacy of our proposed approach is demonstrated on the WM-811k industrial dataset, where it achieves 94% accuracy under full coverage and 99% with selective learning while successfully detecting new defect types.
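The coverage versus risk trade-off of a reject option can be sketched with a simple confidence threshold. Note that this is a swapped-in stand-in: the abstract describes a reject option integrated into the model, whereas the example below merely thresholds the top class probability. The probabilities and labels are made-up toy values.

```python
def selective_predict(probs, threshold):
    """Return the argmax class, or None to abstain when confidence is low."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] >= threshold else None


def coverage_and_risk(prob_rows, labels, threshold):
    """Coverage = fraction of samples predicted; risk = error rate on those."""
    accepted = 0
    errors = 0
    for probs, label in zip(prob_rows, labels):
        pred = selective_predict(probs, threshold)
        if pred is None:
            continue  # the model abstains on this sample
        accepted += 1
        if pred != label:
            errors += 1
    coverage = accepted / len(labels)
    risk = errors / accepted if accepted else 0.0
    return coverage, risk


# Toy class-probability outputs for four samples over three classes.
prob_rows = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.10, 0.85, 0.05],
    [0.34, 0.33, 0.33],
]
labels = [0, 1, 1, 2]

full = coverage_and_risk(prob_rows, labels, threshold=0.0)
selective = coverage_and_risk(prob_rows, labels, threshold=0.8)
```

On this toy data, predicting every sample gives full coverage with half the predictions wrong, while abstaining below a confidence of 0.8 halves the coverage and removes both errors. Samples the model abstains on are natural candidates for manual review or for flagging as a possible new defect class.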
Citations: 25
Late Breaking Results: Pole-aware Analog Placement Considering Monotonic Current Flow and Crossing-Wire Minimization
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218634
Abhishek Patyal, Hung-Ming Chen, Mark Po-Hung Lin
This paper presents a new paradigm for analog placement that incorporates poles in addition to symmetry-island and monotonic current flow considerations, while minimizing wire crossings. The nodes along the signal path in an analog circuit contribute to the poles, and the parasitics on the dominant poles can significantly limit circuit performance. Although the monotonic placements introduced in previous works can generate simpler routing topologies, their unawareness of poles, especially the dominant pole and the first non-dominant pole, and of wire crossings among critical nets may result in increased wire load and performance degradation. Experimental results show that the proposed pole-aware analog placement method, which considers symmetry islands, monotonic current flow, and crossing-wire minimization, yields much better solution quality in terms of circuit performance.
Citations: 1
Towards Purposeful Design Space Exploration of Heterogeneous CGRAs: Clock Frequency Estimation
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218649
D. Wolf, Christoph Spang, C. Hochberger
Coarse-Grained Reconfigurable Arrays (CGRAs) are becoming increasingly popular. Besides research on scheduling algorithms and microarchitecture concepts, the use of heterogeneous structures can be a key approach to exploiting their full potential. Unfortunately, a purposeful design space exploration of CGRAs is not trivial, since one needs to know the clock frequency of the resulting hardware implementation. This paper discusses the challenges of, and a statistical approach to, maximum clock frequency estimation for heterogeneous CGRAs with an irregular interconnect on FPGAs. The presented approach allows estimation with a maximum error of 8.8% to 17.4% and a mean error of only 1.9% to 4.6%.
Citations: 2
Deep Learning Multi-Channel Fusion Attack Against Side-Channel Protected Hardware
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218705
Benjamin Hettwer, Daniel Fennes, S. Leger, Jan Richter-Brockmann, Stefan Gehrer, T. Güneysu
State-of-the-art hardware masking approaches like threshold implementations and domain-oriented masking provide a guaranteed level of security even in the presence of glitches. Although these schemes are provably secure in theory, recent work showed that the effective security order of a masked hardware implementation can be lowered by applying a multi-probe attack or exploiting externally amplified coupling effects. However, the proposed attacks are based on an unrealistic adversary model (i.e., knowledge of mask values during profiling) or require complex manipulations of the measurement setup. In this work, we propose a novel attack vector that exploits location-dependent leakage from several decoupling capacitors of a modern System-on-Chip (SoC) built in 16 nm fabrication technology. We combine the leakage from different sources using a deep learning-based information fusion approach. The results show a remarkable advantage in the number of traces required for a successful key recovery compared to state-of-the-art profiled side-channel attacks. All evaluations are performed under realistic conditions, resulting in a real-world attack scenario that is not limited to academic environments.
Citations: 3
FlexReduce: Flexible All-reduce for Distributed Deep Learning on Asymmetric Network Topology
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218538
Jinho Lee, Inseok Hwang, Soham Shah, Minsik Cho
We propose FlexReduce, an efficient and flexible all-reduce algorithm for distributed deep learning under irregular network hierarchies. With ever-growing deep neural networks, distributed learning over multiple nodes is becoming imperative for expedited training. Several approaches leverage a symmetric network structure to optimize performance over the different hierarchy levels of the network. However, the assumption of a symmetric network does not always hold, especially in shared cloud environments. By allocating an uneven portion of the gradients to each learner (GPU), FlexReduce outperforms conventional algorithms on asymmetric network structures and still performs as well or better on symmetric networks.
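To make the idea of an uneven gradient allocation concrete, the sketch below splits a gradient buffer in proportion to each learner's link bandwidth and compares the slowest transfer against an even split. Proportional splitting, the single-phase cost model, and the bandwidth numbers are assumptions for illustration only; the abstract does not state how FlexReduce chooses its portions, and this is not a full all-reduce.

```python
def split_gradient(num_elements, bandwidths):
    """Split num_elements into per-learner shares proportional to bandwidth.

    Uses largest-remainder rounding so the shares sum to num_elements.
    """
    total = sum(bandwidths)
    exact = [num_elements * b / total for b in bandwidths]
    shares = [int(x) for x in exact]
    leftover = num_elements - sum(shares)
    # Hand leftover elements to the learners with the largest fractional part.
    order = sorted(range(len(exact)),
                   key=lambda i: exact[i] - shares[i],
                   reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


def transfer_time(shares, bandwidths):
    """Time for the slowest learner to send its share (elements / bandwidth)."""
    return max(s / b for s, b in zip(shares, bandwidths))


bandwidths = [10.0, 10.0, 5.0, 5.0]   # hypothetical relative link speeds
even = [250, 250, 250, 250]
uneven = split_gradient(1000, bandwidths)

even_time = transfer_time(even, bandwidths)
uneven_time = transfer_time(uneven, bandwidths)
```

Under this simplified model, the even split is limited by the slow links, while the proportional split lets all four learners finish at nearly the same time. With equal bandwidths the proportional split reduces to the even one, which matches the abstract's claim of no loss on symmetric networks.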
Citations: 6
KFR: Optimal Cache Management with K-Framed Reclamation for Drive-Managed SMR Disks
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218636
Chenlin Ma, Yi Wang, Zhaoyan Shen, Z. Shao
Shingled Magnetic Recording (SMR) disks have been proposed as a promising solution to satisfy the increasing capacity needs of the big data era. Drive-Managed SMR (DM-SMR) disks, which act as traditional block devices, are favored for their high compatibility. However, DM-SMR disks suffer from a high performance recovery time (PRT) due to the "SMR space reclamation" issue. This paper proposes an optimal cache management scheme named K-Framed Reclamation (KFR) to minimize PRT within the DM-SMR disk. The effectiveness of the proposed design was evaluated with realistic and intensive I/O workloads, and the results are encouraging.
Citations: 1
Hardware-assisted Service Live Migration in Resource-limited Edge Computing Systems
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218677
Zhe Zhou, Xintong Li, Xiaoyang Wang, Zheng Liang, Guangyu Sun, Guojie Luo
Service live migration means migrating running services from one machine to another with negligible service downtime. It has been considered a powerful mechanism for facilitating service management. However, conventional live migration methods incur an expensive data transmission cost and thus can hardly be applied directly to a real-world edge computing system, where network bandwidth is limited. To tackle this problem, some recent works present various techniques to reduce the data transmission. However, these data transmission reduction techniques introduce extra computational costs, which have a great impact on the quality of service (QoS), especially in edge systems containing many nodes with insufficient computational resources. To alleviate this issue, we propose offloading the data reduction computations to a dedicated hardware accelerator, thus reducing the burden on the CPU cores. To this end, we present a novel hardware accelerator design that speeds up the data transmission reduction computations and thereby accelerates service live migration. For evaluation, we implement a prototype on an FPGA platform. Compared to conventional CPU-based approaches, our specialized accelerator is 3.1× faster and 2.9× more energy-efficient, and it reduces total migration time by 29% to 47% and service downtime by 24% to 40% in our cases. Furthermore, our architecture is highly scalable and easily configurable to balance cost and performance.
Citations: 3
A Robust Exponential Integrator Method for Generic Nonlinear Circuit Simulation
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218556
Quan Chen
In this paper, we aim to address two long-standing issues in large-scale transient circuit simulation using the exponential integrator (EI) method. The first is the numerical instability caused by the singularity in the differential-algebraic equation system. Our proposed solution is a systematic, algebraic, and sparsity-preserving regularization technique that eliminates the unstable modes in the system to be solved. Next, we devise a generic scheme to apply Newton-Raphson iterations in the EI framework for enhanced nonlinearity handling capability. With these two techniques, we wish to elevate the robustness and performance of EI and make it a competitive alternative to existing SPICE-type simulators in practical usage.
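For readers unfamiliar with exponential integrators, the scalar sketch below shows the basic update on a stiff linear test equation x' = a*x + b, where one EI step is x(t+h) = e^{ah} x(t) + ((e^{ah} - 1)/a) b, and contrasts it with forward Euler. This is only a textbook illustration with made-up coefficients: real circuit simulation works on matrix differential-algebraic systems, and the regularization and Newton-Raphson schemes proposed in the paper are not shown here.

```python
import math


def ei_step(x, a, b, h):
    """One exponential-integrator step for x' = a*x + b (a != 0, b constant)."""
    growth = math.exp(a * h)
    return growth * x + (growth - 1.0) / a * b


def euler_step(x, a, b, h):
    """One forward-Euler step for the same equation, for comparison."""
    return x + h * (a * x + b)


a, b, h = -1000.0, 1000.0, 0.01   # stiff test problem with steady state x = 1
x_ei = x_fe = 0.0
for _ in range(5):
    x_ei = ei_step(x_ei, a, b, h)
    x_fe = euler_step(x_fe, a, b, h)

# Closed-form solution x(t) = 1 - exp(a*t) evaluated at t = 5*h.
exact = 1.0 - math.exp(a * 5 * h)
```

With this step size, forward Euler is unstable and its iterate grows in magnitude at every step, while the EI update reproduces the closed-form solution up to rounding, because the linear part is integrated exactly.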
Citations: 3