
2020 Design, Automation & Test in Europe Conference & Exhibition (DATE): Latest Papers

A Machine Learning Based Write Policy for SSD Cache in Cloud Block Storage
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116539
Yu Zhang, Ke Zhou, Ping Huang, Hua Wang, Jianying Hu, Yangtao Wang, Yongguang Ji, Bin Cheng
Nowadays, SSD cache plays an important role in cloud storage systems. The associated write policy, which enforces an admission control policy regarding filling data into the cache, has a significant impact on the performance of the cache system and the amount of write traffic to SSD caches. Based on our analysis of a typical cloud block storage system, approximately 47.09% of writes are write-only, i.e., writes to blocks that have not been read during a certain time window. Naively writing the write-only data to the SSD cache unnecessarily introduces a large number of harmful writes to the SSD cache without any contribution to cache performance. On the other hand, it is a challenging task to identify and filter out such write-only data in real time, especially in a cloud environment running changing and diverse workloads. In this paper, to alleviate the above cache problem, we propose ML-WP, a Machine Learning Based Write Policy, which reduces write traffic to SSDs by avoiding writing write-only data. The main challenge in this approach is to identify write-only data in real time. To realize ML-WP and achieve accurate write-only data identification, we use machine learning methods to classify data into two groups (i.e., write-only and normal data). Based on this classification, the write-only data is written directly to backend storage without being cached. Experimental results show that, compared with the write-back policy widely deployed in industry, ML-WP decreases write traffic to SSD cache by 41.52%, while improving the hit ratio by 2.61% and reducing the average read latency by 37.52%.
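The write-only definition in this abstract can be made concrete with a small offline labeling pass over a request trace. The following is only a sketch under stated assumptions: the trace format `(timestamp, op, block)`, the choice to measure the window forward from each write, and the names `label_write_only` and `admit_to_cache` are all hypothetical and not taken from the paper. It shows how ground-truth labels could be derived and how an admission decision would use them; the classifier and its features are not described in the abstract and are not modeled here.

```python
from typing import List, Tuple

# Assumed trace format: (timestamp, "R" or "W", block id), sorted by timestamp.
Request = Tuple[float, str, int]


def label_write_only(trace: List[Request], window: float) -> List[bool]:
    """Return one label per write: True if the written block is not read
    within `window` time units after the write (assumed reading of the
    'certain time window' in the abstract)."""
    labels = []
    for i, (t, op, blk) in enumerate(trace):
        if op != "W":
            continue
        read_later = any(
            op2 == "R" and blk2 == blk and t2 <= t + window
            for (t2, op2, blk2) in trace[i + 1:]
        )
        labels.append(not read_later)
    return labels


def admit_to_cache(is_write_only: bool) -> str:
    """Admission decision as sketched from the abstract: write-only data
    bypasses the SSD cache and goes straight to backend storage."""
    return "backend" if is_write_only else "ssd_cache"


# Toy trace (illustrative values only).
trace = [(0.0, "W", 1), (1.0, "W", 2), (2.0, "R", 1), (9.0, "R", 2)]
labels = label_write_only(trace, window=5.0)
decisions = [admit_to_cache(x) for x in labels]
print(labels, decisions)
```

In a real-time setting the label is not known when the write arrives, which is why the abstract describes predicting it with a trained classifier; labels like these would only serve as training targets.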
Citations: 8
The Hypergeometric Distribution as a More Accurate Model for Stochastic Computing
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116492
T. Baker, J. Hayes
A fundamental assumption in stochastic computing (SC) is that bit-streams are generally well-approximated by a Bernoulli process, i.e., a sequence of independent 0-1 choices. We show that this assumption is flawed in unexpected and significant ways for some bit-streams such as those produced by a typical LFSR-based stochastic number generator (SNG). In particular, the Bernoulli assumption leads to a surprising overestimation of output errors and how they vary with input changes. We then propose a more accurate model for such bit-streams based on the hypergeometric distribution and examine its implications for several SC applications. First, we explore the effect of correlation on a mux-based stochastic adder and show that, contrary to what was previously thought, it is not entirely correlation insensitive. Further, inspired by the hypergeometric model, we introduce a new mux tree adder that offers major area savings and accuracy improvement. The effectiveness of this study is validated on a large image processing circuit which achieves an accuracy improvement of 32%, combined with a reduction in overall circuit area.
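The gap between the two models can be illustrated by comparing the variance of the number of 1s in a bit-stream. This is a minimal sketch under an assumption that is not spelled out in the abstract: an n-bit stream is treated as n draws without replacement from a population of N bits containing exactly K ones, which gives a hypergeometric count, whereas the Bernoulli model treats the n bits as independent and gives a binomial count. The parameter values are illustrative.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(k ones in n independent bits, each 1 with probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(k ones in n bits drawn without replacement from N bits with K ones)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def variance(pmf, support) -> float:
    """Variance of a discrete distribution given its pmf over `support`."""
    mean = sum(k * pmf(k) for k in support)
    return sum((k - mean) ** 2 * pmf(k) for k in support)


# Illustrative parameters: population of 256 bits, half of them ones,
# and a 64-bit stream drawn from it.
N, K, n = 256, 128, 64
p = K / N
var_bernoulli = variance(lambda k: binomial_pmf(k, n, p), range(n + 1))
var_hypergeom = variance(lambda k: hypergeom_pmf(k, N, K, n), range(n + 1))
print(var_bernoulli, var_hypergeom)
```

Under these assumptions the hypergeometric variance equals the binomial variance n·p·(1−p) multiplied by (N−n)/(N−1), so the Bernoulli model predicts a larger spread. That is consistent with the abstract's statement that the Bernoulli assumption overestimates output errors.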
Citations: 9
Synthesis of Fault-Tolerant Reconfigurable Scan Networks
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116525
Sebastian Brandhofer, M. Kochte, H. Wunderlich
On-chip instrumentation is mandatory for efficient bring-up, test and diagnosis, post-silicon validation, as well as in-field calibration, maintenance, and fault tolerance. Reconfigurable scan networks (RSNs) provide a scalable and efficient scan-based access mechanism to such instruments. The correct operation of this access mechanism is crucial for all manufacturing, bring-up and debug tasks as well as for in-field operation, but it can be affected by faults and design errors. This work develops, for the first time, fault-tolerant RSNs such that the resulting scan network still provides access to as many instruments as possible in the presence of a fault. The work contributes a model and an algorithm to compute scan paths in faulty RSNs, a metric to quantify their fault tolerance, and a synthesis algorithm that is based on graph connectivity and selective hardening of control logic in the scan network. Experimental results demonstrate that fault-tolerant RSNs can be synthesized with only moderate hardware overhead.
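The graph-connectivity idea can be sketched with a toy model. Everything here is an assumption for illustration: the RSN is reduced to a directed graph from a scan-in node `SI` to a scan-out node `SO`, a fault simply removes one node, and the metric is the average fraction of instruments that still lie on some `SI`-to-`SO` path. The paper's actual fault model, metric, and synthesis algorithm are not given in the abstract and may differ.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]


def reachable(graph: Graph, start: str, removed: Set[str]) -> Set[str]:
    """Nodes reachable from `start` when the `removed` nodes are unusable."""
    if start in removed:
        return set()
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, []):
            if nxt not in removed and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def reverse(graph: Graph) -> Graph:
    """Graph with every edge direction flipped."""
    rev: Graph = {}
    for src, dsts in graph.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev


def accessible(graph: Graph, instruments: Set[str], faulty: str) -> Set[str]:
    """Instruments still on some SI -> SO scan path that avoids `faulty`."""
    fwd = reachable(graph, "SI", {faulty})
    bwd = reachable(reverse(graph), "SO", {faulty})
    return {i for i in instruments if i in fwd and i in bwd}


def fault_tolerance(graph: Graph, instruments: Set[str]) -> float:
    """Hypothetical metric: mean accessible fraction over single-node faults."""
    fractions = [
        len(accessible(graph, instruments, f)) / len(instruments)
        for f in sorted(instruments)
    ]
    return sum(fractions) / len(fractions)


instruments = {"A", "B", "C"}
# Original network: A and B in series, C on a parallel branch.
original = {"SI": ["A", "C"], "A": ["B"], "B": ["SO"], "C": ["SO"]}
# Hardened variant with two extra bypass edges (SI -> B and A -> SO).
hardened = {"SI": ["A", "B", "C"], "A": ["B", "SO"], "B": ["SO"], "C": ["SO"]}
print(fault_tolerance(original, instruments), fault_tolerance(hardened, instruments))
```

In this toy example the added bypass edges raise the score because a fault in `A` or `B` no longer cuts off the other segment of the series chain.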
Citations: 4
Real-Time Energy Monitoring in IoT-enabled Mobile Devices
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116577
N. Shivaraman, Seima Saki, Zhiwei Liu, Saravanan Ramanathan, A. Easwaran, S. Steinhorst
With rapid advancements in the Internet of Things (IoT) paradigm, electrical devices in the near future are expected to have IoT capabilities. This enables fine-grained tracking of individual energy consumption data of such devices, offering location-independent per-device billing. Thus, it is more fine-grained than the location-based metering of state-of-the-art infrastructure, which traditionally aggregates on a building or household level, defining the entity to be billed. However, such in-device energy metering is susceptible to manipulation and fraud. As a remedy, we propose a decentralized metering architecture that enables devices with IoT capabilities to measure their own energy consumption. In this architecture, the device-level consumption is additionally reported to a system-level aggregator that verifies distributed information and provides secure data storage using Blockchain, preventing data manipulation by untrusted entities. Using evaluations on an experimental testbed, we show that the proposed architecture supports device mobility and enables location-independent monitoring of energy consumption.
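The verify-then-store role of the aggregator can be sketched as follows. This is an illustration only, with several assumptions that do not come from the abstract: the aggregator is assumed to have its own system-level measurement (`system_wh`) to check device reports against, the 5% tolerance is arbitrary, and a simple SHA-256 hash chain stands in for the Blockchain storage (there is no consensus or networking here). The class and function names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def block_hash(index: int, prev_hash: str, record: Dict[str, float]) -> str:
    """Hash of a block's contents, chained to the previous block's hash."""
    payload = json.dumps(
        {"index": index, "prev": prev_hash, "record": record}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Aggregator:
    """Hypothetical system-level aggregator with tamper-evident storage."""

    def __init__(self, tolerance: float = 0.05) -> None:
        self.tolerance = tolerance
        self.chain: List[dict] = []

    def submit_round(self, reports: Dict[str, float], system_wh: float) -> bool:
        """Accept device reports only if their sum matches the system-level
        measurement within the tolerance, then append them to the chain."""
        if abs(sum(reports.values()) - system_wh) > self.tolerance * system_wh:
            return False
        prev = self.chain[-1]["hash"] if self.chain else "0" * 64
        index = len(self.chain)
        record = dict(reports)
        self.chain.append(
            {"index": index, "prev": prev, "record": record,
             "hash": block_hash(index, prev, record)}
        )
        return True

    def chain_is_valid(self) -> bool:
        """Recompute every hash; any edited record breaks the chain."""
        prev = "0" * 64
        for block in self.chain:
            if block["prev"] != prev:
                return False
            if block["hash"] != block_hash(block["index"], prev, block["record"]):
                return False
            prev = block["hash"]
        return True


agg = Aggregator()
accepted = agg.submit_round({"laptop": 40.0, "phone": 10.0}, system_wh=51.0)
rejected = agg.submit_round({"laptop": 5.0, "phone": 10.0}, system_wh=51.0)
accepted_again = agg.submit_round({"laptop": 30.0, "phone": 12.0}, system_wh=42.0)
print(accepted, rejected, accepted_again, agg.chain_is_valid())
```

An under-reported round is rejected before storage, and editing a stored record afterwards changes its hash so that `chain_is_valid()` fails.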
Citations: 5
M3D-ADTCO: Monolithic 3D Architecture, Design and Technology Co-Optimization for High Energy Efficient 3D IC
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116293
S. Thuries, O. Billoint, Sylvain Choisnet, R. Lemaire, P. Vivet, P. Batude, D. Lattard
Monolithic 3D (M3D) now stands as the ultimate technology to sidestep the stagnation of Moore’s Law. Due to its nanoscale Monolithic Inter-tier Via (MIV), M3D enables the ultrahigh-density interconnect between Logic and Memory that is required in the field of highly energy-efficient 3D integrated circuits (3D-ICs) designed for new abundant-data computing systems. At the design level, M3D still suffers from a lack of commercial tools, especially for Place and Route, precluding the capability to provide signoff M3D GDS. In this paper, we introduce M3D-ADTCO, an architecture, design and technology co-optimization platform aimed at providing signoff M3D GDS. It relies on an M3D Process Design Kit and the use of a commercial Place and Route tool. We demonstrate an area reduction of 23.61% at iso-performance and iso-power compared to a 2D RISC-V micro-controller based System on Chip (SoC), while creating space to double (2x) the RISC-V instruction memory.
Citations: 6
Towards Best-effort Approximation: Applying NAS to General-purpose Approximate Computing
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116250
Weiwei Chen, Ying Wang, Shuang Yang, Chen Liu, Lei Zhang
The design of a neural network architecture for code approximation involves a large number of hyper-parameters to explore, so it is a non-trivial task to find a neural-based approximate computing solution that meets the demand of application-specified accuracy and Quality of Service (QoS). Prior works do not address the problem of ‘optimal’ network architecture design in program approximation, which depends on the user-specified constraints, the complexity of the dataset and the hardware configuration. In this paper, we apply Neural Architecture Search (NAS) to search for and select neural approximate computing solutions, and provide an automatic framework that tries to generate the best-effort approximation result while satisfying the user-specified QoS/accuracy constraints. Compared with the previous method, this work achieves more than 1.43x speedup and 1.74x energy reduction on average when applied to the AxBench benchmarks.
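The constrained selection described here can be sketched as a search over candidate topologies that returns the cheapest one meeting a user-specified error bound, with a best-effort fallback when none does. This is a sketch under loud assumptions: exhaustive enumeration stands in for the NAS procedure, multiply-accumulate count stands in for latency and energy, and `toy_error` is a made-up placeholder for training and validating each candidate. None of these names or numbers come from the paper.

```python
from itertools import product
from typing import Callable, List, Tuple

Topology = Tuple[int, ...]  # hidden-layer widths of a small MLP approximator


def mac_count(topology: Topology, n_in: int, n_out: int) -> int:
    """Multiply-accumulate operations of a fully connected network,
    used here as a stand-in for latency and energy cost."""
    sizes = (n_in, *topology, n_out)
    return sum(a * b for a, b in zip(sizes, sizes[1:]))


def search(
    candidates: List[Topology],
    error_of: Callable[[Topology], float],
    max_error: float,
    n_in: int,
    n_out: int,
) -> Topology:
    """Cheapest candidate that satisfies the error constraint; if none does,
    fall back to the most accurate candidate (best effort)."""
    feasible = [t for t in candidates if error_of(t) <= max_error]
    if feasible:
        return min(feasible, key=lambda t: mac_count(t, n_in, n_out))
    return min(candidates, key=error_of)


def toy_error(topology: Topology) -> float:
    """Placeholder error model (made up): wider networks are more accurate."""
    return 1.0 / (1 + sum(topology))


widths = [2, 4, 8]
candidates = [(w,) for w in widths] + list(product(widths, repeat=2))
chosen = search(candidates, toy_error, max_error=0.1, n_in=2, n_out=1)
fallback = search(candidates, toy_error, max_error=0.01, n_in=2, n_out=1)
print(chosen, fallback)
```

A practical framework would replace the enumeration with a guided search and the placeholder with measured accuracy on the target benchmark and hardware.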
Citations: 1
DC-CNN: Computational Flow Redefinition for Efficient CNN through Structural Decoupling
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116429
Fuxun Yu, Zhuwei Qin, Di Wang, Ping Xu, Chenchen Liu, Zhi Tian, Xiang Chen
Recently, Convolutional Neural Networks (CNNs) have been widely applied in novel intelligent applications and systems. However, CNN computation performance is significantly hindered by the computation flow, which computes the model structure sequentially by layers with massive convolution operations. Such a layer-wise sequential computation flow can cause certain performance issues, such as resource under-utilization and huge memory overhead. To solve these problems, we propose a novel CNN structural decoupling method, which can decouple CNN models into "critical paths" and eliminate the inter-layer data dependency. Based on this method, we redefine the CNN computation flow into parallel and cascade computing paradigms, which can significantly enhance CNN computation performance on both multi-core and single-core CPU processors. Experiments show that our DC-CNN framework can reduce latency by 24% to 33% on multi-core CPUs for CIFAR and ImageNet. On small-capacity mobile platforms, cascade computing can reduce the latency by an average of 24% on ImageNet and 42% on CIFAR10. Meanwhile, the memory reduction can also reach an average of 21% and 64%, respectively.
Citations: 4
A Heat-Recirculation-Aware VM Placement Strategy for Data Centers
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116356
Hao Feng, Yuhui Deng, Yi Zhou
Data centers consist of a great number of IT devices (e.g., servers and switches), which generate a massive amount of heat. Due to the special arrangement of racks in the data center, heat-recirculation often occurs between nodes. It can cause a sharp rise in equipment temperature, coupled with local hot spots in data centers. Existing VM placement strategies can minimize energy consumption of data centers by optimizing resource allocation in terms of multiple physical resources (e.g., memory, bandwidth, and CPU). However, existing strategies ignore the role of heat-recirculation in the data center. To address this problem, in this study, we propose a heat-recirculation-aware VM placement strategy and design a Simulated Annealing Based Algorithm (SABA) to lower the energy consumption of data centers. Unlike the existing SA algorithm, SABA optimizes the distribution of the initial solution and the way of iteration. We quantitatively evaluate SABA’s performance in terms of algorithm efficiency, the number of activated servers, and the energy saving against the XINT-GA algorithm (a thermal-aware task scheduling strategy), FCFS (First-Come First-Served), and SA. Experimental results indicate that our heat-recirculation-aware VM placement strategy provides a powerful solution for improving energy efficiency of data centers.
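A heat-recirculation-aware placement search can be sketched with plain simulated annealing. This is not SABA itself: the abstract says SABA changes the initial solution distribution and the iteration scheme, and neither is reproduced here. The sketch assumes a linear recirculation model in which each server's inlet temperature is the supply temperature plus a weighted sum of all servers' power draw (matrix `D`), and it minimizes the peak inlet temperature as a proxy for cooling energy. The matrix values, power figures, and parameter names are invented for illustration.

```python
import math
import random
from typing import List, Tuple


def peak_inlet_temp(placement: List[int], vm_power: List[float],
                    D: List[List[float]], t_supply: float,
                    idle_power: float) -> float:
    """Peak inlet temperature under the assumed linear recirculation model:
    T_in[i] = t_supply + sum_j D[i][j] * P[j]. Empty servers are powered off."""
    n = len(D)
    power = [0.0] * n
    for vm, srv in enumerate(placement):
        power[srv] += vm_power[vm]
    power = [p + idle_power if p > 0 else 0.0 for p in power]
    return max(t_supply + sum(D[i][j] * power[j] for j in range(n))
               for i in range(n))


def anneal(vm_power: List[float], D: List[List[float]], capacity: int,
           t_supply: float = 18.0, idle_power: float = 100.0,
           steps: int = 2000, temp0: float = 5.0, alpha: float = 0.995,
           seed: int = 0) -> Tuple[List[int], float]:
    """Plain simulated annealing over VM-to-server assignments."""
    rng = random.Random(seed)
    n_srv = len(D)
    placement = [i % n_srv for i in range(len(vm_power))]  # round-robin start
    current = peak_inlet_temp(placement, vm_power, D, t_supply, idle_power)
    best, best_cost = placement[:], current
    temp = temp0
    for _ in range(steps):
        vm, dst = rng.randrange(len(vm_power)), rng.randrange(n_srv)
        if dst != placement[vm] and placement.count(dst) < capacity:
            old = placement[vm]
            placement[vm] = dst
            new = peak_inlet_temp(placement, vm_power, D, t_supply, idle_power)
            # Metropolis rule: always accept improvements, sometimes accept worse.
            if new <= current or rng.random() < math.exp((current - new) / temp):
                current = new
                if new < best_cost:
                    best, best_cost = placement[:], new
            else:
                placement[vm] = old
        temp *= alpha
    return best, best_cost


# Illustrative inputs: 3 servers, 4 VMs, at most 2 VMs per server.
D = [[0.000, 0.010, 0.020], [0.010, 0.000, 0.010], [0.020, 0.010, 0.000]]
vm_power = [50.0, 50.0, 80.0, 30.0]
best, best_cost = anneal(vm_power, D, capacity=2)
print(best, best_cost)
```

The fixed seed keeps the run reproducible; the returned placement is never worse than the round-robin starting point because the best solution seen so far is tracked separately from the current one.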
Citations: 2
Go Unary: A Novel Synapse Coding and Mapping Scheme for Reliable ReRAM-based Neuromorphic Computing
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116555
Chang Ma, Yanan Sun, Weikang Qian, Ziqi Meng, Rui Yang, Li Jiang
Neural network (NN) computing contains a large number of multiply-and-accumulate (MAC) operations, which are the speed bottleneck in the traditional von Neumann architecture. A resistive random access memory (ReRAM)-based crossbar is well suited for matrix-vector multiplication. Existing ReRAM-based NNs are mainly based on binary coding for synaptic weights. However, the imperfect fabrication process combined with stochastic filament-based switching leads to resistance variations, which can significantly affect the weights in binary synapses and degrade the accuracy of NNs. Further, as multi-level cells (MLCs) are being developed to reduce hardware overhead, the NN accuracy deteriorates further due to the resistance variations in the binary coding. In this paper, a novel unary coding of synaptic weights is presented to overcome the resistance variations of MLCs and achieve reliable ReRAM-based neuromorphic computing. A priority mapping is also proposed in compliance with the unary coding to guarantee high accuracy by mapping those bits with lower resistance states to ReRAMs with smaller resistance variations. Our experimental results show that the proposed method incurs less than 0.45% and 5.48% accuracy loss on LeNet (on the MNIST dataset) and VGG16 (on the CIFAR-10 dataset), respectively, with acceptable hardware cost.
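Why unary coding is less sensitive to resistance variation than binary coding can be illustrated with a simple variance calculation. This is one possible reading rather than the paper's model: it assumes single-level cells, independent additive noise of equal standard deviation `sigma` on every cell that is switched on, and no noise on cells that are off. In binary coding the cell for bit i is scaled by 2^i, so its noise is scaled too; in unary coding a weight w is stored as w cells that each contribute 1.

```python
def binary_error_variance(weight: int, bits: int, sigma: float) -> float:
    """Variance of the stored weight under binary coding: the cell for
    bit i contributes 2**i, so its noise is scaled by 2**i as well."""
    return sum((2**i * sigma) ** 2 for i in range(bits) if (weight >> i) & 1)


def unary_error_variance(weight: int, sigma: float) -> float:
    """Variance under unary coding: `weight` cells are on, each contributing 1."""
    return weight * sigma**2


def cells_needed(bits: int) -> tuple:
    """Cells per weight for (binary, unary) coding of a `bits`-bit weight."""
    return bits, 2**bits - 1


sigma, bits = 0.1, 4
for w in (8, 15):
    print(w, binary_error_variance(w, bits, sigma), unary_error_variance(w, sigma))
print(cells_needed(bits))
```

Under these assumptions the unary variance never exceeds the binary one for the same weight, because the most significant bits no longer amplify the noise. The price is the larger cell count returned by `cells_needed`, which corresponds to the hardware cost the abstract mentions.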
Citations: 15
Delay Sensitivity Polynomials Based Design-Dependent Performance Monitors for Wide Operating Ranges
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116243
Rui-xin Shi, Liang Yang, Hao Wang
The downsizing of CMOS technology makes circuit performance more sensitive to on-chip parameter variations. The previously proposed design-dependent ring oscillator (DDRO) method provides an efficient way to monitor circuit performance at runtime. However, its linear delay sensitivity expression may be inadequate, especially over a wide range of operating conditions. To overcome this, a new design-dependent performance monitor (DDPM) method is proposed in this work, which formulates the delay sensitivity as high-order polynomials, making it possible to accurately track the nonlinear timing behavior over wide operating ranges. A 28nm technology is used for design evaluation, and a low error rate is achieved in the circuit performance monitoring comparison.
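To make the linear-versus-polynomial point concrete, the sketch below compares a first-order and a second-order sensitivity expansion of a toy delay model. Everything in it is an assumption for illustration: the abstract does not say which variables the polynomials are expressed in, so supply voltage is used here, the delay model is the textbook alpha-power law with made-up parameters, and the coefficients come from finite differences rather than from the authors' method.

```python
def alpha_power_delay(v, vth=0.35, alpha=1.3, k=1.0):
    """Toy gate-delay model (alpha-power law); parameter values are made up."""
    return k * v / (v - vth) ** alpha

def sensitivity_coeffs(f, x0, h=1e-3):
    """Taylor coefficients [f(x0), f'(x0), f''(x0)/2] via central differences."""
    f0 = f(x0)
    first = (f(x0 + h) - f(x0 - h)) / (2 * h)
    second = (f(x0 + h) - 2 * f0 + f(x0 - h)) / (h * h)
    return [f0, first, second / 2]

def predict(coeffs, dx, order):
    """Evaluate the sensitivity polynomial truncated at `order`."""
    return sum(c * dx ** i for i, c in enumerate(coeffs[: order + 1]))

def approximation_errors(v0, v):
    """Absolute error of the order-1 and order-2 predictions at voltage v."""
    coeffs = sensitivity_coeffs(alpha_power_delay, v0)
    exact = alpha_power_delay(v)
    return {order: abs(predict(coeffs, v - v0, order) - exact)
            for order in (1, 2)}

if __name__ == "__main__":
    # Expand around a nominal 0.9 V and evaluate far from it, at 0.6 V.
    print(approximation_errors(0.9, 0.6))
```

In this toy setting the second-order expansion tracks the delay more closely than the linear one when the operating point moves far from nominal, which is the kind of gap a higher-order sensitivity expression is meant to close.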
Citations: 0