
2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD): Latest Publications

Towards warp-scheduler friendly STT-RAM/SRAM hybrid GPGPU register file design
Pub Date: 2017-11-13 | DOI: 10.1109/ICCAD.2017.8203850
Quan Deng, Youtao Zhang, Minxuan Zhang, Jun Yang
Modern Graphics Processing Units (GPUs) widely adopt a large SRAM-based register file (RF) to enable fast context switching. A large SRAM RF may consume 20% to 40% of GPU power, which has become one of the major design challenges for GPUs. Recent studies mitigate the issue through hybrid RF designs that architect a large STT-RAM (Spin Transfer Torque Magnetic memory) RF and a small SRAM buffer. However, the long STT-RAM write latency throttles the data exchange between STT-RAM and SRAM, which penalizes warp schedulers that switch contexts frequently, e.g., the round-robin scheduler. In this paper, we propose HC-RF, a warp-scheduler friendly hybrid RF design using a novel SRAM/STT-RAM hybrid cell (HC) structure. HC-RF exploits cell-level integration to improve the effective bandwidth between STT-RAM and SRAM. By enabling silent data transfer from SRAM to STT-RAM without blocking RF banks, HC-RF supports concurrent context switching and removes its dependency on the warp scheduler. Our experimental results show that, on average, HC-RF achieves a 50% performance improvement and a 44% energy consumption reduction over the coarse-grained hybrid design when adopting the LRR (Loose Round Robin) warp scheduler.
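As a rough illustration of the silent-drain idea described in the abstract above, the sketch below is a minimal behavioral model in Python, not the authors' circuit: writes land in a small SRAM buffer and migrate to STT-RAM in the background without occupying the access path. The class name, buffer size, drain rate, and latency values are assumptions made purely for illustration.

```python
# Toy behavioral model of a hybrid register file: writes land in a small SRAM
# buffer and are silently drained into STT-RAM in the background, so accesses
# to the bank are not blocked by the long STT-RAM write latency.
from collections import deque

class HybridRegisterFile:
    def __init__(self, sram_entries=8, stt_write_cycles=10):
        self.sram = {}                  # register -> value (small, fast buffer)
        self.stt = {}                   # register -> value (large, slow array)
        self.sram_entries = sram_entries
        self.stt_write_cycles = stt_write_cycles
        self.drain_queue = deque()      # registers waiting for a silent drain
        self.busy_until = 0             # cycle when the in-flight drain finishes

    def write(self, reg, value):
        self.sram[reg] = value          # fast SRAM write, never blocks
        self.drain_queue.append(reg)

    def read(self, reg):
        # Reads hit the SRAM buffer first and fall back to STT-RAM.
        return self.sram.get(reg, self.stt.get(reg))

    def tick(self, cycle):
        # Silent background drain: at most one STT-RAM write in flight.
        if self.drain_queue and cycle >= self.busy_until:
            reg = self.drain_queue.popleft()
            if reg in self.sram:
                self.stt[reg] = self.sram[reg]
                if len(self.sram) > self.sram_entries:
                    del self.sram[reg]  # free buffer space once drained
            self.busy_until = cycle + self.stt_write_cycles

rf = HybridRegisterFile()
for cycle in range(60):
    if cycle < 5:                        # a burst of context-switch writes
        rf.write(f"r{cycle}", cycle * 10)
    rf.tick(cycle)
print(rf.read("r3"), len(rf.sram), len(rf.stt))
```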
{"title":"Towards warp-scheduler friendly STT-RAM/SRAM hybrid GPGPU register file design","authors":"Quan Deng, Youtao Zhang, Minxuan Zhang, Jun Yang","doi":"10.1109/ICCAD.2017.8203850","DOIUrl":"https://doi.org/10.1109/ICCAD.2017.8203850","url":null,"abstract":"Modern Graphics Processing Units (GPUs) widely adopt large SRAM based register file (RF) to enable fast context-switch. A large SRAM RF may consume 20% to 40% GPU power, which has become one of the major design challenges for GPUs. Recent studies mitigate the issue through hybrid RF designs that architect a large STT-RAM (Spin Transfer Torque Magnetic memory) RF and a small SRAM buffer. However, the long STT-RAM write latency throttles the data exchange between STT-RAM and SRAM, which deprecates warp scheduler with frequent context switches, e.g., round robin scheduler. In this paper, we propose HC-RF, a warp-scheduler friendly hybrid RF design using novel SRAM/STT-RAM hybrid cell (HC) structure. HC-RF exploits cell level integration to improve the effective bandwidth between STT-RAM and SRAM. By enabling silent data transfer from SRAM to STT-RAM without blocking RF banks, HC-RF supports concurrent context-switching and decouples its dependency on warp scheduler. Our experimental results show that, on average, HC-RF achieves 50% performance improvement and 44% energy consumption reduction over the coarse-grained hybrid design when adopting LRR(Loose Round Robin) warp scheduler.","PeriodicalId":126686,"journal":{"name":"2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124142057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Dedicated synthesis for MZI-based optical circuits based on AND-inverter graphs
Pub Date: 2017-11-13 | DOI: 10.1109/ICCAD.2017.8203783
Arighna Deb, R. Wille, R. Drechsler
Optical circuits have received significant interest as a promising alternative to existing electronic systems. Consequently, the synthesis of optical circuits is also receiving increasing attention. However, initial solutions for the synthesis of optical circuits either rely on manual design or on rather straightforward mappings from established data structures such as BDDs, SoPs/ESoPs, etc. to the corresponding optical netlist. These approaches hardly utilize the full potential of the gate libraries available in this domain. In this paper, we propose an alternative synthesis solution based on AND-Inverter Graphs (AIGs) which is capable of utilizing this potential. That is, we present a scheme that dedicatedly maps the given function representation to the desired circuit in a one-to-one fashion, yielding significantly smaller circuit sizes. Experimental evaluations confirm that the proposed solution generates optical circuits with up to 97% fewer gates than existing synthesis approaches.
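To make the one-to-one mapping concrete, here is a hedged sketch that walks a small AND-inverter graph and emits one optical gate per AIG element: an MZI-based AND gate per node and an optical inverter per complemented edge. The data structure, gate names, and the assumption that nodes are listed in topological order are illustrative choices, not the paper's actual gate library or cost model.

```python
# Minimal sketch: map an AND-inverter graph (AIG) one-to-one onto an optical
# netlist, one MZI-based AND gate per AIG node and one optical inverter per
# complemented edge.  Node format: name -> ((fanin, complemented), (fanin, complemented)).
# Nodes are assumed to be given in topological order (fanins defined first).
aig = {
    "n1": (("a", False), ("b", True)),    # n1 = a AND (NOT b)
    "n2": (("n1", True), ("c", False)),   # n2 = (NOT n1) AND c
}
outputs = [("n2", False)]

def synthesize(aig, outputs):
    netlist = []
    def edge(src, complemented):
        if not complemented:
            return src
        inv = f"{src}_inv"
        netlist.append(("OPT_INV", inv, src))        # optical inverter on the edge
        return inv
    for node, (fi0, fi1) in aig.items():
        a = edge(*fi0)
        b = edge(*fi1)
        netlist.append(("MZI_AND", node, a, b))      # one MZI-based AND per node
    for out, complemented in outputs:
        netlist.append(("OUTPUT", edge(out, complemented)))
    return netlist

for gate in synthesize(aig, outputs):
    print(gate)
```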
{"title":"Dedicated synthesis for MZI-based optical circuits based on AND-inverter graphs","authors":"Arighna Deb, R. Wille, R. Drechsler","doi":"10.1109/ICCAD.2017.8203783","DOIUrl":"https://doi.org/10.1109/ICCAD.2017.8203783","url":null,"abstract":"Optical circuits received significant interest as a promising alternative to existing electronic systems. Because of this, also the synthesis of optical circuits receives increasing attention. However, initial solutions for the synthesis of optical circuits either rely on manual design or rather straight-forward mappings from established data-structures such as BDDs, SoPs/ESoPs, etc. to the corresponding optical netlist. These approaches hardly utilize the full potential of the gate libraries available in this domain. In this paper, we propose an alternative synthesis solution based on AND-Inverter Graphs (AIGs) which is capable of utilizing this potential. That is, a scheme is presented which dedicatedly maps the given function representation to the desired circuit in a one-to-one fashion — yielding significantly smaller circuit sizes. Experimental evaluations confirm that the proposed solution generates optical circuits with up to 97% less number of gates as compared to existing synthesis approaches.","PeriodicalId":126686,"journal":{"name":"2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"218 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134429887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Leveraging value locality for efficient design of a hybrid cache in multicore processors
Pub Date: 2017-11-13 | DOI: 10.1109/ICCAD.2017.8203753
M. Arjomand, A. Jadidi, M. Kandemir, C. Das
Owing to negligible leakage current, high density, and superior scalability, Spin-Transfer Torque RAM (STT-RAM) technology has become one of the most promising candidates for low-power, high-capacity on-chip caches in multicore systems. While STT-RAM read access latency is comparable to that of SRAM, write operations in STT-RAM are more challenging: writes are slow, consume substantial energy, and the lifetime of STT-RAM is limited by the number of write operations to each cell. To overcome these challenges in STT-RAM caches, this paper explores the potential of eliminating redundant writes using the phenomenon of frequent value locality (FVL). According to FVL, a small number of distinct values appear in a large fraction of memory transactions; this work focuses on cache memories. By leveraging frequent value locality, we propose a novel value-based hybrid (STT-RAM + SRAM) cache that has the benefits of both SRAM and STT-RAM technologies, i.e., it is high-performance, power-efficient, and scalable. Our evaluation results for an 8-core chip multiprocessor with a 6MB last-level cache show that our proposed design is able to reduce the power consumption of an STT-RAM cache by up to 90% (an average of 82%), enhance its lifetime by up to 52% (29% on average), and improve system performance by up to 30% (11% on average) for a wide range of multi-threaded and multi-program workloads.
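The frequent-value idea can be illustrated in a few lines of Python: profile which values dominate a write trace, keep them in a small frequent-value table, and count how many STT-RAM writes could be served by a short code instead of a full write. The synthetic trace, table size, and the profiling shortcut (a real cache would consult the table at run time) are assumptions for illustration, not the paper's cache organization.

```python
# Toy illustration of frequent value locality (FVL): a small frequent-value
# table lets the cache avoid full STT-RAM writes for values it already knows.
from collections import Counter
import random

def simulate(writes, table_size=8):
    freq = Counter(writes)                             # profile: most common values
    fv_table = {v for v, _ in freq.most_common(table_size)}
    skipped = sum(1 for v in writes if v in fv_table)  # writes served by a short code
    return skipped / len(writes)

random.seed(0)
# Synthetic trace: 0 and 0xFFFFFFFF dominate, as is typical of zero/one fills.
trace = random.choices(
    [0x0, 0xFFFFFFFF, 0x1, 0xDEADBEEF, 0x12345678],
    weights=[50, 25, 10, 10, 5], k=10_000)
print(f"fraction of writes avoided: {simulate(trace):.2%}")
```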
{"title":"Leveraging value locality for efficient design of a hybrid cache in multicore processors","authors":"M. Arjomand, A. Jadidi, M. Kandemir, C. Das","doi":"10.1109/ICCAD.2017.8203753","DOIUrl":"https://doi.org/10.1109/ICCAD.2017.8203753","url":null,"abstract":"Owing to negligible leakage current, high density and superior scalability, Spin-Transfer Torque RAM (STT-RAM) technology becomes one of the promising candidates for low power and high capacity on-chip caches in multicore systems. While STT-RAM read access latency is comparable to that of SRAM, write operations in STT-RAM are more challenging: writes are slow, consume a large energy, and the lifetime of STT-RAM is limited by the number of write operations to each cell. To overcome these challenges in STT-RAM caches, this paper explores the potential of eliminating redundant writes using the phenomenon of frequent value locality (FVL). According to FLV, few distinct values appear in a large fraction of memory transactions, with emphasis on cache memories in this work. By leveraging frequent value locality, we propose a novel value-based hybrid (STT-RAM +, SRAM) cache that has benefits of both SRAM and STT-RAM technologies — i.e., it is high-performance, power-efficient, and scalable. Our evaluation results for a 8-core chip-multiprocessor with 6MB last-level cache show that our proposed design is able to reduce power consumption of a STT-RAM cache by up to 90% (an average of 82%), enhances its lifetime by up to 52% (29% on average), and improves the system performance by up 30% (11% on average), for a wide range of multi-threaded and multi-program workloads.","PeriodicalId":126686,"journal":{"name":"2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128086256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
ORCHARD: Visual object recognition accelerator based on approximate in-memory processing
Pub Date: 2017-11-13 | DOI: 10.1109/ICCAD.2017.8203756
Yeseong Kim, M. Imani, T. Simunic
In recent years, machine learning for visual object recognition has been applied to various domains, e.g., autonomous vehicles, health diagnosis, and home automation. However, the recognition procedures still consume a lot of processing energy and incur a high cost of data movement for memory accesses. In this paper, we propose a novel hardware accelerator design, called ORCHARD, which processes object recognition tasks inside memory. The proposed design accelerates both image feature extraction and the boosting-based learning algorithm, which are key subtasks of state-of-the-art image recognition approaches. We optimize the recognition procedures by leveraging approximate computing and emerging non-volatile memory (NVM) technology. The NVM-based in-memory processing allows the proposed design to mitigate the CMOS-based computation overhead, greatly improving system efficiency. In our evaluation, conducted on circuit- and device-level simulations, we show that ORCHARD successfully performs practical image recognition tasks, including text, face, pedestrian, and vehicle recognition, with only 0.3% accuracy loss caused by computation approximation. In addition, our design significantly improves performance and energy efficiency, by up to 376x and 1896x respectively, compared to the existing processor-based implementation.
{"title":"ORCHARD: Visual object recognition accelerator based on approximate in-memory processing","authors":"Yeseong Kim, M. Imani, T. Simunic","doi":"10.1109/ICCAD.2017.8203756","DOIUrl":"https://doi.org/10.1109/ICCAD.2017.8203756","url":null,"abstract":"In recent years, machine learning for visual object recognition has been applied to various domains, e.g., autonomous vehicle, heath diagnose, and home automation. However, the recognition procedures still consume a lot of processing energy and incur a high cost of data movement for memory accesses. In this paper, we propose a novel hardware accelerator design, called ORCHARD, which processes the object recognition tasks inside memory. The proposed design accelerates both the image feature extraction and boosting-based learning algorithm, which are key subtasks of the state-of-the-art image recognition approaches. We optimize the recognition procedures by leveraging approximate computing and emerging non-volatile memory (NVM) technology. The NVM-based in-memory processing allows the proposed design to mitigate the CMOS-based computation overhead, highly improving the system efficiency. In our evaluation conducted on circuit- and device-level simulations, we show that ORCHARD successfully performs practical image recognition tasks, including text, face, pedestrian, and vehicle recognition with 0.3% of accuracy loss made by computation approximation. In addition, our design significantly improves the performance and energy efficiency by up to 376x and 1896x, respectively, compared to the existing processor-based implementation.","PeriodicalId":126686,"journal":{"name":"2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"373 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116364542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 43
ApproxLUT: A novel approximate lookup table-based accelerator
Pub Date: 2017-11-13 | DOI: 10.1109/ICCAD.2017.8203810
Ye Tian, Ting Wang, Qian Zhang, Q. Xu
Computing with memory, which stores the function responses of some input patterns into lookup tables offline and retrieves their values when encountering similar patterns (instead of performing online calculation), is a promising energy-efficient computing technique. Clearly, for a given lookup table size, the efficiency of this technique depends on which function responses are stored and how they are organized. In this paper, we propose a novel adaptive approximate lookup table based accelerator, wherein we store function responses in a hierarchical manner with increasingly fine granularity and accuracy. In addition, the proposed accelerator provides lightweight compensation of output results at different precision levels according to input patterns and output quality requirements. Moreover, our accelerator conducts adaptive lookup table search by exploiting input locality. Experimental results on various computation kernels show significant energy savings of the proposed accelerator over prior solutions.
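A minimal sketch of a hierarchical approximate LUT follows, assuming a two-level organization (a coarse table answers cheap queries, a finer table is consulted when more precision is requested) and linear compensation between stored samples; the level count, granularities, and compensation scheme are illustrative assumptions rather than the ApproxLUT microarchitecture.

```python
# Hierarchical approximate LUT: a coarse table answers most queries cheaply,
# a finer table is consulted only when higher precision is requested, and a
# lightweight linear compensation refines the stored sample.
import math

def build_table(f, lo, hi, entries):
    step = (hi - lo) / entries
    return [f(lo + i * step) for i in range(entries + 1)], step

def lut_eval(tables, lo, x, precise=False):
    (coarse, cstep), (fine, fstep) = tables
    table, step = (fine, fstep) if precise else (coarse, cstep)
    i = int((x - lo) / step)
    frac = (x - lo) / step - i
    # Linear compensation between the two neighboring stored responses.
    return table[i] + frac * (table[i + 1] - table[i])

lo, hi = 0.0, math.pi
tables = (build_table(math.sin, lo, hi, 16),    # coarse: 17 stored responses
          build_table(math.sin, lo, hi, 256))   # fine:   257 stored responses
for x in (0.3, 1.1, 2.7):
    approx = lut_eval(tables, lo, x)
    refined = lut_eval(tables, lo, x, precise=True)
    print(f"x={x:.1f}  coarse={approx:.5f}  fine={refined:.5f}  exact={math.sin(x):.5f}")
```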
{"title":"ApproxLUT: A novel approximate lookup table-based accelerator","authors":"Ye Tian, Ting Wang, Qian Zhang, Q. Xu","doi":"10.1109/ICCAD.2017.8203810","DOIUrl":"https://doi.org/10.1109/ICCAD.2017.8203810","url":null,"abstract":"Computing with memory, which stores function responses of some input patterns into lookup tables offline and retrieves their values when encountering similar patterns (instead of performing online calculation), is a promising energy-efficient computing technique. No doubt to say, with a given lookup table size, the efficiency of this technique depends on which function responses are stored and how they are organized. In this paper, we propose a novel adaptive approximate lookup table based accelerator, wherein we store function responses in a hierarchical manner with increasing fine-grained granularity and accuracy. In addition, the proposed accelerator provides lightweight compensation on output results at different precision levels according to input patterns and output quality requirements. Moreover, our accelerator conducts adaptive lookup table search by exploiting input locality. Experimental results on various computation kernels show significant energy savings of the proposed accelerator over prior solutions.","PeriodicalId":126686,"journal":{"name":"2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124524554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Approximate image storage with multi-level cell STT-MRAM main memory
Pub Date: 2017-11-13 | DOI: 10.1109/ICCAD.2017.8203788
Hengyu Zhao, Linuo Xue, Ping Chi, Jishen Zhao
Images consume significant storage both in consumer devices and in the cloud. As such, image processing applications impose high energy consumption when loading and accessing image data in memory. Fortunately, most image processing applications can tolerate approximate image data storage. In addition, multi-level cell spin-transfer torque MRAM (STT-MRAM) offers unique design opportunities as an image memory: the two bits in a memory cell require asymmetric write currents, with the soft bit requiring much less write current than the hard bit. This paper proposes an approximate image processing scheme that improves system energy efficiency without violating the image quality requirements of applications. Our design consists of (i) an approximate image storage mechanism that strives to write only the soft bits in MLC STT-MRAM main memory with small write current, and (ii) a memory mode controller that determines the degree of approximation of image data and coordinates across precise/approximate memory access modes. Our experimental results with various image processing functionalities demonstrate that our design reduces memory access energy consumption by 53% and by 2.3x, with 100% user satisfaction, compared with traditional DRAM-based and MLC phase-change-memory-based main memory, respectively.
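To illustrate the write-current asymmetry the design exploits, the toy model below spreads an 8-bit pixel across four 2-bit MLC cells and, in approximate mode, rewrites only the cheap soft bit of each cell while the hard bit keeps its previous contents. The bit-to-cell mapping and the relative energy numbers are assumptions for illustration only.

```python
# Toy MLC STT-MRAM model: each cell holds a (hard, soft) bit pair.  A precise
# store rewrites both bits; an approximate store touches only the cheap soft
# bits, leaving the hard bits stale and accepting some pixel error.
HARD_WRITE_ENERGY, SOFT_WRITE_ENERGY = 1.0, 0.2   # assumed relative costs

def to_bits(pixel):                  # 8-bit pixel -> 8 bits, MSB first
    return [(pixel >> i) & 1 for i in range(7, -1, -1)]

def from_bits(bits):
    v = 0
    for b in bits:
        v = (v << 1) | b
    return v

def store(cells, pixel, approximate):
    bits = to_bits(pixel)
    energy = 0.0
    for i in range(4):               # consecutive bit pairs map onto 4 MLC cells
        hard, soft = bits[2 * i], bits[2 * i + 1]
        if approximate:
            cells[i] = (cells[i][0], soft)          # hard bit keeps its old value
            energy += SOFT_WRITE_ENERGY
        else:
            cells[i] = (hard, soft)
            energy += HARD_WRITE_ENERGY + SOFT_WRITE_ENERGY
    return energy

def load(cells):
    return from_bits([b for cell in cells for b in cell])

cells = [(0, 0)] * 4
e_precise = store(cells, 0b10110100, approximate=False)
e_approx = store(cells, 0b10111001, approximate=True)   # later, similar pixel
print(f"read back {load(cells):08b}, precise energy {e_precise}, approx energy {e_approx}")
```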
{"title":"Approximate image storage with multi-level cell STT-MRAM main memory","authors":"Hengyu Zhao, Linuo Xue, Ping Chi, Jishen Zhao","doi":"10.1109/ICCAD.2017.8203788","DOIUrl":"https://doi.org/10.1109/ICCAD.2017.8203788","url":null,"abstract":"Images consume significant storage and space in both consumer devices and in the cloud. As such, image processing applications impose high energy consumption in loading and accessing the image data in the memory. Fortunately, most image processing applications can tolerate approximate image data storage. In addition, multi-level cell spin-transfer torque MRAM (STT-MRAM) offers unique design opportunities as the image memory: the two bits in the memory cell require asymmetric write current — the soft bit requires much less write current than the hard bit. This paper proposes an approximate image processing scheme that improves system energy efficiency without upsetting image quality requirement of applications. Our design consists of (i) an approximate image storage mechanism that strives to only write the soft bits in MLC STT-MRAM main memory with small write current and (ii) a memory mode controller that determines the approximation of image data and coordinates across precise/approximate memory access modes. Our experimental results with various image processing functionalities demonstrate that our design reduces memory access energy consumption by 53% and 2.3 x with 100% user's satisfaction compared with traditional DRAM-based and MLC phase-change-memory-based main memory, respectively.","PeriodicalId":126686,"journal":{"name":"2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124333165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 23
Sequential engineering change order under retiming and resynthesis
Pub Date: 2017-11-13 | DOI: 10.1109/ICCAD.2017.8203767
Nian-Ze Lee, Victor N. Kravets, J. H. Jiang
Engineering change order (ECO) is pivotal in rectifying late design changes, which occur commonly due to ever-increasing system complexity. Existing functional ECO methods focus on combinational equivalence, assuming a known input correspondence between the old implementation and the new specification. They are inadequate for rectifying circuits under sequential transformations. This inadequacy hinders the utilization of powerful and effective sequential optimization methods based on retiming and resynthesis. As retiming and/or resynthesis gain increasing adoption in industry, incorporating sequential ECO techniques into the hardware design flow becomes essential. In this paper, we provide the first attempt to extend ECO to designs under retiming and resynthesis in an industrial flow by leveraging a conventional combinational ECO engine. Experimental results on industrial ECO benchmarks justify the promising practicality of our methods.
{"title":"Sequential engineering change order under retiming and resynthesis","authors":"Nian-Ze Lee, Victor N. Kravets, J. H. Jiang","doi":"10.1109/ICCAD.2017.8203767","DOIUrl":"https://doi.org/10.1109/ICCAD.2017.8203767","url":null,"abstract":"Engineering change order (ECO) is pivotal in rectifying late design changes that occur commonly due to ever-increasing system complexity. Existing functional ECO methods focus on combinational equivalence assuming a known input correspondence between the old implementation and new specification. They are inadequate for rectifying circuits under sequential transformations. This inadequacy hinders the utilization of powerful and effective sequential optimization methods using retiming and resynthesis. As retiming and/or resynthesis gains increasing adoption in industry, incorporating sequential ECO techniques into the hardware design flow becomes essential. In this paper, we provide the first attempt to extend ECO to designs under retiming and resynthesis in an industrial flow by leveraging conventional combinational ECO engine. Experimental results over industrial ECO benchmarks justify the promising practicality of our methods.","PeriodicalId":126686,"journal":{"name":"2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125559403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
State retention for power gated design with non-uniform multi-bit retention latches
Pub Date: 2017-11-13 | DOI: 10.1109/ICCAD.2017.8203833
Guo-Gin Fan, Mark Po-Hung Lin
Retention registers/latches are commonly applied to power-gated circuits for state retention during sleep mode. Recent studies have shown that applying uniform multi-bit retention registers (MBRRs) can reduce storage size and hence save more chip area and leakage power compared with single-bit retention registers. In this paper, a new formulation of power-gated circuit optimization with non-uniform MBRRs is studied to achieve even greater storage savings and higher storage utilization. An ILP-based approach is proposed to effectively explore different combinations of non-uniform MBRR replacement. Experimental results show that the proposed approach can reduce storage size by 36% compared with state-of-the-art uniform MBRR replacement, while achieving 100% storage utilization.
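To give a flavor of ILP-based exploration, here is a deliberately tiny model: choose how many retention latches of each width to instantiate so that all state bits that must survive power gating are covered, while minimizing total area. It assumes the PuLP package as the ILP front end; the widths, per-latch areas, and the single coverage constraint are toy stand-ins, far simpler than the paper's formulation.

```python
# Tiny ILP sketch: pick counts of non-uniform multi-bit retention latches
# (widths 1, 2, 4, 8) so that all retention bits are covered while the total
# area (a stand-in for leakage and storage cost) is minimized.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, PULP_CBC_CMD

bits_to_retain = 37                       # state bits that must survive sleep
widths = [1, 2, 4, 8]
area = {1: 1.6, 2: 2.6, 4: 4.8, 8: 9.0}   # assumed area per latch; control overhead amortizes with width

prob = LpProblem("mbrr_selection", LpMinimize)
count = {w: LpVariable(f"latches_w{w}", lowBound=0, cat="Integer") for w in widths}

# Objective: total area of the instantiated retention latches.
prob += lpSum(area[w] * count[w] for w in widths)
# Constraint: enough retention capacity for every bit that needs saving.
prob += lpSum(w * count[w] for w in widths) >= bits_to_retain

prob.solve(PULP_CBC_CMD(msg=0))
chosen = {w: int(count[w].value()) for w in widths}
total_bits = sum(w * n for w, n in chosen.items())
print(f"chosen latches: {chosen}, storage bits: {total_bits}, "
      f"utilization: {bits_to_retain / total_bits:.0%}")
```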
{"title":"State retention for power gated design with non-uniform multi-bit retention latches","authors":"Guo-Gin Fan, Mark Po-Hung Lin","doi":"10.1109/ICCAD.2017.8203833","DOIUrl":"https://doi.org/10.1109/ICCAD.2017.8203833","url":null,"abstract":"Retention registers/latches are commonly applied to power-gated circuits for state retention during the sleep mode. Recent studies have shown that applying uniform multi-bit retention registers (MBRRs) can reduce the storage size, and hence save more chip area and leakage power compared with single-bit retention registers. In this paper, a new problem formulation of power-gated circuit optimization with nonuniform MBRRs is studied for achieving even more storage saving and higher storage utilization. An ILP-based approach is proposed to effectively explore different combinations of nonuniform MBRR replacement. Experiment results show that the proposed approach can reduce 36% storage size, compared with the state-of-the-art uniform MBRR replacement, while achieving 100% storage utilization.","PeriodicalId":126686,"journal":{"name":"2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121379501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Early SoC security validation by VP-based static information flow analysis
Pub Date: 2017-11-13 | DOI: 10.1109/ICCAD.2017.8203805
Muhammad Hassan, V. Herdt, H. M. Le, Daniel Große, R. Drechsler
Security is one of the most pressing issues in embedded system design today. The majority of strategies to secure embedded systems are implemented in software. However, a potential hardware backdoor that allows unprivileged software access to confidential data will render even perfectly secure software useless. As the underlying SoC cannot be patched after deployment, it is critical to detect and correct SoC hardware security issues in the design phase. To prevent costly fixes in later stages, security validation should start as early as possible. In this paper, we propose a novel approach to SoC security validation at the system level using Virtual Prototypes (VPs). At the heart of the approach is a scalable static information flow analysis that can detect potential security breaches such as data leakage and untrusted access, i.e., confidentiality and integrity issues, respectively. We demonstrate the applicability of the approach on real-world VPs.
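A hedged sketch of what the core of such a static information flow analysis might look like: a data-flow graph abstracted from the virtual prototype, a set of confidential sources, and a fixed-point taint propagation that flags any flow into an unprivileged-readable sink. The graph, node names, and single propagation rule are toy assumptions; the paper's analysis operates on real SystemC/TLM virtual prototypes.

```python
# Minimal static information flow check: propagate a "confidential" taint
# through a data-flow graph of the design and report any flow that reaches a
# sink readable by unprivileged software (a confidentiality violation).
def find_leaks(edges, sources, unprivileged_sinks):
    tainted = set(sources)
    changed = True
    while changed:                              # fixed-point propagation
        changed = False
        for src, dst in edges:
            if src in tainted and dst not in tainted:
                tainted.add(dst)
                changed = True
    return sorted(tainted & set(unprivileged_sinks))

# Toy data-flow graph of a virtual prototype (node names are illustrative).
edges = [
    ("aes_key_reg", "aes_core"),
    ("aes_core", "cipher_fifo"),
    ("cipher_fifo", "dma_out"),
    ("aes_core", "debug_status_reg"),           # suspicious side path
    ("timer", "debug_status_reg"),
]
leaks = find_leaks(edges,
                   sources={"aes_key_reg"},
                   unprivileged_sinks={"debug_status_reg", "uart_tx"})
print("potential confidentiality violations:", leaks)
```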
{"title":"Early SoC security validation by VP-based static information flow analysis","authors":"Muhammad Hassan, V. Herdt, H. M. Le, Daniel Große, R. Drechsler","doi":"10.1109/ICCAD.2017.8203805","DOIUrl":"https://doi.org/10.1109/ICCAD.2017.8203805","url":null,"abstract":"Security is one of the most burning issues in embedded system design nowadays. The majority of strategies to secure embedded systems are being implemented in software. However, a potential hardware backdoor that allows unprivileged software access to confidential data will render even the perfectly secure software useless. As the underlying SoC cannot be patched after deployment, it is very critical to detect and correct SoC hardware security issues in the design phase. To prevent costly fixes in later stages, security validation should start as early as possible. In this paper, we propose a novel approach to SoC security validation at the system level using Virtual Prototypes (VP). At the heart of the approach is a scalable static information flow analysis that can detect potential security breaches such as data leakage and untrusted access; confidentiality and integrity issues, respectively. We demonstrate the applicability of the approach on real-world VPs.","PeriodicalId":126686,"journal":{"name":"2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115443954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 23
AEP: An error-bearing neural network accelerator for energy efficiency and model protection
Pub Date: 2017-11-13 | DOI: 10.1109/ICCAD.2017.8203854
Lei Zhao, Youtao Zhang, Jun Yang
Neural Networks (NNs) have recently gained popularity in a wide range of modern application domains due to their superior inference accuracy. With growing problem size and complexity, modern NNs, e.g., CNNs (Convolutional NNs) and DNNs (Deep NNs), contain a large number of weights, which require tremendous effort not only to prepare representative training datasets but also to train the network. There is an increasing demand to protect the NN weight matrices, an emerging form of Intellectual Property (IP) in the NN field. Unfortunately, adopting conventional encryption methods incurs significant performance and energy consumption overheads. In this paper, we propose AEP, a DianNao-based NN accelerator design for IP protection. AEP aggressively reduces DRAM timing to generate a device-dependent error mask, i.e., a set of erroneous cells whose distribution is device dependent due to process variations. AEP incorporates the error mask in the NN training process so that the trained weights are device dependent, which effectively defeats IP piracy, as exporting the weights to other devices cannot produce satisfactory inference accuracy. In addition, AEP speeds up NN inference and achieves significant energy reduction because main memory dominates the energy consumption in the DianNao accelerator. Our evaluation results show that by injecting 0.1% to 5% memory errors, AEP incurs negligible inference accuracy loss on the target device while exhibiting unacceptable accuracy degradation on other devices. In addition, AEP achieves an average of 72% performance improvement and 44% energy reduction over the DianNao baseline.
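The model-protection idea can be mimicked in a toy experiment: train a small classifier while a fixed, device-specific error mask corrupts the stored weights, then read those weights under a different device's mask. The numpy logistic-regression setup, the stuck-at-zero fault model, and the 5% error rate are assumptions; AEP itself targets DianNao-style accelerators and DRAM timing errors, and the accuracy gap in this sketch is only indicative.

```python
# Toy version of training with a device-dependent error mask: weights are
# corrupted by the device's faulty cells on every store, so training adapts to
# *that* mask; moving the weights to a device with a different mask hurts.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 64))
true_w = rng.normal(size=64)
y = (X @ true_w > 0).astype(float)

def make_mask(seed, n=64, error_rate=0.05):
    r = np.random.default_rng(seed)
    return r.random(n) < error_rate          # True = faulty cell

def apply_mask(w, mask):
    w = w.copy()
    w[mask] = 0.0                            # assumed fault model: stuck-at-zero
    return w

def train(mask, epochs=200, lr=0.1):
    w = np.zeros(64)
    for _ in range(epochs):
        w_stored = apply_mask(w, mask)       # weights always live behind the mask
        p = 1.0 / (1.0 + np.exp(-(X @ w_stored)))
        w -= lr * (X.T @ (p - y)) / len(y)
    return apply_mask(w, mask)

def accuracy(w, mask):
    p = 1.0 / (1.0 + np.exp(-(X @ apply_mask(w, mask))))
    return ((p > 0.5) == y).mean()

device_a, device_b = make_mask(1), make_mask(2)
w = train(device_a)
print(f"accuracy on training device:  {accuracy(w, device_a):.3f}")
print(f"accuracy on a different device: {accuracy(w, device_b):.3f}")
```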
{"title":"AEP: An error-bearing neural network accelerator for energy efficiency and model protection","authors":"Lei Zhao, Youtao Zhang, Jun Yang","doi":"10.1109/ICCAD.2017.8203854","DOIUrl":"https://doi.org/10.1109/ICCAD.2017.8203854","url":null,"abstract":"Neural Networks (NNs) have recently gained popularity in a wide range of modern application domains due to its superior inference accuracy. With growing problem size and complexity, modern NNs, e.g., CNNs (Convolutional NNs) and DNNs (Deep NNs), contain a large number of weights, which require tremendous efforts not only to prepare representative training datasets but also to train the network. There is an increasing demand to protect the NN weight matrices, an emerging Intellectual Property (IP) in NN field. Unfortunately, adopting conventional encryption method faces significant performance and energy consumption overheads. In this paper, we propose AEP, a DianNao based NN accelerator design for IP protection. AEP aggressively reduces DRAM timing to generate a device dependent error mask, i.e., a set of erroneous cells while the distribution of these cells are device dependent due to process variations. AEP incorporates the error mask in the NN training process so that the trained weights are device dependent, which effectively defects IP piracy as exporting the weights to other devices cannot produce satisfactory inference accuracy. In addition, AEP speeds up NN inference and achieves significant energy reduction due to the fact that main memory dominates the energy consumption in DianNao accelerator. Our evaluation results show that by injecting 0.1% to 5% memory errors, AEP has negligible inference accuracy loss on the target device while exhibiting unacceptable accuracy degradation on other devices. In addition, AEP achieves an average of 72% performance improvement and 44% energy reduction over the DianNao baseline.","PeriodicalId":126686,"journal":{"name":"2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116920797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9