
2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD): Latest Papers

Adaptive error recovery in MEDA biochips based on droplet-aliquot operations and predictive analysis
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203834
Zhanwei Zhong, Zipeng Li, K. Chakrabarty
Digital microfluidic biochips (DMFBs) are being increasingly used in biochemistry labs for automating bioassays. However, traditional DMFBs suffer from some key shortcomings: 1) inability to vary droplet volume in a flexible manner; 2) difficulty of integrating on-chip sensors; 3) the need for special fabrication processes. To overcome these problems, DMFBs based on micro-electrode-dot-array (MEDA) have recently been proposed. However, errors are likely to occur on a MEDA DMFB due to chip defects and the unpredictability inherent to biochemical experiments. We present fine-grained error-recovery solutions for MEDA by exploiting real-time sensing and advanced MEDA-specific droplet operations. The proposed methods rely on adaptive droplet-aliquot operations and predictive analysis of mixing. Experimental results on three representative benchmarks demonstrate the efficiency of the proposed error-recovery strategy.
Citations: 23
Memristor-based perceptron classifier: Increasing complexity and coping with imperfect hardware
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203825
F. Merrikh-Bayat, M. Prezioso, B. Chakrabarti, I. Kataeva, D. Strukov
We experimentally demonstrate classification of 4×4 binary images into 4 classes, using a 3-layer mixed-signal neuromorphic network (“MLP perceptron”), based on two passive 20×20 memristive crossbar arrays, board-integrated with discrete CMOS components. The network features 10 hidden-layer and 4 output-layer analog CMOS neurons and 428 metal-oxide memristors, i.e., it is almost an order of magnitude more complex than any previously reported functional passive (0T1R) memristor classifier. Moreover, the inference operation of this classifier is performed entirely in the integrated hardware. To deal with larger crossbar arrays, we have developed a semiautomatic approach to their forming and testing, and compared several memristor training schemes for coping with imperfect behavior of these devices, as well as with variability of analog CMOS neurons. The effectiveness of the proposed schemes for defect and variation tolerance was verified experimentally using the implemented network and, additionally, by modeling the operation of a larger network, with 300 hidden-layer neurons, on the MNIST benchmark. Finally, we propose a simple modification of the implemented memristor-based vector-by-matrix multiplier to allow its operation in a wider temperature range.
Citations: 30
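The vector-by-matrix multiplication mentioned in the memristor classifier abstract above can be illustrated with a minimal sketch. In a passive crossbar, each column current is the sum of input voltages weighted by device conductances (Ohm's and Kirchhoff's laws). The abstract does not say how signed weights are encoded; the differential pair of conductances per weight used here, and the names `crossbar_currents`, `g_plus`, and `g_minus`, are assumptions made for illustration only.

```python
from typing import List


def crossbar_currents(voltages: List[float],
                      g_plus: List[List[float]],
                      g_minus: List[List[float]]) -> List[float]:
    """Column output currents of an idealized differential crossbar.

    I_j = sum_i V_i * (G+_ij - G-_ij)

    Rows correspond to inputs, columns to outputs. Wire resistance,
    device nonlinearity and variability are ignored (assumption).
    """
    columns = len(g_plus[0])
    return [
        sum(v * (gp[j] - gm[j])
            for v, gp, gm in zip(voltages, g_plus, g_minus))
        for j in range(columns)
    ]
```

Calling `crossbar_currents` with two input voltages and two 2×2 conductance matrices returns one current per output column; a neuron circuit would then apply its activation function to each current.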
Mixed-cell-height detailed placement considering complex minimum-implant-area constraints
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203761
Yen-Yi Wu, Yao-Wen Chang
Mixed-cell-height circuits have prevailed in advanced technology to address various design needs. Along with device scaling, complex minimum-implant-area (MIA) constraints arise as an emerging challenge in modern circuit designs, adding to the difficulties in mixed-cell-height placement. Existing MIA-aware detailed placement with single-row-height standard cells is insufficient for mixed-cell-height designs: (1) filler insertion, typically used to resolve MIA violations, might incur unaffordable area and wirelength overheads, and (2) mixed-height cell perturbation could cause severe inter-row MIA violations. This paper presents the first work to address the mixed-cell-height detailed placement problem considering both intra- and inter-row MIA constraints. We first fix intra-row violations by clustering violating mixed-height cells of the same threshold voltage, and then perturb each cluster to obtain a desired cell permutation by applying an efficient, optimal dynamic-programming-based algorithm for a special case and Algorithm DLX for general ones, where a provably constant performance ratio for a mixed-cell-height reshaping problem can be achieved. With a network-flow-based formulation, remaining violating cells are placed in appropriate filler-insertion positions to fix cell violations and minimize area. After performing mixed-cell-height detailed placement, we finally fix inter-row violations by shifting violating cells with minimum displacement. Compared with a filler insertion method and a greedy clustering approach, experimental results show that our proposed algorithm can resolve all MIA violations with the smallest HPWL and area overheads in reasonable running time.
Citations: 22
NEMESIS: A software approach for computing in presence of soft errors
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203792
Moslem Didehban, Aviral Shrivastava, Sai Ram Dheeraj Lokam
Soft errors are considered the main reliability challenge for sub-nanoscale microprocessors. Software-level soft error resilience schemes are desirable because they require no hardware modifications and their protection can be tuned based on the application requirements. However, existing software-level error-tolerant schemes do not provide a high level of protection. In this work, we present NEMESIS, a compiler-level fine-grain soft error detection, diagnosis, and recovery technique that can provide a high degree of error resiliency. NEMESIS runs three versions of computations and detects soft errors by checking the results of all memory write and branch operations. In the case of a mismatch, the NEMESIS recovery routine reverts the effect of the error from the architectural state of the program, and the program resumes its normal execution. Our extensive μ-architectural-level fault injection experiments show that the NEMESIS transformation is able to detect all soft errors and recover from 97% of detected errors.
Citations: 23
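The idea in the NEMESIS abstract above, running a computation three times and checking the results before a memory write, can be sketched in a few lines. This is not the NEMESIS compiler transformation: the abstract does not describe its recovery routine, so simple majority voting stands in for it here. The function `checked_store` and its `fault_in_copy` test hook are hypothetical names introduced for illustration.

```python
from collections import Counter
from typing import Callable, Dict, Optional


def checked_store(memory: Dict[str, int], address: str,
                  compute: Callable[[], int],
                  fault_in_copy: Optional[int] = None) -> bool:
    """Run `compute` three times and commit the result only after a check.

    `fault_in_copy` (hypothetical test hook) flips one bit in the result
    of copy 0, 1 or 2 to imitate a soft error. Returns True when a
    mismatch was detected and repaired by majority voting (assumed
    recovery policy, not taken from the paper).
    """
    results = [compute() for _ in range(3)]
    if fault_in_copy is not None:
        results[fault_in_copy] ^= 1 << 3  # inject a single bit flip

    value, votes = Counter(results).most_common(1)[0]
    if votes < 2:
        # All three copies disagree: detected, but not recoverable by voting.
        raise RuntimeError("no majority among redundant results")

    memory[address] = value  # the store happens only after the check
    return votes != 3
```

A fault-free call commits the value and returns False; a call with one corrupted copy still commits the correct value and returns True, which mirrors the detect-then-recover flow the abstract describes at a very coarse level.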
A spike-based long short-term memory on a neurosynaptic processor
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203836
Amar Shrestha, Khadeer Ahmed, Yanzhi Wang, D. Widemann, A. Moody, B. V. Essen, Qinru Qiu
Low-power brain-inspired hardware systems have gained significant traction in recent years. They offer high energy efficiency and massive parallelism due to the distributed and asynchronous nature of neural computation through low-energy spikes. One such platform is the IBM TrueNorth Neurosynaptic System. Recently, TrueNorth-compatible representation learning algorithms have emerged, achieving close to state-of-the-art performance on various datasets. An exception is its application in temporal sequence processing models such as recurrent neural networks (RNNs), which is still at the proof-of-concept level. This is partly due to the hardware constraints in connectivity and synaptic weight resolution, and the inherent difficulty in capturing temporal dynamics of an RNN using spiking neurons. This work presents a design flow that overcomes the aforementioned difficulties and maps a special case of recurrent networks called Long Short-Term Memory (LSTM) onto a spike-based platform. The framework is built on top of various approximation techniques, weight and activation discretization, spiking neuron sub-circuits that implement the complex gating mechanisms, and a store-and-release technique to enable neuron synchronization and faithful storage. While many of the techniques can be applied to map LSTM to any SNN simulator/emulator, here we demonstrate this approach on the TrueNorth chip adhering to its constraints. Two benchmark LSTM applications, parity check and Extended Reber Grammar, are evaluated and their accuracy, energy and speed tradeoffs are analyzed.
Citations: 15
Exploring cache bypassing and partitioning for multi-tasking on GPUs
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203754
Yun Liang, Xiuhong Li, Xiaolong Xie
Graphics Processing Unit (GPU) computing has become ubiquitous in embedded systems, as evidenced by its wide adoption for various general-purpose applications. As more and more applications are accelerated by GPUs, multi-tasking scenarios are starting to emerge. Multi-tasking allows multiple applications to simultaneously execute on the same GPU and share its resources. This brings new challenges due to the contention among the different applications for shared resources such as caches. However, the caches on GPUs are difficult to use. If used inappropriately, they may hurt performance instead of improving it. In this paper, we propose to use cache partitioning together with cache bypassing as the shared cache management mechanism for multi-tasking on GPUs. The combined approach aims to reduce the interference among the tasks and preserve the locality for each task. However, the interplay between cache partitioning and bypassing brings greater challenges. On one hand, the cache space partitioned to each task affects its cache bypassing decision. On the other hand, cache bypassing affects the cache capacity required for each task. To address this, we propose a two-step approach. First, we use cache partitioning to assign dedicated cache space to each task to reduce the interference among the tasks. During this process, we compare cache partitioning with coarse-grained cache bypassing. Then, we use fine-grained cache bypassing to selectively bypass certain data requests and threads for each task. We explore different cache partitioning and bypassing designs and demonstrate the potential benefits of this approach. Experiments using a wide range of applications demonstrate that our technique improves the overall system throughput by 52% on average compared to the default multi-tasking solution on GPUs.
Citations: 8
Scalable N-worst algorithms for dynamic timing and activity analysis
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203830
Hari Cherupalli, J. Sartori
As the overheads for ensuring the correctness of electronic designs continue to increase with continued technology scaling and increased variability, better-than-worst-case (BTWC) design has gained significant attention. Many BTWC design techniques utilize dynamic timing and activity information for design analysis and optimization. These techniques rely on path-based analysis that enumerates the exercised paths in a design as targets for analysis and optimization. However, path-based dynamic analysis techniques are not scalable and cannot be used to analyze full processors and full applications. On the other hand, graph-based techniques like those that form the foundational building blocks of electronic design automation tools are scalable and can efficiently analyze large designs. In this paper, we extend graph-based analysis to provide the fundamental dynamic analysis tools necessary for BTWC design, analysis, and optimization. Specifically, we present scalable graph-based techniques to report the N-worst exercised paths in a design for three metrics: timing criticality (slack), activity (toggle count), and activity subject to delay constraints. Compared to existing path-based techniques, our scalable dynamic analysis techniques improve average performance by 977×, 163×, and 113×, respectively, and enable scalable analysis for a full processor design running full applications.
Citations: 4
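To make the contrast between path enumeration and graph-based analysis in the N-worst abstract above concrete, here is a minimal sketch of a graph-based top-N computation on a timing DAG. It ranks paths by accumulated delay in one topological pass and never enumerates all paths. It is a generic textbook-style technique, not the paper's algorithm (the abstract does not give one), and it ignores which paths are actually exercised at run time. The function name `n_worst_path_delays` and the edge-list graph encoding are assumptions.

```python
import heapq
from typing import Dict, List, Tuple

Edge = Tuple[str, float]  # (successor node, edge delay)


def n_worst_path_delays(graph: Dict[str, List[Edge]], order: List[str],
                        sources: List[str], sink: str, n: int) -> List[float]:
    """Return the n largest source-to-sink path delays, largest first.

    `order` must be a topological order of all nodes. Each node keeps
    only its n largest arrival times, so the work grows with the number
    of edges times n rather than with the number of paths.
    """
    best: Dict[str, List[float]] = {v: [] for v in order}
    for s in sources:
        best[s] = [0.0]

    for u in order:
        # best[u] is final here: all predecessors were processed earlier.
        for v, delay in graph.get(u, []):
            candidates = best[v] + [t + delay for t in best[u]]
            best[v] = heapq.nlargest(n, candidates)

    return best[sink]
```

On a small graph with edges a→b (2), a→c (1), b→c (1), b→d (3), and c→d (5), asking for the two worst paths from `a` to `d` returns the delays of a→b→c→d and a→c→d. Slack would follow by subtracting each delay from the clock period.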
Functional safety methodologies for automotive applications
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203886
A. Nardi, A. Armato
Safety-critical automotive applications have stringent demands for functional safety and reliability. Traditionally, functional safety requirements have been managed by car manufacturers and system providers. However, with the increasing complexity of the electronics involved, the responsibility of addressing functional safety is now propagating through the supply chain to semiconductor companies and design tool providers. This paper introduces some basic concepts of functional safety analysis and optimization and shows the bridge to the traditional design flow. Considerations are presented on how design methodologies are capturing and addressing the new safety metrics.
Citations: 44
GRASP based metaheuristics for layout pattern classification
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203820
M. Woo, Seungwon Kim, Seokhyeong Kang
Layout pattern classification has recently been adopted in IC design. It clusters hotspot patterns for design-space analysis or yield optimization. In pattern classification, an optimal clustering is essential, as are runtime and accuracy. Within the research-oriented infrastructure used in the ICCAD 2016 contest, we have developed a fast metaheuristic for pattern classification that uses the Greedy Randomized Adaptive Search Procedure (GRASP). Our proposed metaheuristic outperforms the best reported results on all of the ICCAD 2016 benchmarks. In addition, we achieve up to a 50% reduction in cluster count and significantly improve runtime compared to a commercial EDA tool provided in the ICCAD 2016 contest [1].
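The abstract does not say how GRASP is applied to layout patterns, so the following is only a generic GRASP loop on a toy covering problem, not the authors' method. It assumes patterns are reduced to coordinate tuples, that a cluster is valid when every member lies within a Manhattan `radius` of its center, and that the goal is to minimize the number of clusters. All function names and parameters (`covers`, `construct`, `local_search`, `grasp`, `alpha`) are hypothetical.

```python
import random


def covers(center, point, radius):
    """True if `point` lies within Manhattan distance `radius` of `center`."""
    return sum(abs(a - b) for a, b in zip(center, point)) <= radius


def construct(points, radius, alpha, rng):
    """Greedy randomized construction: add centers until every point is covered."""
    uncovered = set(range(len(points)))
    centers = []
    while uncovered:
        # Gain of a candidate = number of still-uncovered points it would cover.
        gains = {
            c: sum(1 for p in uncovered if covers(points[c], points[p], radius))
            for c in range(len(points))
        }
        best = max(gains.values())
        # Restricted candidate list: useful candidates within a factor alpha of the best.
        rcl = [c for c, g in gains.items() if g > 0 and g >= alpha * best]
        chosen = rng.choice(rcl)
        centers.append(chosen)
        uncovered -= {p for p in uncovered if covers(points[chosen], points[p], radius)}
    return centers


def local_search(points, centers, radius):
    """Drop any center whose removal still leaves every point covered."""
    improved = True
    while improved:
        improved = False
        for c in list(centers):
            rest = [x for x in centers if x != c]
            if all(any(covers(points[x], p, radius) for x in rest) for p in points):
                centers = rest
                improved = True
                break
    return centers


def grasp(points, radius, iterations=50, alpha=0.7, seed=0):
    """Repeat construction plus local search; keep the solution with fewest centers."""
    rng = random.Random(seed)
    best = None
    for _ in range(iterations):
        centers = local_search(points, construct(points, radius, alpha, rng), radius)
        if best is None or len(centers) < len(best):
            best = centers
    return best
```

Calling `grasp(points, radius=2)` returns the indices of the chosen cluster centers. Setting `alpha=1.0` makes the construction purely greedy, while smaller values widen the candidate list and add diversity across iterations.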
Citations: 9
Thermosiphon: A thermal aware NUCA architecture for write energy reduction of the STT-MRAM based LLCs
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203815
Bi Wu, Yuanqing Cheng, Pengcheng Dai, Jianlei Yang, Youguang Zhang, Dijun Liu, Y. Wang, Weisheng Zhao
As the speed gap between modern processors and off-chip main memory widens, on-chip cache capacity increases to sustain performance scaling. As a result, cache power occupies a large portion of the total power budget. STT-MRAM (Spin Transfer Torque Magnetic Memory) has been proposed as a promising solution for low-power cache design due to its high integration density and ultra-low leakage. Nevertheless, the high write power and latency of STT-MRAM have become new barriers to the commercialization of this emerging technology. In this paper, we investigate the thermal effect on the access performance of STT-MRAM and observe that temperature can significantly affect write delay and energy. Then, we explore the NUCA (Non-Uniform Cache Access) design of CMPs (Chip-Multi-Processors) with an STT-MRAM based LLC (Last Level Cache). A thermal aware data migration policy called “Thermosiphon”, which takes advantage of the thermal property of STT-MRAM, is proposed to reduce the LLC write energy. This policy splits the LLC into different regions based on the thermal distribution and adaptively migrates write-intensive data considering the temperature gradient among the thermal regions. Compared to the conventional NUCA design, our proposed design can save 22.5% write energy with negligible hardware overhead.
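The abstract gives no details of the migration policy, so the sketch below is only a toy illustration of the general idea of moving write-intensive blocks between thermal regions, not the Thermosiphon design itself. Everything here is an assumption: the `Region` type, the `plan_migrations` function, the fixed `write_threshold` and `migration_cost`, and above all the `write_energy` callback, which stands in for a device model the caller would have to supply.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """A hypothetical thermal region of the cache with a fixed block capacity."""
    name: str
    temperature_c: float
    capacity: int
    blocks: set = field(default_factory=set)


def plan_migrations(regions, write_counts, write_energy, write_threshold, migration_cost):
    """Return a list of (block, source_region, target_region) moves.

    `write_energy` maps a temperature to a per-write energy estimate. It is a
    caller-supplied placeholder, not a device model taken from the paper.
    """
    moves = []
    free = {r.name: r.capacity - len(r.blocks) for r in regions}
    # Regions ordered from cheapest to most expensive per write.
    ranked = sorted(regions, key=lambda r: write_energy(r.temperature_c))
    for src in regions:
        for block in sorted(src.blocks):
            writes = write_counts.get(block, 0)
            if writes < write_threshold:
                continue  # only write-intensive blocks are candidates
            for dst in ranked:
                if dst is src:
                    break  # no cheaper region has room; the block stays put
                saving = writes * (
                    write_energy(src.temperature_c) - write_energy(dst.temperature_c)
                )
                # Move only if the estimated saving outweighs the migration cost.
                if free[dst.name] > 0 and saving > migration_cost:
                    moves.append((block, src.name, dst.name))
                    free[dst.name] -= 1
                    break
    return moves
```

As a usage example, with a hot region at 80 °C, a cool region at 40 °C, and an illustrative model `lambda t: 1.0 - 0.005 * t` in which per-write energy falls as temperature rises, a block in the cool region with 100 recorded writes would be planned for migration to the hot region, while rarely written blocks stay where they are.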
Citations: 2