2020 57th ACM/IEEE Design Automation Conference (DAC): Latest Papers

Reducing DRAM Access Latency via Helper Rows
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218719
Xin Xin, Youtao Zhang, Jun Yang
DRAM technology has advanced successfully in memory density and throughput, but less so in access latency. This is mainly due to the intrinsic limitations of the capacitance-based bit storage and access mechanism. Reducing access latency has been well explored in the literature. However, recently proposed DRAM techniques, such as RowClone and Half-DRAM, offer new opportunities to further optimise access latency. In this paper, we propose an efficient access strategy that improves DRAM performance by optionally discarding the restore. When activating a new row, our technique makes a copy of the row using the RowClone method. The next time the same row is accessed, the cloned row is opened for sensing but is not restored, since the data is preserved in the original row. To improve the efficiency of the proposed strategy, we further exploit three schemes that minimize the copy overhead and increase the reuse of the cloned row. Experimental results show that the proposed strategy achieves an 11% performance improvement on average.
Citations: 4
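The access strategy described in the Helper Rows abstract above can be illustrated with a toy latency model. This is a minimal sketch under stated assumptions, not the authors' design: the timing constants, the `HelperRowBank` class, and the rule that a clone becomes invalid after one restore-free read are all hypothetical and chosen only to make the idea concrete.

```python
# Toy model of restore-free activation using a cloned ("helper") row.
# All timings are made-up illustrative values in nanoseconds, not from the paper.
T_SENSE = 15.0    # assumed time to sense a row into the row buffer
T_RESTORE = 20.0  # assumed time to restore charge to the opened row
T_COPY = 10.0     # assumed extra cost of making a RowClone-style copy


class HelperRowBank:
    """Tracks which rows currently have a valid clone (hypothetical bookkeeping)."""

    def __init__(self):
        self.cloned_rows = set()

    def activate(self, row):
        """Return the modeled activation latency for `row`."""
        if row in self.cloned_rows:
            # Open the clone for sensing and skip the restore: the original
            # row still holds the data. Assumption: the unrestored clone is
            # no longer valid afterwards, so it is dropped here.
            self.cloned_rows.discard(row)
            return T_SENSE
        # First access: normal sense + restore, plus the cost of the copy.
        self.cloned_rows.add(row)
        return T_SENSE + T_RESTORE + T_COPY


bank = HelperRowBank()
latencies = [bank.activate(7), bank.activate(7), bank.activate(7)]
print(latencies)  # [45.0, 15.0, 45.0]
```

In this model the copy cost sits on the critical path of the first access, which is the kind of overhead the abstract says its three schemes aim to minimize.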
A Pragmatic Approach to On-device Incremental Learning System with Selective Weight Updates
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218507
Jaekang Shin, Seungkyu Choi, Yeongjae Choi, L. Kim
Incremental learning (IL) is drawing attention as a way to widen the capabilities of on-device AI. Previous works have sought to reduce the numerous computations and memory accesses required by the IL training process, but they did not show a noticeable improvement in the weight gradient computation (WGC) phase. Therefore, we propose a selective weight update technique that searches for the critical weights to be updated by applying an IL algorithm that trains per-task binary masks. We also introduce a novel dataflow for implementing selective WGC on typical NPUs with minimum overhead. On average, our system shows a 2.9× speedup and 2.5× better energy efficiency in WGC without degrading training quality.
Citations: 5
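The selective weight update described in the incremental learning abstract above can be sketched in a few lines. This is an illustrative sketch only: the function names, the plain SGD step, and the example mask are assumptions, and the abstract does not specify how the per-task binary masks are trained.

```python
def selective_update(weights, grads, mask, lr=0.5):
    """SGD step applied only where mask is 1; masked-out weights stay unchanged."""
    if not (len(weights) == len(grads) == len(mask)):
        raise ValueError("weights, grads and mask must have the same length")
    return [w - lr * g if m else w for w, g, m in zip(weights, grads, mask)]


def gradients_needed(mask):
    """Number of weight gradients that actually have to be computed."""
    return sum(1 for m in mask if m)


weights = [1.0, 2.0, 3.0, 4.0]
grads = [0.5, 0.5, 0.5, 0.5]
mask = [1, 0, 1, 0]  # hypothetical per-task binary mask: 1 marks a critical weight

updated = selective_update(weights, grads, mask)
print(updated)                 # [0.75, 2.0, 2.75, 4.0]
print(gradients_needed(mask))  # 2
```

The sketch takes all gradients as given for simplicity; the saving reported in the abstract would come from not computing the masked-out gradients in the first place.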
Layer RBER Variation Aware Read Performance Optimization for 3D Flash Memories
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218631
Shiqiang Nie, Youtao Zhang, Weiguo Wu, Jun Yang
3D NAND flash enables the construction of large-capacity Solid-State Drives (SSDs) for modern computer systems. While effectively reducing per-bit cost, 3D NAND flash exhibits non-negligible process variation and thus differences in RBER (raw bit error rate) across layers, which leads to sub-optimal read performance for applications with either small or large I/O requests. In this paper, we propose LRR, Layer RBER variation aware Read optimization schemes, to address this challenge. LRR consists of two schemes: LRR subpage read scheduling (SRS) and LRR fullpage allocation (FPA). SRS groups small read requests from layers with similar RBERs to reduce the average read latency of subpage-sized read requests. FPA distributes the data of a large write across multiple layers, which improves read latency when reading from layers with large RBERs. Our experimental results show that LRR reduces read latency by 46% on average compared with the state of the art.
Citations: 6
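The grouping idea behind SRS in the LRR abstract above can be illustrated as follows. This is a hedged sketch rather than the paper's scheduler: sorting by RBER and cutting the sorted list into fixed-size groups is just one simple way to put requests with similar RBERs together, and the request tuples, `group_size` parameter, and RBER values are hypothetical.

```python
def group_by_rber(requests, group_size):
    """Group (request_id, layer_rber) pairs so each group holds similar RBERs."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    # Sorting places requests from layers with similar RBER next to each other.
    ordered = sorted(requests, key=lambda req: req[1])
    return [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]


def worst_rber(group):
    """Highest RBER in a group (assumed here to bound the group's read effort)."""
    return max(rber for _, rber in group)


# Hypothetical subpage read requests tagged with the RBER of their layer.
requests = [("a", 3.0e-3), ("b", 1.0e-4), ("c", 2.9e-3), ("d", 1.2e-4)]
groups = group_by_rber(requests, group_size=2)
group_ids = [[req_id for req_id, _ in group] for group in groups]
print(group_ids)  # [['b', 'd'], ['c', 'a']]
```

Under the assumption that a group's read effort is set by its worst member, keeping low-RBER requests together stops them from being slowed down by a high-RBER neighbour.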
A Formal Approach for Detecting Vulnerabilities to Transient Execution Attacks in Out-of-Order Processors
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218572
M. R. Fadiheh, Johannes Müller, R. Brinkmann, S. Mitra, D. Stoffel, W. Kunz
Transient execution attacks, such as Spectre and Meltdown, create a new and serious attack surface in modern processors. In spite of all the countermeasures taken in recent years, the cycles of alarm and patch are ongoing and call for a better formal understanding of the threat and possible preventions. This paper introduces a formal definition of security with respect to transient execution attacks, formulated as a HW property. We present a formal method for security verification by HW property checking, based on extending Unique Program Execution Checking (UPEC) to out-of-order processors. UPEC can be used to systematically detect all vulnerabilities to transient execution attacks, including vulnerabilities unknown so far. The feasibility of our approach is demonstrated on the BOOM processor, a design with more than 650,000 state bits. In BOOM, our approach detects a new, previously unknown vulnerability, called Spectre-STC, indicating that single-threaded processors can also be vulnerable to contention-based Spectre attacks.
Citations: 22
Robust Design of Large Area Flexible Electronics via Compressed Sensing
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218570
Leilai Shao, Ting Lei, Tsung-Ching Huang, Zhenan Bao, Kwang-Ting Cheng
Large area flexible electronics (FE) is emerging for low-cost, light-weight wearable electronics, artificial skins and IoT nodes, benefiting from its low-cost fabrication and mechanical flexibility. However, the low temperature requirement for fabrication on a flexible substrate and the large-area nature of flexible sensor arrays inevitably result in inadequate device yield, reliability and stability. Therefore, it is essential to develop design methodologies for large area sensing applications that can ensure system robustness without relying on highly reliable devices. Based on the observation that most signals sensed by body sensor arrays exhibit sparse statistical characteristics, we propose a system design method that leverages this sparse nature via compressed sensing (CS). Specifically, we use flexible circuitry to implement a CS encoder and decode the compressed signal on the silicon side. As a system demonstration, we fabricated a temperature sensor array, shift register and amplifier to illustrate the feasibility of the encoder design using carbon-nanotube-based flexible thin-film transistors. To evaluate the improvement in system robustness achieved by the proposed sensing scheme, we conducted two case studies: temperature imaging and tactile-sensor-based object recognition. With ∼10% sparse errors (due to either device defects or transient errors), we reduced the root-mean-square error (RMSE) from 0.20 to 0.05 for temperature sensing and boosted the classification accuracy from 65% to 84% for tactile-sensing-based object recognition.
Citations: 3
Learning From A Big Brother - Mimicking Neural Networks in Profiled Side-channel Analysis
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218520
Daan van der Valk, Marina Krček, S. Picek, S. Bhasin
Recently, deep learning has emerged as a powerful technique for side-channel attacks, capable of even breaking common countermeasures. Still, trained models are generally large, and thus, performing evaluation becomes resource-intensive. The resource requirements increase in realistic settings where traces can be noisy, and countermeasures are active. In this work, we exploit mimicking to compress the learned models. We demonstrate up to 300 times compression of a state-of-the-art CNN. The mimic shallow network can also achieve much better accuracy as compared to when trained on original data and even reach the performance of a deeper network.
Citations: 5
Machine Learning to Set Meta-Heuristic Specific Parameters for High-Level Synthesis Design Space Exploration
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218674
Z. Wang, B. C. Schafer
Raising the level of VLSI design abstraction to C offers many advantages over the use of low-level Hardware Description Languages (HDLs). One key advantage is that it allows the generation of micro-architectures with different trade-offs by simply setting unique combinations of synthesis options. Because the number of these synthesis options is typically very large, exhaustive enumeration is not possible. Hence, heuristics are required. Meta-heuristics like Simulated Annealing (SA), Genetic Algorithm (GA) and Ant Colony Optimization (ACO) have been shown to lead to good results for these types of multi-objective optimization problems. The main problem with these meta-heuristics is that they are very sensitive to their hyper-parameter settings, e.g. in the GA case, the mutation and crossover rates and the number of parent pairs. To address this, we present a machine learning based approach that automatically sets the search parameters for these three meta-heuristics so that a new, unseen behavioral description given in C can be explored effectively. Moreover, we present an exploration technique that combines SA, GA and ACO, and show that our proposed exploration method outperforms a single meta-heuristic.
Citations: 12
Routing Topology and Time-Division Multiplexing Co-Optimization for Multi-FPGA Systems
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218667
Tung-Wei Lin, Wei-Chen Tai, Yu-Cheng Lin, I. Jiang
Time-division multiplexing (TDM) is widely used to overcome bandwidth limitations and thus enhance routability in multi-FPGA systems, where I/O pins on each FPGA are in short supply. However, multiplexed signals induce significant delays. To evaluate timing degradation, nets with similar criticalities are often grouped to form NetGroups. In this paper, we propose a framework for routing topology and time-division multiplexing co-optimization in multi-FPGA systems. The proposed framework first generates high-quality topologies considering NetGroup criticalities. Then, inspired by column generation, TDM ratio assignment is solved optimally by Lagrangian relaxation. Experimental results show that our approach outperforms the top three entries of the ICCAD 2019 CAD Contest. Moreover, our TDM ratio assignment algorithm can further improve the results of the top three winners to almost as good as ours.
Citations: 2
Topological Structure and Physical Layout Codesign for Wavelength-Routed Optical Networks-on-Chip
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218625
Yu-Sheng Lu, Sheng-Jung Yu, Yao-Wen Chang
The wavelength-routed optical network-on-chip (WRONoC) is a promising solution for signal transmission in modern system-on-chip (SoC) designs. Previous works do not handle three main issues for WRONoCs: correlations between the topological structure and physical layout, trade-offs between the maximum insertion loss and wavelength power, and a fully automated flow to generate predictable designs. As a result, the insertion loss estimation is inaccurate, and thus only suboptimal results are obtained. To remedy these disadvantages, we present a fully automated topological structure and physical layout codesign flow to minimize the maximum insertion loss and the wavelength power simultaneously with a significant speedup. Experimental results show that our codesign flow significantly outperforms state-of-the-art works in the maximum insertion loss, wavelength power, and runtimes.
Citations: 2
Learning to Predict IR Drop with Effective Training for ReRAM-based Neural Network Hardware
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218735
Sugil Lee, Giju Jung, M. Fouda, Jongeun Lee, A. Eltawil, F. Kurdahi
Because the IR drop problem is inevitable in passive ReRAM crossbar arrays, a software solution that can predict the effect of IR drop without expensive SPICE simulations is very desirable. In this paper, two simple neural networks are proposed as a software solution to predict the effect of IR drop. These networks can be easily integrated into any deep neural network framework to account for the IR drop problem during training. As an example, the proposed solution is integrated into the BinaryNet framework, and the test validation results, obtained through SPICE simulations, show a very large improvement in performance, close to the baseline performance, which demonstrates the efficacy of the proposed method. In addition, the proposed solution outperforms prior work on challenging datasets such as CIFAR10 and SVHN.
Citations: 21