
Proceedings of the Great Lakes Symposium on VLSI 2022: Latest Publications

Energy-Efficient In-SRAM Accumulation for CMOS-based CNN Accelerators
Pub Date : 2022-06-06 DOI: 10.1145/3526241.3530319
Wanqian Li, Yinhe Han, Xiaoming Chen
State-of-the-art convolutional neural network (CNN) accelerators are typically communication-dominated architectures. To reduce the energy consumption of data accesses while maintaining high performance, researchers have adopted large amounts of on-chip register resources and proposed various methods to concentrate communication on on-chip register accesses. As a result, on-chip register accesses become the energy bottleneck. To further reduce energy consumption, in this work we propose an in-SRAM accumulation architecture to replace the conventional register files and digital accumulators in the processing elements of CNN accelerators. Compared with existing in-SRAM computing approaches (which may not be targeted at CNN accelerators), the presented in-SRAM computing architecture not only realizes in-memory accumulation, but also solves the structure-contention problem that frequently occurs when embedding in-memory architectures into CNN accelerators. HSPICE simulation results based on 45nm technology demonstrate that with the proposed in-SRAM accumulator, the overall energy efficiency of a state-of-the-art communication-optimal CNN accelerator is increased by 29% on average.
Citations: 0
Survey of Machine Learning for Electronic Design Automation
Pub Date : 2022-06-06 DOI: 10.1145/3526241.3530834
Kevin Immanuel Gubbi, Sayed Aresh Beheshti-Shirazi, T. Sheaves, Soheil Salehi, Sai Manoj Pudukotai Dinakarrao, S. Rafatirad, Avesta Sasan, H. Homayoun
An increase in demand for semiconductor ICs, recent advancements in machine learning, and the slowing down of Moore's law have all contributed to the increased interest in using Machine Learning (ML) to enhance Electronic Design Automation (EDA) and Computer-Aided Design (CAD) tools and processes. This paper provides a comprehensive survey of available EDA and CAD tools, methods, processes, and techniques for Integrated Circuits (ICs) that use machine learning algorithms. The ML-based EDA/CAD tools are classified based on the IC design steps. They are utilized in Synthesis, Physical Design (Floorplanning, Placement, Clock Tree Synthesis, Routing), IR drop analysis, Static Timing Analysis (STA), Design for Test (DFT), Power Delivery Network analysis, and Sign-off. The current landscape of ML-based VLSI-CAD tools, current trends, and future perspectives of ML in VLSI-CAD are also discussed.
Citations: 10
CAD-FSL: Code-Aware Data Generation based Few-Shot Learning for Efficient Malware Detection
Pub Date : 2022-06-06 DOI: 10.1145/3526241.3530825
Sreenitha Kasarapu, Sanket Shukla, Rakibul Hassan, Avesta Sasan, H. Homayoun, Sai Manoj Pudukotai Dinakarrao
One of the pivotal security threats to embedded computing systems is malicious software, a.k.a. malware. Owing to its efficiency and efficacy, Machine Learning (ML) has been widely adopted for malware detection in recent times. Despite being efficient, existing techniques require frequently updating the ML model with newer benign and malware samples to train and model an efficient malware detector. Furthermore, such constraints limit the detection of emerging malware samples due to the lack of sufficient malware samples required for efficient training. To address these concerns, we introduce CAD-FSL, a code-aware, data-generation-based few-shot learning technique. CAD-FSL generates multiple mutated samples of the limited seen malware for efficient malware detection. Loss minimization ensures that the generated samples closely mimic the limited seen malware, restores malware functionality, and mitigates impractical samples. The developed synthetic malware is incorporated into the training set to formulate a model that can efficiently detect emerging malware despite having limited (few-shot) exposure. The experimental results demonstrate that with the proposed "Code-Aware Data Generation" technique, we detect malware with 90% accuracy, approximately 9% higher than classifiers trained with only the limited available training data.
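As a rough illustration of the data-augmentation idea (a hypothetical mutation operator with made-up names; not the authors' code-aware generator), the sketch below produces mutated variants of a byte sequence to enlarge a few-shot training set:

```python
import random

def mutate_samples(sample: bytes, n_variants: int,
                   flip_rate: float = 0.05, seed: int = 0) -> list[bytes]:
    """Generate n_variants mutated copies of a byte sequence by flipping a
    small fraction of bytes; a naive stand-in for code-aware mutation."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        data = bytearray(sample)
        for i in range(len(data)):
            if rng.random() < flip_rate:
                data[i] ^= rng.randrange(1, 256)  # XOR with nonzero: byte changes
        variants.append(bytes(data))
    return variants

base = bytes(range(64))  # stand-in for a scarce malware binary
augmented = mutate_samples(base, n_variants=10)
print(len(augmented))  # 10 synthetic variants to add to the training set
```

A real code-aware generator would constrain mutations so the samples stay executable and preserve malicious behavior, which this byte-level sketch does not attempt.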
Citations: 5
RVVRadar: A Framework for Supporting the Programmer in Vectorization for RISC-V
Pub Date : 2022-06-06 DOI: 10.1145/3526241.3530388
Lucas Klemmer, Manfred Schlägl, Daniel Große
In this paper, we present RVVRadar, a framework to support the programmer through the four major steps of the vectorization process of an algorithm: development, verification, measurement, and evaluation. We demonstrate the advantages of RVVRadar for vectorization on several practically relevant algorithms. This includes, in particular, the widely used libpng library, where we vectorized all filter computations, resulting in speedups of up to 5.43x. We made RVVRadar as well as all benchmarks (including the RVV-based libpng) open source.
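The libpng filter computations mentioned above are loops such as reconstruction of the PNG "Sub" filter (as defined in the PNG specification). The scalar reference below, written in Python purely for illustration, shows the per-byte serial dependency that makes these loops interesting vectorization targets:

```python
def png_sub_unfilter(filtered: bytes, bpp: int) -> bytes:
    """Scalar reference for reconstructing one scanline filtered with the
    PNG 'Sub' filter: each byte adds the reconstructed byte bpp positions
    earlier, modulo 256. The loop-carried dependency on recon[i - bpp] is
    what a vectorized implementation has to handle."""
    recon = bytearray(filtered)
    for i in range(bpp, len(recon)):
        recon[i] = (recon[i] + recon[i - bpp]) & 0xFF
    return bytes(recon)

line = bytes([10, 20, 5, 5, 5, 5])
print(list(png_sub_unfilter(line, bpp=2)))  # [10, 20, 15, 25, 20, 30]
```

An RVV version would process chunks of the scanline with vector loads and adds while respecting this dependency distance of bpp bytes.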
Citations: 0
Session details: Session 2A: Hardware Security
Pub Date : 2022-06-06 DOI: 10.1145/3542684
K. Gaj
Citations: 0
Algorithms for the Selection of Applied Tests when a Stored Test Produces Many Applied Tests
Pub Date : 2022-06-06 DOI: 10.1145/3526241.3530359
Hari Addepalli, I. Pomeranz
Improving the quality of a test set without storing additional tests is important when a higher fault coverage is required but test data storage is limited. Such an improvement can be achieved by using every test in a base test set to apply several different tests. In this paper, we consider the case where a base test set for basic faults is used for detecting more complex faults. Depending on the operators used for producing different applied tests, the number of options available for applied tests can be large. In such cases it is necessary to select a subset of applied tests from all the available ones. We develop algorithms for solving this problem in two scenarios. In the first scenario additional coverage is required for a small subset of faults associated with specific gates. In the second scenario additional coverage is required for the entire circuit. Experimental results are presented for benchmark circuits and logic blocks of the OpenSPARC T1 microprocessor to demonstrate the effectiveness of these algorithms.
Citations: 0
MEGA-MAC: A Merged Accumulation based Approximate MAC Unit for Error Resilient Applications
Pub Date : 2022-06-06 DOI: 10.1145/3526241.3530384
Vishesh Mishra, Sparsh Mittal, Saurabh Singh, Divy Pandey, Rekha Singhal
This paper proposes a novel merged-accumulation-based approximate MAC (multiply-accumulate) unit, MEGA-MAC, for accelerating error-resilient applications. MEGA-MAC utilizes a novel rearrangement and compression strategy in the multiplication stage and a novel approximate "carry predicting adder" (CPA) in the accumulation stage. The addition and multiplication operations are merged, which reduces delay. MEGA-MAC provides knobs to trade off accuracy against resource overhead. Compared to the accurate MAC unit, MEGA-MAC(8,6) (i.e., a MEGA-MAC unit with a chunk size of 6 bits, operating on 8-bit input operands) reduces the power-delay product (PDP) by 49.4%, while incurring a mean error percentage of only 4.2%. Compared to state-of-the-art approximate MAC units, MEGA-MAC achieves a better balance between resource savings and accuracy loss. The source code is available at https://sites.google.com/view/mega-mac-approximate-mac-unit/.
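To illustrate the kind of accuracy/energy knob an approximate MAC unit exposes, here is a sketch of a classic lower-part-OR adder (LOA), a well-known generic approximate adder; it is not MEGA-MAC's carry predicting adder, and the MAC wrapper around it is a made-up illustration:

```python
def loa_approx_add(a: int, b: int, k: int) -> int:
    """Lower-part-OR adder (LOA): the low k bits are OR-ed instead of
    added (no carry chain), and only the upper bits use an exact adder,
    with the carry out of the low part dropped. k is the accuracy knob."""
    mask = (1 << k) - 1
    low = (a & mask) | (b & mask)        # approximate low part, no carries
    high = ((a >> k) + (b >> k)) << k    # exact high part, low carry-in dropped
    return high | low

def approx_mac(acc: int, x: int, w: int, k: int = 4) -> int:
    """Hypothetical approximate multiply-accumulate step: exact multiply,
    approximate accumulate."""
    return loa_approx_add(acc, x * w, k)

print(loa_approx_add(0xA3, 0x55, 4))  # 247, versus the exact sum 0xA3 + 0x55 = 248
```

Raising k saves more adder energy but grows the error, which is the same accuracy-versus-overhead trade-off the abstract describes for MEGA-MAC's chunk size.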
Citations: 1
RACE: A Reinforcement Learning Framework for Improved Adaptive Control of NoC Channel Buffers
Pub Date : 2022-05-26 DOI: 10.1145/3526241.3530335
Kamil Khan, S. Pasricha, R. Kim
Network-on-chip (NoC) architectures rely on buffers to store flits to cope with contention for router resources during packet switching. Recently, reversible multi-function channel (RMC) buffers have been proposed to simultaneously reduce power and enable adaptive NoC buffering between adjacent routers. While adaptive buffering can improve NoC performance by maximizing buffer utilization, controlling the RMC buffer allocations requires a congestion-aware, scalable, and proactive policy. In this work, we present RACE, a novel reinforcement learning (RL) framework that utilizes better awareness of network congestion and a new reward metric ("falsefulls") to help guide the RL agent towards better RMC buffer control decisions. We show that RACE reduces NoC latency by up to 48.9%, and energy consumption by up to 47.1% against state-of-the-art NoC buffer control policies.
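As a generic picture of how an RL agent can learn a buffer-control policy, the toy tabular Q-learning loop below uses coarse congestion states and a made-up reward model standing in for congestion-aware metrics such as the paper's "falsefulls"; it is a sketch of the RL machinery only, not the RACE agent:

```python
import random

def q_learning_buffer_control(episodes=200, alpha=0.5, gamma=0.9, seed=0):
    """Toy Q-learning: states are congestion levels (0=low, 1=high),
    actions are whether to lend an RMC buffer slot to the neighbor (0/1).
    The reward model is hypothetical: lending helps under congestion."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    for _ in range(episodes):
        s = rng.choice((0, 1))                       # observed congestion
        a = rng.choice((0, 1))                       # explore uniformly
        r = 1.0 if a == s else -1.0                  # made-up reward signal
        s2 = rng.choice((0, 1))                      # next congestion state
        best_next = max(Q[(s2, 0)], Q[(s2, 1)])
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q

Q = q_learning_buffer_control()
# under congestion (state 1), lending a slot (action 1) should score higher
print(Q[(1, 1)] > Q[(1, 0)])
```

A practical NoC controller would need a richer state (buffer occupancies, neighbor congestion) and a scalable function approximator rather than a table.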
Citations: 2
Embedded Systems Education in the 2020s: Challenges, Reflections, and Future Directions
Pub Date : 2022-05-17 DOI: 10.1145/3526241.3530348
S. Pasricha
Embedded computing systems are pervasive in our everyday lives, imparting digital intelligence to a variety of electronic platforms used in our vehicles, smart appliances, wearables, mobile devices, and computers. The need to train the next generation of embedded systems designers and engineers with relevant skills across hardware, software, and their co-design remains pressing today. This paper describes the evolution of embedded systems education over the past two decades and challenges facing the designers and instructors of embedded systems curricula in the 2020s. Reflections from over a decade of teaching the design of embedded computing systems are presented, with insights on strategies that show promise to address these challenges. Lastly, some important future directions in embedded systems education are highlighted.
Citations: 4
A Silicon Photonic Accelerator for Convolutional Neural Networks with Heterogeneous Quantization
Pub Date : 2022-05-17 DOI: 10.1145/3526241.3530364
Febin P. Sunny, M. Nikdast, S. Pasricha
Parameter quantization in convolutional neural networks (CNNs) can help generate efficient models with a lower memory footprint and computational complexity. However, homogeneous quantization can result in significant degradation of CNN model accuracy. In contrast, heterogeneous quantization represents a promising approach to realizing compact, quantized models with higher inference accuracies. In this paper, we propose HQNNA, a CNN accelerator based on non-coherent silicon photonics that can accelerate both homogeneously quantized and heterogeneously quantized CNN models. Our analyses show that HQNNA achieves up to 73.8x better energy-per-bit and 159.5x better throughput-energy efficiency than state-of-the-art photonic CNN accelerators.
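The homogeneous-versus-heterogeneous distinction can be sketched with simple symmetric uniform quantization applied per layer; the layer names and bit widths below are hypothetical, and this is the textbook quantizer, not HQNNA's scheme:

```python
import numpy as np

def quantize_uniform(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantization of a weight tensor to `bits` bits:
    round to the nearest of 2^(bits-1)-1 levels per sign, then de-quantize."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    return np.round(w / scale) * scale

# heterogeneous: sensitive layers keep more precision (hypothetical widths)
layer_bits = {"conv1": 8, "conv2": 4, "fc": 6}
rng = np.random.default_rng(0)
weights = {name: rng.standard_normal(16) for name in layer_bits}
quantized = {n: quantize_uniform(w, layer_bits[n]) for n, w in weights.items()}

for name, w in weights.items():
    err = np.mean(np.abs(w - quantized[name]))
    print(f"{name}: {layer_bits[name]}-bit, mean abs error {err:.4f}")
```

A homogeneous scheme would use one bit width everywhere; the per-layer dictionary is what makes the model heterogeneous, and an accelerator like HQNNA must then handle mixed precisions in hardware.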
Citations: 6