
2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI): Latest Papers

Revolutionizing Cyber Security: Exploring the Synergy of Machine Learning and Logical Reasoning for Cyber Threats and Mitigation
Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238483
Deepak Puthal, S. Mohanty, Amit Kumar Mishra, C. Yeun, Ernesto Damiani
The integration of machine learning (ML) and logical reasoning (LR) in cyber security is an emerging field that shows great potential for improving the efficiency and effectiveness of security systems. While ML can detect anomalies and patterns in large amounts of data, LR can provide a higher-level understanding of threats and enable better decision-making. This paper explores the future of ML and LR in cyber security and highlights how the integration of these two approaches can lead to more robust security systems. We discuss several use cases that demonstrate the effectiveness of the integrated approach, such as threat detection and response, vulnerability assessment, and security policy enforcement. Finally, we identify several research directions that will help advance the field, including the development of more explainable ML models and the integration of human-in-the-loop approaches.
Citations: 0
Efficient Hardware Design for the VVC Affine Motion Compensation Exploiting Multiple Constant Multiplication
Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238551
Marcello M. Muñoz, Denis Maass, Murilo R. Perleberg, Luciano Agostini, M. Porto
Affine Motion Estimation (AME) is a new, high-complexity task in the Versatile Video Coding (VVC) standard. AME requires Affine Motion Compensation (MC) to be performed for $4\times 4$ subblocks, where one among 15 6-tap interpolation filters is selected to interpolate each sample of the $4\times 4$ subblock according to the motion vector of that subblock. This work presents two dedicated hardware implementations of the Affine MC of the VVC standard, the first focusing on reducing power dissipation and the second on reducing area. ASIC synthesis results for these architectures with TSMC 40nm standard cells show an area of 54.43k gates and a power dissipation of 12.8mW for the power-efficient variant, and an area of 21.91k gates and a power dissipation of 14.41mW for the hardware-efficient variant.
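The title of this entry refers to multiple constant multiplication (MCM), in which one input is multiplied by several fixed constants using only shifts, additions, and subtractions, with intermediate results shared between the products. The following is a minimal sketch of that general idea, not the architecture from the paper: the constants 3, 5, 20, and 63 and the function name `mcm_block` are hypothetical and are not VVC filter coefficients.

```python
def mcm_block(x: int) -> tuple[int, int, int, int]:
    """Multiply x by the hypothetical constants 3, 5, 20 and 63 without a multiplier.

    Each product is built from shifts and adds/subtracts, and the x*5 term
    is reused to form x*20, which is the sharing that MCM exploits.
    """
    x3 = (x << 1) + x    # 3x  = 2x + x
    x5 = (x << 2) + x    # 5x  = 4x + x
    x20 = x5 << 2        # 20x = 4 * 5x, reuses the shared 5x term
    x63 = (x << 6) - x   # 63x = 64x - x
    return x3, x5, x20, x63


if __name__ == "__main__":
    print(mcm_block(7))
```

In hardware, each shift is free wiring, so the cost of such a block is dominated by the number of adders and subtractors, which is why sharing terms such as `x5` matters.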
Citations: 0
Resource Provisioning for CPU-FPGA Environments with Adaptive HLS-Versioning and DVFS
Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238639
M. Jordan, Guilherme Korol, Tiago Knorst, M. B. Rutzig, A. C. S. Beck
Cloud warehouses have been adopting CPU-FPGA environments to accelerate clients’ applications with scalability. On the CPU side, DVFS improves energy efficiency. On the FPGA side, High-Level Synthesis enables hardware optimizations that lead to designs with variant characteristics (e.g., latency and power). Although both techniques have been used, they have never been cooperatively exploited to improve execution efficiency. For that, we propose RAHD, a framework that bridges the gap between DVFS, HLS multiple design versions, and CPU-FPGA environments. RAHD offers automatic fine-tuning selection of design versions and DVFS to efficiently balance workload, achieving 32.86x energy improvements over a standard provisioning strategy.
Citations: 0
DREAM: Distributed Reinforcement Learning Enabled Adaptive Mixed-Critical NoC
Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238569
Nidhi Anantharajaiah, Yunhe Xu, Fabian Lesniak, T. Harbaum, Jürgen Becker
Applications of different criticality sharing the same System-on-Chip (SoC) platform are increasing in popularity as a way to reduce overall cost. Spatial and temporal isolation techniques are used to reduce inter-application influence and to ensure that real-time requirements are met. Spatial isolation involves partitioning communication resources, and such partitions can result in irregular topologies. It is desirable that the on-chip interconnect in such systems support communication within all possible partition shapes using efficient routing techniques. To improve flexibility, adaptivity, and reliability in such systems, it is desirable to incorporate topology-agnostic routing algorithms that can compute optimal routes at runtime. For this purpose, we present a Distributed Reinforcement learning Enabled Adaptive Mixed-Critical Network-on-Chip (DREAM NoC) and a supporting framework. DREAM is a distributed NoC that uses a topology-agnostic, reinforcement-learning-enabled routing algorithm based on the Ant Colony Optimization (ACO) metaheuristic. We propose the DREAM framework, which comprises runtime discovery of paths and selection of optimal routes over time based on traffic fluctuations. We compare its performance against other topology-agnostic algorithms under uniform random traffic and under the application traffic of an MPEG4 video decoder. The results show that the presented technique achieves up to a 63% decrease in latency and a 25% increase in throughput for certain irregular topologies under the uniform random traffic scenario.
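To make the Ant Colony Optimization idea in the DREAM abstract more concrete, here is a minimal sketch of a generic pheromone-based next-hop table for a single router. It illustrates the metaheuristic in general and is not the DREAM routing algorithm: the class `AntRouter`, the evaporation rate, and the `1 / latency` reward are all assumptions chosen for illustration.

```python
import random


class AntRouter:
    """Toy per-router pheromone table (hypothetical, for illustration only)."""

    def __init__(self, ports, evaporation=0.1, seed=0):
        self.ports = list(ports)
        self.evaporation = evaporation
        self.pheromone = {}  # destination -> {port: pheromone level}
        self.rng = random.Random(seed)

    def _table(self, dest):
        # An unseen destination starts with equal pheromone on every port.
        return self.pheromone.setdefault(dest, {p: 1.0 for p in self.ports})

    def choose_port(self, dest):
        """Pick an output port with probability proportional to its pheromone."""
        table = self._table(dest)
        ports = list(table)
        weights = [table[p] for p in ports]
        return self.rng.choices(ports, weights=weights, k=1)[0]

    def reinforce(self, dest, port, latency):
        """Evaporate all entries, then reward `port`; latency must be > 0."""
        table = self._table(dest)
        for p in table:
            table[p] *= 1.0 - self.evaporation
        table[port] += 1.0 / latency  # lower observed latency, larger reward

    def best_port(self, dest):
        """Return the port that currently holds the most pheromone."""
        table = self._table(dest)
        return max(table, key=table.get)


if __name__ == "__main__":
    router = AntRouter(["north", "east"])
    for _ in range(20):
        router.reinforce("node5", "east", latency=2.0)
        router.reinforce("node5", "north", latency=10.0)
    print(router.best_port("node5"))
```

Because the table is keyed only by destination and local port, a scheme of this kind needs no knowledge of the global topology, and evaporation lets stale routes fade when traffic conditions change.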
Citations: 0
Performance Optimized Clock Tree Embedding for Auto-Generated FPGAs
Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238626
Grant Brown, Ganesh Gore, P. Gaillardon
Field Programmable Gate Arrays (FPGAs) have grown in popularity in a myriad of applications due to their reconfigurability and lower non-recurring engineering costs when compared to application-specific integrated circuits (ASICs). To keep pace with growing application needs and process technology improvements, commercial FPGAs have traditionally chosen full-custom chip design approaches. However, embedded FPGAs (eFPGAs) have made FPGA uses more application specific, thereby creating the need for an agile design approach that accelerates the eFPGA design process. Hence, recent agile FPGA design methods have introduced automation into the design process, allowing semi-automated fine-tuning of physical and architectural parameters, which reduces the physical design iteration time for FPGAs. These novel grid-based design methods render commercially available Clock Tree Synthesis (CTS) algorithms ineffective on modern FPGA fabrics. To overcome these deficiencies, we propose a novel clock tree embedding algorithm that uses a symmetrical clock tree to minimize skew, followed by an efficient pruning method that leverages traditional Static Timing Analysis (STA) to improve clock latency. Experimental results on $2\times 2$, $7\times 7$, $8\times 8$, $29\times 29$, and $32\times 32$ FPGAs show that our proposed CTS algorithm can achieve up to a 50% improvement in latency and over a $10\times$ reduction in skew when compared to an implementation using commercial CTS methodology.
Citations: 0
Evaluation of Digital Circuit Design by Combining Two- and Multi-Level Approximate Logic Synthesis
Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238642
Gabriel Ammes, P. Butzen, A. Reis, Renato P. Ribas
Approximate circuits are emerging as an alternative to save area, delay, and power consumption in error-resilient applications such as machine learning, computer vision, and signal processing. This work presents an evaluation of a logic synthesis approach that explores two- and multi-level topologies in approximate digital circuit design. In this strategy, two-level (2L) approximate logic synthesis (ALS) unlocks robust function optimization, whereas multi-level (ML) ALS acts on structure simplification. Experimental results for the combined use of 2L- and ML-ALS show improved average area and delay optimization compared to the state-of-the-art ML-ALS at a 5% error rate, with reductions of up to 37% in circuit area and up to 31% in delay under the same error constraint.
Citations: 0
Compact Model Parameter Extraction using Bayesian Machine Learning
Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238563
Sachin Bhat, S. Kulkarni, C. A. Moritz
Compact models are an integral part of large-scale integrated circuit simulations and of the validation of new technologies. With technology scaling, however, compact models have become complex and involve many parameters. Hence, parameter extraction for new device technology is rather challenging. In this paper, we propose a probabilistic approach to compact model parameter extraction. We devise a Bayesian optimization technique that is specifically tailored for efficient extraction of BSIM-CMG parameters for fitting nanowire junctionless transistors and 14nm FinFETs. The Bayesian-optimization-based extraction results show an excellent fit to drain current data, with 6.5% normalized root-mean-square error for nanowire junctionless transistors. For a 14nm FinFET, the technique achieves errors of 6.3% and 1.5% for drain current and capacitance data, respectively. This compares favourably with, and improves on, currently available tools, including industrial ones.
Citations: 0
A MCU-robust Interleaved Data/Detection SRAM for Space Environments
Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238542
L. H. Brendler, H. Lapuyade, Y. Deval, Ricardo Reis, F. Rivet
This work extends a new method to detect Multiple-Cell Upsets (MCU) in SRAM memories for space applications. The method involves spatially interleaving a memory plan with a network of memory radiation detectors. A 32kb interleaved data/detection SRAM was designed in the 28 nm FD-SOI Technology and tested using post-layout simulations. Results confirm the correct operation of the data and the detection cells of the memory, detecting single and multiple events inserted in different positions of the memory array. Considering the ratio between the number of data and detection cells used in this work (50%), the detection method can provide a probability of detecting MCUs in a memory plan that can reach close to 100%.
Citations: 0
Grep: Performance Enhancement in MultiCore Processors using an Adaptive Graph Prefetcher
Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238634
Indranee Kashyap, Dipika Deb, Nityananda Sarma
Memory latency and off-chip bandwidth have been struggling to keep up with computing performance in modern computer systems. In this regard, prefetching helps mask the long memory access latency at various cache levels by continuously monitoring an application's memory access pattern. Upon detecting a pattern, the prefetcher fetches a cache block ahead of its use. However, complex accesses such as direct or indirect pointer accesses, linked lists, and so on do not adhere to any specific pattern and hence make prefetching impossible. The paper proposes Grep, an adaptive graph-based data prefetcher that monitors L1D cache misses and prefetches blocks into the L2 cache. Unlike state-of-the-art prefetchers, Grep does not search for patterns in the miss stream. Rather, it generates a predecessor-successor relationship among the cache misses by constructing an occurrence graph that stores the frequency and sequence of subsequent cache block accesses. Therefore, both regular and irregular patterns in the miss stream can be predicted. Upon an address match in the occurrence graph, Grep prefetches a block with a confidence value. Experimentally, it improves prefetch coverage and accuracy by 35.5% and 18.8%, respectively, compared to SPP.
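The occurrence graph described in the Grep abstract can be pictured with a small software model. The sketch below is an assumption-laden illustration rather than the hardware design from the paper: the class `OccurrenceGraph`, the confidence definition (edge count divided by all outgoing counts), and the 0.5 threshold are hypothetical choices.

```python
from collections import defaultdict


class OccurrenceGraph:
    """Toy successor graph over cache-block addresses (illustrative only)."""

    def __init__(self, min_confidence=0.5):
        # edges[a][b] counts how often a miss to block b directly followed a miss to block a.
        self.edges = defaultdict(lambda: defaultdict(int))
        self.min_confidence = min_confidence
        self.last_miss = None

    def record_miss(self, block):
        """Add or strengthen the predecessor-successor edge for the latest miss."""
        if self.last_miss is not None:
            self.edges[self.last_miss][block] += 1
        self.last_miss = block

    def predict(self, block):
        """Return (successor, confidence) pairs that meet the confidence threshold."""
        successors = self.edges.get(block)
        if not successors:
            return []
        total = sum(successors.values())
        ranked = sorted(successors.items(), key=lambda kv: kv[1], reverse=True)
        return [(succ, count / total) for succ, count in ranked
                if count / total >= self.min_confidence]


if __name__ == "__main__":
    graph = OccurrenceGraph()
    # An irregular but repeating miss sequence, such as a pointer chase.
    for block in [0x10, 0x80, 0x24, 0x10, 0x80, 0x24, 0x10, 0x80]:
        graph.record_miss(block)
    print(graph.predict(0x10))
```

Since the graph stores observed successors rather than address strides, a repeating sequence of unrelated addresses becomes predictable once it has been seen, which matches the abstract's claim that irregular patterns can also be covered.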
Citations: 0
Application Profiling Using Register-Instruction Hardware Performance Counters
Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238603
Anand Menon, Amisha Srivastava, Shamik Kundu, K. Basu
Kleptographic attacks are a type of security threat that involves weakening a cryptographic implementation in order to extract sensitive information from a computer system. These attacks can be particularly harmful when they target cryptographic keys or other security-critical information. Because software-based defenses are not robust against these threats, prior studies have explored trusted hardware-based solutions involving tailor-made Hardware Performance Counters (HPCs). However, these tailor-made HPCs lack the fine-grained characterization necessary to correctly differentiate between individual applications. As a result, a large number of HPCs are required to monitor the application, which incurs high overhead on the system. To this end, we propose Register-Instruction Hardware Performance Counters (RIHPCs), a bespoke set of special-purpose registers designed to characterize applications, and thus detect kleptographic attacks, at fine granularity and with low performance overhead. To assess the performance of RIHPCs against kleptographic attacks, we profile NIST’s Post-Quantum Cryptographic Key Encapsulation Mechanism (PQC-KEM) algorithms. Our results show that RIHPC traces can distinguish between PQC algorithms with an accuracy of over 99%, while reducing performance overhead by up to 67% compared to tailor-made HPCs.
Citations: 0