Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238612
Shams Tarek, Hasan Al Shaikh, Sree Ranjani Rajendran, Farimah Farahmandi
Due to the increasing complexity of modern systems-on-chip (SoCs) and the diversity of the attack surface, popular SoC verification approaches used in industry and academia for detecting security-critical vulnerabilities face several challenges. Although novel SoC security verification techniques are being proposed to overcome these challenges, qualitative and quantitative comparisons among them are becoming increasingly difficult due to the lack of suitable, well-validated SoC-level hardware vulnerability benchmarks that can be used to evaluate these techniques and tools on a level playing field. In this paper, we offer a comprehensive database of SoC vulnerabilities, with a particular emphasis on emerging hardware threats that attackers may exploit from the software layer to violate the security requirements of the system. To this end, 32 register-transfer-level (RTL) hardware vulnerability benchmarks based on three distinct RISC-V-based ISA implementations have been established and made open-source to stimulate standardized research efforts in the community. In addition, we provide a comprehensive taxonomy of the benchmarks, complete with security implications and classifications. We also discuss exploitation strategies that attackers may employ, a set of security properties associated with each vulnerability for detecting it formally, and the difficulties that typical security verification methods encounter when attempting to detect these vulnerabilities.
Title : Benchmarking of SoC-Level Hardware Vulnerabilities: A Complete Walkthrough
Venue : 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238586
Vishnu Bathalapalli, S. Mohanty, E. Kougianos, Vasanth Iyer, Bibhudutta Rout
The growing scope and worldwide market of smart electronics have made cybersecurity an important challenge. The Security-by-Design (SbD) principle, an emerging cybersecurity area, focuses on building security/privacy-enabled primitives at the design stage of an electronic system. This paper proposes a novel Physical Unclonable Function (PUF) based Trusted Platform Module (TPM) as an SbD primitive. The proposed SbD primitive works by securely verifying the PUF key using the TPM’s encryption and decryption engine. The securely verified PUF key is then bound to the TPM using Platform Configuration Registers (PCRs). PCRs in the TPM facilitate a secure boot process and effective access control to the TPM’s non-volatile memory through an enhanced authorization policy. By binding the PUF to PCRs in the TPM, a novel PUF-based access control policy can be defined, bringing in a new security ecosystem for the emerging Internet-of-Everything era. The proposed SbD approach has been experimentally validated by successfully integrating various PUF topologies with a hardware TPM.
Title : iTPM: Exploring PUF-based Keyless TPM for Security-by-Design of Smart Electronics
Venue : 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
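The abstract does not spell out how the PUF key is bound to a PCR, so the following is only a minimal sketch of the general idea, assuming the standard TPM-style extend operation (new PCR value = hash of the old value concatenated with a measurement digest). The key bytes, the function names, and the comparison-based policy check are hypothetical and are not taken from the paper.

```python
import hashlib
import hmac

PCR_SIZE = 32  # bytes in a SHA-256 PCR bank


def pcr_extend(pcr: bytes, measurement: bytes) -> bytes:
    """TPM-style extend: new_pcr = SHA-256(old_pcr || SHA-256(measurement))."""
    digest = hashlib.sha256(measurement).digest()
    return hashlib.sha256(pcr + digest).digest()


def policy_allows(current_pcr: bytes, expected_pcr: bytes) -> bool:
    """Hypothetical access policy: grant access only if the PCR matches."""
    return hmac.compare_digest(current_pcr, expected_pcr)


# Hypothetical PUF responses (placeholders, not real device data).
ENROLLED_PUF_KEY = b"puf-response-from-enrolled-device"
CLONED_PUF_KEY = b"puf-response-from-another-device"

RESET_PCR = bytes(PCR_SIZE)  # PCRs start at all zeros after reset
EXPECTED_PCR = pcr_extend(RESET_PCR, ENROLLED_PUF_KEY)

genuine_boot = pcr_extend(RESET_PCR, ENROLLED_PUF_KEY)
cloned_boot = pcr_extend(RESET_PCR, CLONED_PUF_KEY)

print(policy_allows(genuine_boot, EXPECTED_PCR))
print(policy_allows(cloned_boot, EXPECTED_PCR))
```

Because an extend can only fold new data into the existing value, a PCR cannot be set directly to a chosen value, which is what makes it usable as an anchor for an authorization policy.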
Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238640
R. E. Formigoni, Ricardo S. Ferreira, O. P. V. Neto, J. Nacif
CMOS (Complementary Metal Oxide Semiconductor) technology is the industry standard for chip fabrication. Currently, CMOS faces ever-increasing thermal, power, and miniaturization challenges. As a result, researchers are investing effort in novel alternative technologies to handle these issues, such as nanomagnetic logic (NML), which uses nanomagnets to perform binary logic. This paper presents a novel L-shaped clocking scheme to synchronize NML circuits. Our proposal is scalable, simple to use, and reduces the number of constraints on placement and routing algorithms for circuit generation. In addition, the L-shaped clocking scheme introduces tiles with a multi-phase design, which allows for a reduced area overhead at the cost of latency, solves feedback path issues, and introduces a model to work with modern NML features. Our results demonstrate a small latency trade-off for a considerable area reduction. Finally, we validate our work with layouts in Topolinano.
Title : L-BANCS: A Multi-Phase Tile Design for Nanomagnetic Logic
Venue : 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238486
E. A. Ramos, Ricardo Reis
The technology scaling of transistors makes them more susceptible to faults, such as those due to radiation effects and process variability. Faults related to process variability can cause circuits to operate outside their specification ranges. In most cases, simulations are used to analyze such effects, but simulations have high computational costs. This work aims to use mathematical chaos theory, through the Lyapunov exponents and the entropy of a circuit, to analytically estimate the effects of manufacturing process variability. The resulting method estimates the variability of power, delay, and power-delay product (PDP) with an accuracy equivalent to simulation-based methods, but on average three hundred times faster.
Title : Using Lyapunov Exponents and Entropy to Estimate Sensitivity to Process Variability
Venue : 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
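For readers unfamiliar with Lyapunov exponents, the sketch below shows the textbook calculation on the logistic map, where the exponent is the long-run average of log|f'(x)| along an orbit. It is only an illustration of the quantity itself, which measures how strongly small perturbations grow or decay; it is not the circuit-level estimator described in the abstract, and the map, parameter values, and function name are assumptions chosen for the example.

```python
import math


def logistic_lyapunov(r: float, x0: float = 0.3,
                      burn_in: int = 1000, steps: int = 100_000) -> float:
    """Estimate the Lyapunov exponent of x -> r*x*(1-x) by orbit averaging."""
    x = x0
    # Discard the transient so the average reflects long-run behavior.
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(steps):
        # |f'(x)| = |r * (1 - 2x)|; the floor avoids log(0) at x = 0.5.
        total += math.log(max(abs(r * (1.0 - 2.0 * x)), 1e-300))
        x = r * x * (1.0 - x)
    return total / steps


# r = 2.5 settles to a stable fixed point, so perturbations shrink.
stable = logistic_lyapunov(2.5)
# r = 4.0 is chaotic, so perturbations grow on average.
chaotic = logistic_lyapunov(4.0)
print(stable, chaotic)
```

A negative exponent means nearby trajectories converge (low sensitivity to a perturbation), while a positive one means they diverge. For this map the known values are -ln 2 at r = 2.5 and ln 2 at r = 4.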
Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238585
Vishu Saxena, Yash Jain, Sparsh Mittal
This research investigates how heavy-ion irradiation affects the single event transient (SET) response of a 14nm silicon-on-insulator (SOI) FinFET. Researchers generally use a TCAD tool (e.g., Sentaurus TCAD) to develop a SET pulse current model. However, TCAD simulations are time-consuming, which prohibits efficient design-space exploration. We propose efficient models for predicting SET pulse current with high accuracy. We use (1) polynomial chaos (PC) based models, (2) ML regression techniques, and (3) artificial neural network and 1D convolutional neural network (1D-CNN) based models. A heavy-ion strike leads to transient behavior that is very different from the normal behavior. Hence, for all the above predictors, we also evaluate the corresponding piecewise predictors. While TCAD tools take 4 hours for each simulation on a high-end computer, our proposed models take far less time (e.g., a few seconds). This allows designers to explore a larger design space. Our proposed piecewise 1D-CNN model achieves a state-of-the-art MSE of $2.15 \times 10^{-6}$ mA$^2$. Overall, our study provides insights into how PC and ML-based regression models can be used to enhance the efficiency of SET analysis in circuit design.
Title : Machine Learning and Polynomial Chaos models for Accurate Prediction of SET Pulse Current
Venue : 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
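The motivation for piecewise predictors can be illustrated with a small sketch. It assumes a synthetic double-exponential pulse (a shape often used to approximate SET currents) in arbitrary units, and it substitutes plain polynomial least squares for the PC, ML, and CNN models of the paper. The time constants, the split at the pulse peak, and the polynomial degree are all assumptions made for the example, not values from the paper.

```python
import numpy as np
from numpy.polynomial import Polynomial


def set_pulse(t, i0=1.0, tau_rise=0.05, tau_fall=0.3):
    """Synthetic double-exponential pulse (arbitrary units, not TCAD data)."""
    return i0 * (np.exp(-t / tau_fall) - np.exp(-t / tau_rise))


def fit_global(t, y, deg):
    """One polynomial fitted over the whole time window."""
    return Polynomial.fit(t, y, deg)(t)


def fit_piecewise(t, y, deg):
    """Separate polynomials for the rising edge and the decaying tail."""
    split = int(np.argmax(y)) + 1  # split just after the pulse peak
    pred = np.empty_like(y)
    pred[:split] = Polynomial.fit(t[:split], y[:split], deg)(t[:split])
    pred[split:] = Polynomial.fit(t[split:], y[split:], deg)(t[split:])
    return pred


def mse(y, pred):
    return float(np.mean((y - pred) ** 2))


t = np.linspace(0.0, 2.0, 400)
y = set_pulse(t)
mse_global = mse(y, fit_global(t, y, deg=5))
mse_piecewise = mse(y, fit_piecewise(t, y, deg=5))
print(mse_global, mse_piecewise)
```

The sharp rise and the slow decay have very different time scales, so a single smooth model has to compromise between them. Fitting each regime separately removes that compromise, which is the intuition behind evaluating a piecewise variant of every predictor.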
Verification of FPGA-based designs and comprehension of legacy designs can be aided by the process of reverse engineering the flattened Look-up Table (LUT) level netlists to high-level RTL representations. We propose a tool flow to extract Finite State Controllers by identifying control registers and progressively improving the accuracy of register classification. A control unit consists of one or more Finite State Machines (FSMs) which manage the execution of datapath units. The proposed tool flow has two phases. Phase 1 extracts the potential state/control registers. Phase 2 identifies the exact list of state/control registers and groups FSMs. The main goal of the proposed work is to improve the accuracy of control register identification. Three types of controllers used for experimental evaluation are standalone FSM designs with no datapath units, datapaths with a single FSM, and datapaths with multiple FSMs. Accuracy is observed to be 73% to 100% in controllers with multiple FSMs, 100% in controllers with a single FSM and standalone FSM controller designs. The average accuracy of control register detection over all the real-world designs considered is 98%.
Title : Reverse Engineering of RTL Controllers from Look-Up Table Netlists
Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238540
Sundarakumar Muthukumaran, Aparajithan Nathamuni Venkatesan, Kishore Pula, Ram Venkat Narayanan, Ranga Vemuri, John Emmert
Venue : 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238552
Joydeb Dutta, Deepak Puthal
In the present era, data plays a crucial role across various disciplines, serving as the foundation for exploration and advancements. However, in the domain of eHealth, a readily available dataset for training AI models to predict cardiac arrest using the internet of medical things (IoMT) is lacking. To bridge this gap, this research article addresses the need for a synthesized dataset that can be utilized by researchers in the eHealth field to evaluate the effectiveness of their AI/ML models. The article presents a synthesized IoMT dataset specifically designed for cardiac arrest prediction, incorporating valid ranges of IoMT-based medical features sourced from peer-reviewed journals and articles. This study offers the capability to generate synthetic datasets of varying sizes, catering to the specific requirements of researchers focused on cardiac arrest prediction for individual subjects (patients). The availability of such a dataset will contribute to the advancement of AI-driven research in the eHealth domain.
Title : IoMT Synthetic Cardiac Arrest Dataset for eHealth with AI-based Validation
Venue : 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238622
Hongtao Zhong, Yu Zhu, Longfei Luo, Taixin Li, Chen Wang, Yixin Xu, Tian Wang, Yao Yu, N. Vijaykrishnan, Yongpan Liu, Liang Shi, Huazhong Yang, Xueqing Li
Graph convolutional network (GCN) has emerged as a powerful model for many graph-related tasks. In conventional von Neumann architectures, massive data movement and irregular memory access in GCN computation severely degrade the performance and computation efficiency. For GCN acceleration, processing-in-memory (PIM) is promising by reducing the data movement. However, with the emergence of large GCN computation tasks, existing 2D PIM GCN accelerators face the challenge of storing all the necessary data on chip due to the limited PIM memory capacity, resulting in unwanted external memory access and degradation of performance and energy efficiency. This paper presents Fe-GCN, a 3D PIM GCN accelerator with high memory density based on the ferroelectric field-effect transistor (FeFET) memory. Besides, to mitigate the impact of the increased latency of the 3D memory structure, several software-hardware co-optimizations are proposed. Furthermore, an edge merging technique is also proposed to increase the memory utilization for the 3D GCN mapping and computing. Experimental results show that Fe-GCN achieves on average 2,647x, 58x, 18x, and 35x speedup and 26,708x, 1,246x, 25x, and 57x energy efficiency improvement over CPU, GPU, the state-of-the-art accelerators based on RRAM PIM and ASIC, respectively.
Title : Fe-GCN: A 3D FeFET Memory Based PIM Accelerator for Graph Convolutional Networks
Venue : 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
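To make the data-movement problem concrete, here is a minimal NumPy sketch of one GCN layer in its commonly used form, ReLU(Â X W), where Â is the symmetrically normalized adjacency matrix with self-loops. This is the generic formulation, assumed here for illustration; the abstract does not state which GCN variant or mapping Fe-GCN uses, and the tiny graph and weight values are made up.

```python
import numpy as np


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^-1/2 (A + I) D^-1/2 for a dense adjacency matrix."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj_norm: np.ndarray, features: np.ndarray,
              weights: np.ndarray) -> np.ndarray:
    """One GCN layer: combination (X @ W), aggregation (A_norm @ ...), ReLU."""
    combined = features @ weights      # dense, regular matrix multiply
    aggregated = adj_norm @ combined   # neighbor gather, irregular on real graphs
    return np.maximum(aggregated, 0.0)


# Toy 3-node path graph 0 - 1 - 2 with 4 input and 2 output features.
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
rng = np.random.default_rng(0)
features = rng.normal(size=(3, 4))
weights = rng.normal(size=(4, 2))

adj_norm = normalize_adjacency(adj)
output = gcn_layer(adj_norm, features, weights)
print(output.shape)
```

In a real graph the adjacency matrix is large and sparse, so the aggregation step becomes a scattered gather over neighbor rows. That step is the source of the irregular memory access the abstract refers to, and it is the part that processing-in-memory aims to keep on chip.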
Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238604
Marcel Walter, B. Hien, R. Wille
Field-coupled Nanocomputing (FCN) is a promising beyond-CMOS technology that leverages physical field repulsion instead of electrical current flow to transmit information and perform computations, potentially leading to energy dissipation below the Landauer Limit and clock frequencies in the terahertz regime. Despite recent progress in the experimental realization of FCN using Silicon Dangling Bonds (SiDBs), the physical design of FCN circuits remains a challenging task because its design constraints differ from those of CMOS technologies. In this paper, we present three core contributions to the FCN physical design problem, building on top of the fastest heuristic algorithm in the FCN literature, ortho. Via special routing structures called Signal Distribution Networks (SDNs), we 1) reduce area overhead, wire costs, and the number of wire-crossings in routing solutions by approximately 25%, 10%, and 17%, respectively; 2) allow the use of Majority gates to quantify their routing costs, which turn out to be immense; and 3) enable the automatic placement and routing of sequential logic for the first time in the literature. Our approach can potentially pave the way for the practical implementation of the FCN technology and its advancement as a viable green alternative to conventional computing technologies.
Title : Versatile Signal Distribution Networks for Scalable Placement and Routing of Field-coupled Nanocomputing Technologies
Venue : 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
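For scale, the Landauer Limit mentioned in the abstract is the minimum energy k_B · T · ln 2 dissipated when one bit of information is irreversibly erased. The short calculation below evaluates it at an assumed room temperature of 300 K; the temperature and the function name are choices made for this example, not values from the paper.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23      # exact SI value of k_B
ELECTRON_VOLT_J = 1.602176634e-19     # exact SI value of 1 eV in joules


def landauer_limit_joules(temperature_kelvin: float) -> float:
    """Minimum energy to erase one bit: k_B * T * ln(2)."""
    return BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)


limit_j = landauer_limit_joules(300.0)   # assumed room temperature
limit_ev = limit_j / ELECTRON_VOLT_J
print(f"{limit_j:.3e} J per bit, {limit_ev:.4f} eV per bit")
```

At 300 K this comes to roughly 2.87 × 10^-21 J (about 0.018 eV) per erased bit.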
Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238679
Alessandro Nadalini, Georg Rutishauser, A. Burrello, Nazareno Bruschi, Angelo Garofalo, L. Benini, Francesco Conti, D. Rossi
The emerging trend of deploying complex algorithms, such as Deep Neural Networks (DNNs), increasingly poses strict memory and energy efficiency requirements on Internet-of-Things (IoT) end-nodes. Mixed-precision quantization has been proposed as a technique to minimize a DNN’s memory footprint and maximize its execution efficiency, with negligible end-to-end precision degradation. In this work, we present a novel hardware and software stack for energy-efficient inference of mixed-precision Quantized Neural Networks (QNNs). We introduce Flex-V, a processor based on the RISC-V Instruction Set Architecture (ISA) that features fused Mac&Load mixed-precision dot product instructions; to avoid the exponential growth of the encoding space due to mixed-precision variants, we encode formats into the Control-Status Registers (CSRs). The Flex-V core is integrated into a tightly-coupled cluster of eight processors; in addition, we provide a full framework for the end-to-end deployment of DNNs including a compiler, optimized libraries, and a memory-aware deployment flow. Our results show up to 91.5 MAC/cycle and 3.26 TOPS/W on the cluster, implemented in a commercial 22nm FDX technology, with up to $8.5\times$ speed-up, and an area overhead of only 5.6% with respect to the baseline. To demonstrate the capabilities of the architecture, we benchmark it with end-to-end real-life QNNs, improving performance by $2\times$ to $2.5\times$ with respect to existing solutions using fully flexible programmable processors.
Title : A 3 TOPS/W RISC-V Parallel Cluster for Inference of Fine-Grain Mixed-Precision Quantized Neural Networks
Venue : 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
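The idea of selecting operand formats through a status register rather than through the opcode can be sketched in software. The model below is a hypothetical illustration, not Flex-V's actual instruction semantics or CSR layout: `FormatCSR`, the lane ordering, and the 32-bit packing scheme are all assumptions. It computes a dot product between 8-bit activations and 4-bit weights that are packed into 32-bit words.

```python
from dataclasses import dataclass

WORD_BITS = 32


@dataclass
class FormatCSR:
    """Hypothetical status register holding the current operand widths."""
    activation_bits: int
    weight_bits: int


def pack(values, bits):
    """Pack signed integers into 32-bit words, lowest lane first."""
    per_word = WORD_BITS // bits
    mask = (1 << bits) - 1
    words = []
    for start in range(0, len(values), per_word):
        word = 0
        for lane, value in enumerate(values[start:start + per_word]):
            word |= (value & mask) << (lane * bits)
        words.append(word)
    return words


def unpack(word, bits):
    """Split one 32-bit word into sign-extended lanes of the given width."""
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    lanes = []
    for lane in range(WORD_BITS // bits):
        raw = (word >> (lane * bits)) & mask
        lanes.append(raw - (1 << bits) if raw & sign_bit else raw)
    return lanes


def mixed_dot(act_words, weight_words, csr, length):
    """Dot product whose lane widths come from the CSR, not the opcode."""
    acts = [x for w in act_words for x in unpack(w, csr.activation_bits)]
    weights = [x for w in weight_words for x in unpack(w, csr.weight_bits)]
    return sum(a * b for a, b in zip(acts[:length], weights[:length]))


activations = [1, -2, 3, 4, -5, 6, 7, -8]   # fit in signed 8 bits
weights = [1, 2, -3, 4, -5, -6, 7, -8]      # fit in signed 4 bits
csr = FormatCSR(activation_bits=8, weight_bits=4)

result = mixed_dot(pack(activations, 8), pack(weights, 4), csr, len(weights))
print(result)
```

With the widths held in state, a single dot-product operation can serve every precision pair, so the number of opcodes does not multiply with the number of supported formats. The cost is that software has to update the register whenever the format changes.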