RECO-LFSR: Reconfigurable Low-power Cryptographic processor based on LFSR for Trusted IoT platforms
Pub Date: 2023-04-05 | DOI: 10.1109/ISQED57927.2023.10129323
Mohamed El-Hadedy, Russell Hua, Kazutomo Yoshii, Wen-mei W. Hwu, M. Margala
Today, lightweight computer hardware is deployed in large volumes, especially with the growing use of IoT devices in homes. However, such devices often neglect security until it is too late and breaches of sensitive data have already occurred. The need for lightweight cryptographic primitives that secure IoT devices without straining their limited resources and battery lifetime is therefore growing rapidly. In the search for a lightweight cryptographic standard, one must also consider how to implement such algorithms optimally: certain parts of an algorithm might be faster in hardware than in software, and vice versa. This paper presents a hardware extension for the MicroBlaze softcore processor that efficiently implements one of the Lightweight Cryptography (LWC) finalists, TinyJAMBU, on the Digilent Nexys A7-100T. The proposed extension consists of a reconfigurable Non-Linear Feedback Shift Register (NLFSR), the central computing element of the TinyJAMBU authenticated encryption with associated data (AEAD) scheme. The proposed NLFSR can run the different variants of TinyJAMBU while consuming only 186 mWh over ten minutes of operation at 100 MHz. Hosting the NLFSR on the FPGA requires 610 LUTs and 505 flip-flops, and the executable binary is 352 bytes smaller. The resulting hardware-extended solution is 2.17× faster than a pure software implementation of the whole TinyJAMBU on MicroBlaze while consuming 6 mWh more. To our knowledge, this is the first implementation of TinyJAMBU using software/hardware partitioning on an FPGA with the MicroBlaze softcore processor.
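To make the NLFSR at the heart of this design concrete, the following is a minimal bit-level Python sketch of the TinyJAMBU-style state update, assuming the published tap positions (0, 47, 70, 85, 91); the MicroBlaze coupling, the reconfiguration logic, and the key schedule of the proposed extension are not modeled, and the example key below is arbitrary.

```python
def nlfsr_update(state, key_bits, rounds):
    """state: list of 128 bits (index 0 first); key_bits: list of key bits."""
    klen = len(key_bits)
    for i in range(rounds):
        # Non-linear feedback: XOR of four taps, a NAND term, and one key bit.
        feedback = (state[0] ^ state[47]
                    ^ (1 ^ (state[70] & state[85]))   # NAND of two taps
                    ^ state[91]
                    ^ key_bits[i % klen])
        state = state[1:] + [feedback]                # shift by one, insert at the end
    return state

# Example: 384 rounds of the permutation on an all-zero state with an arbitrary key.
if __name__ == "__main__":
    key = [(0x12345678 >> (b % 32)) & 1 for b in range(128)]
    out = nlfsr_update([0] * 128, key, 384)
    print("".join(map(str, out[:32])))
```

In the paper's partitioning, exactly this round loop is the part moved into the hardware extension, while message formatting and control stay in MicroBlaze software.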
{"title":"RECO-LFSR: Reconfigurable Low-power Cryptographic processor based on LFSR for Trusted IoT platforms","authors":"Mohamed El-Hadedy, Russell Hua, Kazutomo Yoshii, Wen-mei W. Hwu, M. Margala","doi":"10.1109/ISQED57927.2023.10129323","DOIUrl":"https://doi.org/10.1109/ISQED57927.2023.10129323","url":null,"abstract":"Today we see lightweight computer hardware utilized in large volumes, especially with the growing use of IoT devices in homes. However, such devices often ignore security until it is too late and sensitive data breaches have occurred.From here, the importance of finding lightweight cryptographic primitives to secure IoT devices is exponentially increasing, while not impacting the limited resources and limitation of the battery lifetime. In the search for a lightweight cryptographic standard, one must consider how to implement such algorithms optimally. For example, certain parts of an algorithm might be faster in hardware than in software and vice versa.This paper presents a hardware extension supporting the MicroBlaze softcore processor to efficiently implement one of the Lightweight Cryptography (LWC) finalists (TinyJAMBU) on Digilent Nexys A7-100T. The proposed hardware extension consists of a reconfigurable Non-Linear Feedback Shift Register (NLFSR), the central computing part for the authenticated encryption with associated data (AEAD) TinyJAMBU. The proposed NLFSR can run different variants of TinyJAMBU while only consuming 186 mWh in just ten minutes at 100 MHz. The total resources needed to host the proposed NLFSR on the FPGA are 610 LUT and 505 Flip-Flops while executable the binary size is 352 bytes smaller. Therefore, the proposed solution based on the hardware extension is x2.17 times faster than the pure software implementation of the whole TinyJAMBU using MicroBlaze while consuming six mWh more. To our knowledge, this is the first implementation of TinyJAMBU using software/hardware partitioning on FPGA with the softcore processor MicroBlaze.","PeriodicalId":315053,"journal":{"name":"2023 24th International Symposium on Quality Electronic Design (ISQED)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131412266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analysis of Pattern-dependent Rapid Thermal Annealing Effects on SRAM Design
Pub Date: 2023-04-05 | DOI: 10.1109/ISQED57927.2023.10129399
Vidya A. Chhabria, S. Sapatnekar
Rapid thermal annealing (RTA) is an important step in semiconductor manufacturing. RTA-induced variability due to differences in die layout patterns can significantly contribute to transistor parameter variations, resulting in degraded chip performance and yield. The die layout patterns that drive these variations are related to the distribution of the density of transistors (silicon) and shallow trench isolation (silicon dioxide) across the die, which results in emissivity variations that change the die surface temperature during annealing. While prior art has developed pattern-dependent simulators and provided mitigation techniques for digital design, it has failed to consider the impact of the temperature-dependent thermal conductivity of silicon on RTA effects and has not analyzed the effects on memory. This work develops a novel 3D transient pattern-dependent RTA simulation methodology that accounts for the dependence of the thermal conductivity of silicon on temperature. The simulator is used both to analyze the effects of RTA on memory performance and to propose mitigation strategies for a 7 nm FinFET SRAM design. It is shown that RTA effects degrade read and write delays by 16% and 20%, respectively, and read static noise margin (SNM) by 15%, and that the applied mitigation strategies can compensate for these degradations at the cost of a 16% increase in area for a 7.5% tolerance in SNM.
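The key modeling point is the temperature dependence of silicon's thermal conductivity. Below is an illustrative 1-D explicit finite-difference sketch of that kind of transient update; the power-law fit k(T) = 148·(300/T)^1.3 W/(m·K), the material constants, and the grid/heating numbers are assumptions for illustration, not the paper's 3D methodology or values.

```python
import numpy as np

def k_si(T):
    return 148.0 * (300.0 / T) ** 1.3               # W/(m K), empirical power-law fit

def step(T, dx, dt, rho=2330.0, cp=700.0):
    """One explicit update of dT/dt = (1/(rho*cp)) d/dx( k(T) dT/dx )."""
    k_face = 0.5 * (k_si(T[1:]) + k_si(T[:-1]))     # conductivity evaluated at cell faces
    flux = k_face * (T[1:] - T[:-1]) / dx           # heat flux between neighboring cells
    dT = np.zeros_like(T)
    dT[1:-1] = (flux[1:] - flux[:-1]) / dx / (rho * cp)
    return T + dt * dT                              # boundary cells held fixed

T = np.full(200, 300.0)        # 200-cell slice of the die, initially at 300 K
T[90:110] = 1300.0             # locally heated region during the anneal
for _ in range(1000):
    T = step(T, dx=1e-6, dt=1e-10)
print(f"peak temperature after transient: {T.max():.1f} K")
```

Because k(T) drops as temperature rises, hot regions spread heat more slowly than a constant-conductivity model predicts, which is exactly the effect the paper argues prior simulators miss.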
{"title":"Analysis of Pattern-dependent Rapid Thermal Annealing Effects on SRAM Design","authors":"Vidya A. Chhabria, S. Sapatnekar","doi":"10.1109/ISQED57927.2023.10129399","DOIUrl":"https://doi.org/10.1109/ISQED57927.2023.10129399","url":null,"abstract":"Rapid thermal annealing (RTA) is an important step in semiconductor manufacturing. RTA-induced variability due to differences in die layout patterns can significantly contribute to transistor parameter variations, resulting in degraded chip performance and yield. The die layout patterns that drive these variations are related to the distribution of the density of transistors (silicon) and shallow trench isolation (silicon dioxide) across the die, which result in emissivity variations that change the die surface temperature during annealing. While prior art has developed pattern-dependent simulators and provided mitigation techniques for digital design, it has failed to consider the impact of the temperature-dependent thermal conductivity of silicon on RTA effects and has not analyzed the effects on memory. This work develops a novel 3D transient pattern-dependent RTA simulation methodology that accounts for the dependence of the thermal conductivity of silicon on temperature. The simulator is used to both analyze the effects of RTA on memory performance and to propose mitigation strategies for a 7nm FinFET SRAM design. It is shown that RTA effects degrade read and write delays by 16% and 20% and read static noise margin (SNM) by 15%, and the applied mitigation strategies can compensate for these degradations at the cost of a 16% increase in area for a 7.5% tolerance in SNM margin.","PeriodicalId":315053,"journal":{"name":"2023 24th International Symposium on Quality Electronic Design (ISQED)","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124643030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Heterogeneous Multi-Functional Look-Up-Table-based Processing-in-Memory Architecture for Deep Learning Acceleration
Pub Date: 2023-04-05 | DOI: 10.1109/ISQED57927.2023.10129338
Sathwika Bavikadi, Purab Ranjan Sutradhar, A. Ganguly, Sai Manoj Pudukotai Dinakarrao
Emerging applications, including deep neural networks (DNNs) and convolutional neural networks (CNNs), employ massive amounts of data to perform computations and data analysis. Such applications often run into resource constraints and impose large overheads in data movement between memory and compute units. Several architectures, such as Processing-in-Memory (PIM), have been introduced to alleviate the bandwidth bottlenecks and inefficiency of traditional computing architectures. However, existing PIM architectures represent a trade-off between power, performance, area, energy efficiency, and programmability. To better achieve the energy-efficiency and flexibility criteria simultaneously in hardware accelerators, we introduce a multi-functional look-up-table (LUT)-based reconfigurable PIM architecture in this work. The proposed design is a many-core architecture in which each core comprises processing elements (PEs), each a stand-alone processor with programmable functional units built from high-speed reconfigurable LUTs. The proposed LUTs can perform the various operations required for CNN acceleration, including convolution, pooling, and activation. Additionally, the proposed LUTs can provide multiple outputs for different functionalities simultaneously, without the need to design separate LUTs per functionality, which reduces area and power overheads. Furthermore, we also design special-function LUTs, which provide simultaneous outputs for multiplication and accumulation as well as special activation functions such as hyperbolic tangent and sigmoid. We have evaluated various CNNs, including LeNet, AlexNet, and ResNet-18/34/50. Our experimental results demonstrate that AlexNet implemented on the proposed architecture achieves up to 200× higher energy efficiency and 1.5× higher throughput than a DRAM-based LUT PIM architecture.
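A behavioural sketch of the multi-functional LUT idea follows: one table, indexed by a pair of narrow operands, returns several outputs at once, so the same physical LUT can serve the MAC, pooling, and activation steps of a CNN layer. The 4-bit operand width and the particular output set are assumptions for illustration; they are not taken from the paper.

```python
def build_multifunction_lut(bits=4):
    lut = {}
    for a in range(1 << bits):
        for b in range(1 << bits):
            lut[(a, b)] = {
                "mul": a * b,              # partial product for the MAC path
                "add": a + b,              # accumulation step
                "max": max(a, b),          # max-pooling comparison
                "relu_sum": max(a + b - (1 << (bits - 1)), 0),  # toy activation
            }
    return lut

lut = build_multifunction_lut()

# One LUT read-out yields all functionalities simultaneously; the datapath simply
# selects the field(s) needed by the current layer type.
outs = lut[(7, 12)]
print(outs["mul"], outs["max"], outs["relu_sum"])   # 84 12 11
```

The design choice this illustrates is that the cost of widening one LUT's output is amortized across several layer types, instead of instantiating one LUT per function.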
{"title":"Heterogeneous Multi-Functional Look-Up-Table-based Processing-in-Memory Architecture for Deep Learning Acceleration","authors":"Sathwika Bavikadi, Purab Ranjan Sutradhar, A. Ganguly, Sai Manoj Pudukotai Dinakarrao","doi":"10.1109/ISQED57927.2023.10129338","DOIUrl":"https://doi.org/10.1109/ISQED57927.2023.10129338","url":null,"abstract":"Emerging applications including deep neural networks (DNNs) and convolutional neural networks (CNNs) employ massive amounts of data to perform computations and data analysis. Such applications often lead to resource constraints and impose large overheads in data movement between memory and compute units. Several architectures such as Processing-in-Memory (PIM) are introduced to alleviate the bandwidth bottlenecks and inefficiency of traditional computing architectures. However, the existing PIM architectures represent a trade-off between power, performance, area, energy efficiency, and programmability. To better achieve the energy-efficiency and flexibility criteria simultaneously in hardware accelerators, we introduce a multi-functional look-up-table (LUT)-based reconfigurable PIM architecture in this work. The proposed architecture is a many-core architecture, each core comprises processing elements (PEs), a stand-alone processor with programmable functional units built using high-speed reconfigurable LUTs. The proposed LUTs can perform various operations, including convolutional, pooling, and activation that are required for CNN acceleration. Additionally, the proposed LUTs are capable of providing multiple outputs relating to different functionalities simultaneously without the need to design different LUTs for different functionalities. This leads to optimized area and power overheads. Furthermore, we also design special-function LUTs, which can provide simultaneous outputs for multiplication and accumulation as well as special activation functions such as hyperbolics and sigmoids. We have evaluated various CNNs such as LeNet, AlexNet, and ResNet18,34,50. Our experimental results have demonstrated that when AlexNet is implemented on the proposed architecture shows a maximum of 200× higher energy efficiency and 1.5× higher throughput than a DRAM-based LUT-based PIM architecture.","PeriodicalId":315053,"journal":{"name":"2023 24th International Symposium on Quality Electronic Design (ISQED)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124787951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SQRTLIB: Library of Hardware Square Root Designs
Pub Date: 2023-04-05 | DOI: 10.1109/ISQED57927.2023.10129377
C. PrashanthH., S. SrinikethS., Shrikrishna Hebbar, R. Chinmaye, M. Rao
Square root is an elementary arithmetic function used not only in image and signal processing applications but also to extract vector functionalities. The square-root module demands significant energy and hardware resources and is complex to implement. Many techniques have been reported for realizing the square-root function, including Iterative, New Non-Restoring (New-NR), CORDIC, Piece-wise-linear (PWL) approximation, Look-Up-Tables (LUTs), and digit-by-digit implementations in both integer (Digit-Int) and fixed-point (Digit-FP) formats. Cartesian genetic programming (CGP) is an evolutionary algorithm that can evolve circuits by exploring a large solution space. This paper develops a library of square-root circuits ranging from 2 to 8 bits and benchmarks the proposed CGP-evolved square-root circuits against the other hardware implementations. All designs were analyzed using both FPGA and ASIC (130 nm Skywater node) flows to characterize hardware parameters and were evaluated using various error metrics. Among all the implementations, CGP-derived square-root designs in fixed-point format offered the best trade-off between hardware and error characteristics. All novel designs from this work are made freely available in [1] for further research and development.
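As a point of reference for the digit-by-digit family mentioned above, here is a small exact software model of the classic restoring binary square root, the kind of golden reference one would use when computing error metrics for 2- to 8-bit circuits. It is an assumed reference model, not the paper's New-NR hardware design.

```python
import math

def isqrt_ref(x, bits=8):
    """Returns floor(sqrt(x)) for an unsigned 'bits'-wide input (bits even)."""
    root, rem = 0, 0
    for i in range(bits // 2 - 1, -1, -1):
        rem = (rem << 2) | ((x >> (2 * i)) & 0b11)  # bring down the next two input bits
        trial = (root << 2) | 1                     # candidate subtrahend 4*root + 1
        root <<= 1
        if trial <= rem:                            # accept the result digit if it fits
            rem -= trial
            root |= 1
    return root

# Exhaustive check over the full 8-bit input range against Python's integer sqrt.
assert all(isqrt_ref(x) == math.isqrt(x) for x in range(256))
```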
{"title":"SQRTLIB : Library of Hardware Square Root Designs","authors":"C. PrashanthH., S. SrinikethS., Shrikrishna Hebbar, R. Chinmaye, M. Rao","doi":"10.1109/ISQED57927.2023.10129377","DOIUrl":"https://doi.org/10.1109/ISQED57927.2023.10129377","url":null,"abstract":"Square-root is an elementary arithmetic function that is utilized not only for image and signal processing applications but also to extract vector functionalities. The square-root module demands high energy and hardware resources, apart from being a complex design to implement. In the past, many techniques, including Iterative, New Non-Restoring (New-NR), CORDIC, Piece-wise-linear (PWL) approximation, Look-Up-Tables (LUTs), Digit-by-digit based integer (Digit-Int) format and fixed-point (Digit-FP) format implementations were reported to realize square-root function. Cartesian genetic programming (CGP) is a type of evolutionary algorithm that is suggested to evolve circuits by exploring a large solution space. This paper attempts to develop a library of square-root circuits ranging from 2-bits to 8-bits and also benchmark the proposed CGP evolved square-root circuits with the other hardware implementations. All designs were analyzed using both FPGA and ASIC (130 nm Skywater node) flow to characterize hardware parameters and evaluated using various error metrics. Among all the implementations, CGP-derived square-root designs of fixed-point format offered the best trade-off between hardware and error characteristics. All novel designs of this work are made freely available in [1] for further research and development usage.","PeriodicalId":315053,"journal":{"name":"2023 24th International Symposium on Quality Electronic Design (ISQED)","volume":"143 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124919717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Polynomial Formal Verification of a Processor: A RISC-V Case Study
Pub Date: 2023-04-05 | DOI: 10.1109/ISQED57927.2023.10129397
Lennart Weingarten, Alireza Mahzoon, Mehran Goli, R. Drechsler
Formal verification is an important task to ensure the correctness of a circuit. Over the last 30 years, several formal methods have been proposed to verify various architectures. However, the space and time complexities of these methods are usually unknown, particularly for complex designs such as processors. As a result, there is always unpredictability in the performance of the verification tool. If we prove that a formal method has polynomial space and time complexities, we can resolve this unpredictability and ensure the scalability of the method. In this paper, we propose a Polynomial Formal Verification (PFV) method based on Binary Decision Diagrams (BDDs) to fully verify a RISC-V processor. We take advantage of partial simulation to extract the hardware related to each instruction. Then, we create the reference BDD for each instruction with respect to its size and function. Finally, we run a symbolic simulation for each instruction's hardware and compare it with the reference model. We prove that the whole verification task can be carried out in polynomial space and time. The experiments demonstrate that the PFV of a RISC-V RV32I processor can be performed in less than one second.
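To illustrate the per-instruction flow (symbolically simulate the extracted hardware, then compare against a reference model), here is a toy sketch in which each wire's Boolean function is a truth-table bitmask. This representation is exponential in the number of inputs and serves only to show the comparison step; the paper's method uses BDDs to keep the same check polynomial. The 2-bit adder "netlist" below is a made-up example.

```python
from itertools import product

def symbolic_inputs(names):
    """Give each primary input a truth-table bitmask over all 2**n assignments."""
    n = len(names)
    assigns = list(product([0, 1], repeat=n))
    return {name: sum(1 << idx for idx, a in enumerate(assigns) if a[i])
            for i, name in enumerate(names)}

ins = symbolic_inputs(["a0", "a1", "b0", "b1"])
XOR = lambda x, y: x ^ y
AND = lambda x, y: x & y

# Netlist-style symbolic simulation of the 2-bit adder slice exercised by "ADD".
w = dict(ins)
w["s0"] = XOR(w["a0"], w["b0"])                 # sum bit 0
w["c0"] = AND(w["a0"], w["b0"])                 # carry out of bit 0
w["s1"] = XOR(XOR(w["a1"], w["b1"]), w["c0"])   # sum bit 1

# Reference model built directly from the instruction's specification.
ref_s0 = XOR(ins["a0"], ins["b0"])
ref_s1 = XOR(XOR(ins["a1"], ins["b1"]), AND(ins["a0"], ins["b0"]))
print("ADD datapath matches reference:", w["s0"] == ref_s0 and w["s1"] == ref_s1)
```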
{"title":"Polynomial Formal Verification of a Processor: A RISC-V Case Study","authors":"Lennart Weingarten, Alireza Mahzoon, Mehran Goli, R. Drechsler","doi":"10.1109/ISQED57927.2023.10129397","DOIUrl":"https://doi.org/10.1109/ISQED57927.2023.10129397","url":null,"abstract":"Formal verification is an important task to ensure the correctness of a circuit. In the last 30 years, several formal methods have been proposed to verify various architectures. However, the space and time complexities of these methods are usually unknown, particularly, when it comes to complex designs, e.g., processors. As a result, there is always unpredictability in the performance of the verification tool. If we prove that a formal method has polynomial space and time complexities, we can successfully resolve the unpredictability problem and ensure the scalability of the method.In this paper, we propose a Polynomial Formal Verification (PFV) method based on Binary Decision Diagrams (BDDs) to fully verify a RISC-V processor. We take advantage of partial simulation to extract the hardware related to each instruction. Then, we create the reference BDD for each instruction with respect to its size and function. Finally, we run a symbolic simulation for each hardware instruction and compare it with the reference model. We prove that the whole verification task can be carried out in polynomial space and time. The experiments demonstrate that the PFV of a RISC-V RV32I processor can be performed in less than one second.","PeriodicalId":315053,"journal":{"name":"2023 24th International Symposium on Quality Electronic Design (ISQED)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114546068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lightweight Instruction Set for Flexible Dilated Convolutions and Mixed-Precision Operands
Pub Date: 2023-04-05 | DOI: 10.1109/ISQED57927.2023.10129341
Simon Friedrich, Shambhavi Balamuthu Sampath, R. Wittig, M. Vemparala, Nael Fasfous, E. Matús, W. Stechele, G. Fettweis
Modern deep neural networks specialized for object detection and semantic segmentation require specific operations to increase or preserve the resolution of their feature maps. Hence, more generic convolution layers called transposed and dilated convolutions are employed, which add a large number of zeros between the elements of the input features or weights. Standard neural network hardware accelerators usually process these convolutions in a straightforward manner, without paying attention to the added zeros, resulting in increased computation time. To cope with this problem, recent works propose to skip the redundant elements with additional hardware or solve the problem efficiently only for a limited range of dilation rates. We present a general approach for accelerating transposed and dilated convolutions that introduces no hardware overhead while supporting all dilation rates. To achieve this, we introduce a novel precision-scalable lightweight instruction set and memory scheme that can be applied to the different convolution variants. This results in a 5× speed-up on DeepLabV3+, outperforming recently proposed design methods. Support for precision-scalable execution of all workloads further increases the speed-up in computation time, as shown for the PointPillars, DeepLabV3+, and ENet networks. Compared to the state-of-the-art commercial EdgeTPU, the instruction footprint of ResNet-50 on our accelerator is reduced by 60 percent.
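The zero-skipping idea for dilated convolution can be seen in a small numpy sketch: instead of convolving with a zero-padded kernel, input samples are gathered at a stride equal to the dilation rate, so no multiply-by-zero work is issued. A 1-D case is shown for clarity; the sizes and rates are illustrative only and this is not the paper's instruction set.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    taps = len(w)
    span = (taps - 1) * dilation + 1          # receptive field of the dilated kernel
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        # Gather every 'dilation'-th sample instead of multiplying by inserted zeros.
        out[i] = np.dot(x[i : i + span : dilation], w)
    return out

x = np.arange(16, dtype=float)
w = np.array([1.0, -2.0, 1.0])
print(dilated_conv1d(x, w, dilation=1))       # ordinary convolution (correlation form)
print(dilated_conv1d(x, w, dilation=4))       # same three taps at dilation 4, same MAC count
```

Both calls issue exactly three multiplies per output sample, which is the behavior a dilation-aware address generator preserves in hardware.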
{"title":"Lightweight Instruction Set for Flexible Dilated Convolutions and Mixed-Precision Operands","authors":"Simon Friedrich, Shambhavi Balamuthu Sampath, R. Wittig, M. Vemparala, Nael Fasfous, E. Matús, W. Stechele, G. Fettweis","doi":"10.1109/ISQED57927.2023.10129341","DOIUrl":"https://doi.org/10.1109/ISQED57927.2023.10129341","url":null,"abstract":"Modern deep neural networks specialized for object detection and semantic segmentation require specific operations to increase or preserve the resolution of their feature maps. Hence, more generic convolution layers called transposed and dilated convolutions are employed, adding a large number of zeros between the elements of the input features or weights. Usually, standard neural network hardware accelerators process these convolutions in a straightforward manner, without paying attention to the added zeros, resulting in an increased computation time. To cope with this problem, recent works propose to skip the redundant elements with additional hardware or solve the problem efficiently only for a limited range of dilation rates. We present a general approach for accelerating transposed and dilated convolutions that does not introduce any hardware overhead while supporting all dilation rates. To achieve this, we introduce a novel precision-scalable lightweight instruction set and memory scheme that can be applied to the different convolution variants. This results in a speed-up of 5 times in DeepLabV3+ outperforming the recently proposed design methods. The support of precision-scalable execution of all workloads further increases the speedup in computation time shown for the PointPillars, DeepLabV3+, and ENet networks. Compared to the state-of-the-art commercial EdgeTPU, the instruction footprint of ResNet-50 of our designed accelerator is reduced by 60 percent.","PeriodicalId":315053,"journal":{"name":"2023 24th International Symposium on Quality Electronic Design (ISQED)","volume":" 33","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120832453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Power Savings in USB Hubs Through A Proactive Scheduling Strategy
Pub Date: 2023-04-05 | DOI: 10.1109/ISQED57927.2023.10129309
Bikrant Das Sharma, Abdul Rahman Ismail, Chris Meyers
USB has been the dominant external I/O in computing systems over the past two decades. With the increased adoption of high-data-rate USB-C, USB hubs are becoming more popular. Existing power-saving mechanisms do not save much power in USB hubs when there is a steady bandwidth demand from devices. In this paper, we demonstrate significant power savings with a proactive scheduling policy for hubs. Our approach includes the introduction of a shallow U1/CL1 low-power state, which yields better overall power savings thanks to the reduced entry and exit latencies of U1/CL1. Our results demonstrate power savings of tens of watts by increasing the scheduling interval up to the minimum latency tolerance across all devices connected to the hub. As USB moves to USB4 and hubs are used to connect higher-bandwidth devices, these power savings will become even more pronounced.
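A back-of-the-envelope sketch of the proactive policy described above: the hub picks one common service interval equal to the smallest latency tolerance among the attached devices, serves all pending traffic in a burst, and keeps the links in a shallow low-power state between bursts. The device counts, power levels, and entry/exit overheads below are illustrative assumptions, not measured values from the paper.

```python
def choose_interval(latency_tolerances_us):
    # Batching is only safe up to the tightest latency budget across all devices.
    return min(latency_tolerances_us)

def average_power(interval_us, active_us, p_active_w, p_lowpower_w, exit_overhead_us):
    """Time-weighted average hub power over one scheduling interval."""
    busy = min(interval_us, active_us + exit_overhead_us)
    idle = interval_us - busy
    return (busy * p_active_w + idle * p_lowpower_w) / interval_us

interval = choose_interval([125, 500, 1000])      # three devices, tightest budget is 125 us
print("service interval:", interval, "us")
print("avg hub power: %.2f W" %
      average_power(interval, active_us=20,
                    p_active_w=2.5, p_lowpower_w=0.4, exit_overhead_us=5))
```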
{"title":"Power Savings in USB Hubs Through A Proactive Scheduling Strategy","authors":"Bikrant Das Sharma, Abdul Rahman Ismail, Chris Meyers","doi":"10.1109/ISQED57927.2023.10129309","DOIUrl":"https://doi.org/10.1109/ISQED57927.2023.10129309","url":null,"abstract":"USB has been the dominant external I/O in computing systems over the past two decades. With the increased adoption of USB-C with high data rates, USB hubs are becoming more popular. Existing power-saving mechanisms do not save much power in USB hubs when there is a steady bandwidth demand from devices. In this paper, we demonstrate significant power savings with a proactive scheduling policy for hubs. Our approach includes the introduction of a shallow U1/CL1 low-power state, resulting in better overall power savings due to the reduced entry and exit times to U1/CL1. Our results demonstrate power savings of tens of watts by increasing the scheduling interval up to the minimum latency tolerance across all devices connected to that hub. As USB moves to USB4 and hubs are used to connect to higher bandwidth devices, these power savings will become even more pronounced.","PeriodicalId":315053,"journal":{"name":"2023 24th International Symposium on Quality Electronic Design (ISQED)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127204392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enlarging Reliable Pairs via Inter-Distance Offset for a PUF Entropy-Boosting Algorithm
Pub Date: 2023-04-05 | DOI: 10.1109/ISQED57927.2023.10129308
Md. Omar Faruque, Wenjie Che
Physically Unclonable Functions (PUFs) are emerging hardware security primitives that leverage random variations during chip fabrication to generate unique secrets. The amount of random secrets that can be extracted from a limited number of physical PUF components is measured in entropy bits. Existing strategies of pairing or grouping N RO-PUF elements have an entropy upper bound limited by log2(N!), or O(N·log2(N)). A recently proposed entropy-boosting technique [9] increases the entropy bits to a quadratically large N(N-1)/2, or O(N^2), significantly improving the efficiency with which RO-PUF hardware is used to generate secrets. However, the increased amount of random secrets comes at the cost of discarding a large portion of unreliable bits. In this paper, we propose an "Inter-Distance Offset (IDO)" technique that makes those unreliable pairs reliable by adjusting the pair inter-distance into an appropriate range. A theoretical analysis of the ratio of converted unreliable bits is provided along with experimental validation. Experimental evaluations of reliability, entropy, and the tradeoff between them are given using the real RO-PUF datasets in [10]. Information leakage is analyzed and evaluated using PUF datasets to identify offset ranges that leak no information. The proposed technique improves the portion of reliable (quadratically large) entropy bits by 20% and 100%, respectively, for different offset ranges. Hardware implementation on FPGAs demonstrates that the proposed technique is lightweight in both implementation cost and runtime.
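An illustrative sketch of the pairwise entropy-boosting idea and the inter-distance offset: all N(N-1)/2 RO frequency pairs yield a comparison bit, pairs whose frequency difference falls within the unreliable margin would normally be discarded, and the offset step shifts such a pair's inter-distance so it can be kept. The threshold, offset grid, and frequencies below are made-up values for illustration, not the paper's parameters or its leakage-safe offset selection.

```python
from itertools import combinations

def pair_bits(freqs, threshold, offsets=(0, 5, 10)):
    bits, kept = [], 0
    for i, j in combinations(range(len(freqs)), 2):   # all N(N-1)/2 pairs
        d = freqs[i] - freqs[j]
        # Apply the smallest offset that pushes the inter-distance past the margin.
        for off in offsets:
            if abs(d + off) >= threshold:
                bits.append(1 if d + off > 0 else 0)
                kept += 1
                break
    return bits, kept

freqs = [1000.3, 1002.1, 999.8, 1001.0, 1003.4]       # five ring-oscillator frequencies (MHz)
bits, kept = pair_bits(freqs, threshold=1.5)
print(f"kept {kept} of {len(freqs) * (len(freqs) - 1) // 2} pairs -> bits {bits}")
```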
{"title":"Enlarging Reliable Pairs via Inter-Distance Offset for a PUF Entropy-Boosting Algorithm","authors":"Md. Omar Faruque, Wenjie Che","doi":"10.1109/ISQED57927.2023.10129308","DOIUrl":"https://doi.org/10.1109/ISQED57927.2023.10129308","url":null,"abstract":"Physically Unclonable Functions (PUFs) are emerging hardware security primitives that leverage random variations during chip fabrication to generate unique secrets. The amount of random secrets that can be extracted from a limited number of physical PUF components can be measured by entropy bits. Existing strategies of pairing or grouping N RO-PUF elements have an entropy upper bound limited by log2(N!) or O(N•log2(N)). A recently proposed entropy boosting technique [9] improves the entropy bits to be quadratically large at N(N-1)/2 or O(N^2), significantly improved the RO-PUF hardware utilization efficiency in generating secrets. However, the improved amount of random secrets comes at the cost of discarding a large portion of unreliable bits. In this paper, we propose an \"Inter-Distance Offset (IDO)\" technique that converts those unreliable pairs to be reliable by adjusting the pair inter-distance to an appropriate range. Theoretical analysis of the ratio of converted unreliable bits is provided along with experimental validations. Experimental evaluations on reliability, Entropy and reliability tradeoffs are given using real RO PUF datasets in [10]. Information leakage is analyzed and evaluated using PUF datasets to identify those offset ranges that leak no information. The proposed technique improves the portion of reliable (quadratically large) entropy bits by 20% and 100% respectively for different offset ranges. Hardware implementation on FPGAs demonstrates that the proposed technique is lightweight in implementation and runtime.","PeriodicalId":315053,"journal":{"name":"2023 24th International Symposium on Quality Electronic Design (ISQED)","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122029368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NetViz: A Tool for Netlist Security Visualization
Pub Date: 2023-04-05 | DOI: 10.1109/ISQED57927.2023.10129374
James Geist, Travis Meade, Shaojie Zhang, Yier Jin
Algorithmic analysis of gate-level netlists has become an important technique in hardware security. Algorithms can help detect malicious hardware injected into a design, or lock a design against reverse engineering or malicious modification. Many analysis tools have come from the research and commercial communities; however, it is currently the job of the analyst to make these tools work together and interpret the results. Typically, tools are text-based and require error-prone editing of input files in different formats. The analyst must interpret textual results and sometimes transform them into other formats for use in third-party visualization tools. These tasks are repetitive overhead, consuming time and effort that would be better spent investigating the netlist. In this paper, we introduce NetViz, a visual hardware security environment. NetViz is a meta-tool that combines other analysis tools, automates the task of transferring data between them, and helps with interpretation of results by providing graphical representations of the data.
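A minimal sketch of the kind of glue such a meta-tool automates: parse a flat gate-level connection list and emit Graphviz DOT text so the netlist can be viewed as a graph instead of as raw text. The tiny input format and gate names here are invented for illustration; they are not NetViz's actual input or output formats.

```python
def netlist_to_dot(gates):
    """gates: list of (gate_name, gate_type, output_net, [input_nets])."""
    driver = {out: name for name, _, out, _ in gates}          # net -> driving gate
    lines = ["digraph netlist {"]
    for name, gtype, _, inputs in gates:
        lines.append(f'  {name} [label="{name}\\n{gtype}"];')
        for net in inputs:
            if net in driver:                                   # draw driver -> sink edges
                lines.append(f"  {driver[net]} -> {name};")
    lines.append("}")
    return "\n".join(lines)

example = [
    ("g1", "AND2", "n1", ["a", "b"]),
    ("g2", "XOR2", "n2", ["n1", "c"]),
    ("g3", "INV",  "y",  ["n2"]),
]
print(netlist_to_dot(example))
```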
{"title":"NetViz: A Tool for Netlist Security Visualization","authors":"James Geist, Travis Meade, Shaojie Zhang, Yier Jin","doi":"10.1109/ISQED57927.2023.10129374","DOIUrl":"https://doi.org/10.1109/ISQED57927.2023.10129374","url":null,"abstract":"Algorithmic analysis of gate level netlists has become an important technique in hardware security. Algorithms can help detect malicious hardware injected into a design, or lock a design against reverse engineering or malicious modification. Many analysis tools have come from the research and commercial communities; however, it is currently the job of the analyst to make these tools work together and interpret the results. Typically tools are text-based, and require error-prone editing of input files in different formats. The analyst must interpret textual results, and sometimes transform them into other formats for use in third party visualization tools. These tasks are repetitive overhead that take time and effort that could better be spent on investigating the netlist. In this paper we introduce NetViz, a visual hardware security environment. NetViz is a meta-tool which combines other analysis tools, automates the task of transferring data between them, and helps with interpretation of results by providing graphical representations of the data.","PeriodicalId":315053,"journal":{"name":"2023 24th International Symposium on Quality Electronic Design (ISQED)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124168425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}