Pub Date: 2023-06-21 DOI: https://dl.acm.org/doi/10.1145/3592798
Ahmad Patooghy, Mahdi Hasanzadeh, Amin Sarihi, Mostafa Abdelrehim, Abdel-Hameed A. Badawy
Network-on-chip (NoC) is widely used as an efficient communication architecture in multi-core and many-core systems-on-chip (SoCs). However, the shared communication resources of an NoC platform, e.g., channels, buffers, and routers, can be used to mount attacks that compromise the security of NoC-based SoCs. Most encryption-based protection methods proposed in the literature leave some parts of the packet unencrypted so that routers can process and forward it. This reveals the packet's source/destination information to malicious routers, which can be exploited in various attacks. For the first time, we propose secure, anonymous routing with minimal hardware overhead that encrypts the entire packet while exchanging secure information over the network. We have designed and implemented a new NoC architecture that works with encrypted addresses. The proposed method can manage malicious and benign failures at NoC channels and buffers by bypassing failed components with a situation-driven stochastic path diversification approach. Hardware evaluations show that the proposed solution combats these security threats at an affordable chip-wide cost of 1.5% area and 20% power overhead.
Title: Securing Network-on-chips Against Fault-injection and Crypto-analysis Attacks via Stochastic Anonymous Routing. Journal: ACM Journal on Emerging Technologies in Computing Systems.
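The idea of bypassing failed components through stochastic path diversification, mentioned in the abstract above, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the 2D mesh, plaintext `(x, y)` coordinates (the paper routes on encrypted addresses), the `stochastic_route` function, and the rule of choosing uniformly among healthy minimal-direction hops. It is not the authors' routing algorithm.

```python
import random

def stochastic_route(src, dst, failed_links, rng):
    """Walk from src to dst on a 2D mesh, taking a random healthy productive hop.

    failed_links is a set of frozenset({node_a, node_b}) entries (hypothetical
    failure model). Returns the list of visited nodes, or None on a dead end.
    """
    path = [src]
    x, y = src
    while (x, y) != dst:
        # Productive hops move one step closer to dst along X or Y.
        candidates = []
        if x != dst[0]:
            candidates.append((x + (1 if dst[0] > x else -1), y))
        if y != dst[1]:
            candidates.append((x, y + (1 if dst[1] > y else -1)))
        healthy = [n for n in candidates
                   if frozenset(((x, y), n)) not in failed_links]
        if not healthy:
            return None  # a real design would need a fallback, e.g. a detour
        x, y = rng.choice(healthy)
        path.append((x, y))
    return path

# The link between (0, 0) and (1, 0) has failed, so the first hop must go to (0, 1).
failed = {frozenset(((0, 0), (1, 0)))}
path = stochastic_route((0, 0), (2, 2), failed, random.Random(7))
print(path)
```

Because the hop is chosen at random among the healthy candidates, repeated packets between the same pair of nodes can take different minimal paths, which is the diversification effect the abstract refers to.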
Pub Date: 2023-06-21 DOI: https://dl.acm.org/doi/10.1145/3570727
Sidhartha Sankar Rout, Mitali Sinha, Sujay Deb
Wireless Network-on-Chip (WNoC) requires a Medium Access Control (MAC) mechanism for interference-free sharing of the wireless channel. In traditional MAC, a token is circulated among the Wireless Interfaces (WIs) in a round-robin manner, and the WI holding the token keeps the channel for a fixed number of cycles. However, the channel requirement of individual WIs changes dynamically over time due to the varying traffic density across the WNoC. Moreover, conventional WNoCs give equal importance to all traffic taking the wireless path and transmit it in an oldest-first manner. Yet critical data that is not served promptly can degrade system performance to a large extent by prolonging application runtime. We propose 2DMAC, which can change the token arbitration pattern and tune the channel hold time of each WI based on its runtime traffic density and criticality status. Moreover, 2DMAC prioritizes critical traffic over non-critical traffic during wireless data transfer. The proposed mechanism improves wireless channel utilization by 15.67% and network throughput by 29.83%, and reduces critical data latency by 29.77%, over the traditional MAC.
Title: 2DMAC: A Sustainable and Efficient Medium Access Control Mechanism for Future Wireless NoCs. Journal: ACM Journal on Emerging Technologies in Computing Systems.
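The contrast between a fixed-hold round-robin token and a token whose order and hold time follow traffic density and criticality can be sketched as below. This is a minimal illustration, not the 2DMAC policy itself: the `WirelessInterface` fields, the `hold_time` clamp, and the "critical first, then busiest first" ordering are all hypothetical choices.

```python
from dataclasses import dataclass

@dataclass
class WirelessInterface:
    name: str
    pending_flits: int      # hypothetical proxy for runtime traffic density
    critical: bool = False  # hypothetical criticality status

def hold_time(wi, min_hold=2, max_hold=16):
    """Assumed rule: hold the channel for as many cycles as there are queued
    flits, clamped to [min_hold, max_hold]."""
    return max(min_hold, min(max_hold, wi.pending_flits))

def token_order(wis):
    """Assumed arbitration: critical WIs first, then by descending traffic."""
    return sorted(wis, key=lambda wi: (not wi.critical, -wi.pending_flits))

def schedule(wis):
    """One token round: idle WIs are skipped instead of wasting a fixed slot."""
    return [(wi.name, hold_time(wi)) for wi in token_order(wis)
            if wi.pending_flits > 0]

wis = [
    WirelessInterface("A", pending_flits=3),
    WirelessInterface("B", pending_flits=20, critical=True),
    WirelessInterface("C", pending_flits=0),
    WirelessInterface("D", pending_flits=8),
]
print(schedule(wis))  # [('B', 16), ('D', 8), ('A', 3)]
```

A traditional MAC would instead visit A, B, C, D in that order and give each the same number of cycles, including the idle interface C.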
Pub Date: 2023-06-21 DOI: https://dl.acm.org/doi/10.1145/3590772
Md. Mahfuz Al Hasan, Mohammad Tahsin Mostafiz, Thomas An Le, Jake Julia, Nidish Vashistha, Shayan Taheri, Navid Asadizanjani
Due to the ever-growing demand for electronic chips across sectors, semiconductor companies have been compelled to offshore their manufacturing processes. This has raised concerns about the security and trustworthiness of fabricated chips and has given rise to hardware attacks. Under these conditions, different entities in the semiconductor supply chain can act maliciously and attack any of the design computing layers, from devices to systems. The attack we consider is a hardware Trojan inserted during mask generation/fabrication in an untrusted foundry. The Trojan leaves a footprint in the fabricated chip through the addition, deletion, or change of design cells. To tackle this problem, we propose EVHA (Explainable Vision System for Hardware Testing and Assurance), which can detect the smallest possible change to a design in a low-cost, accurate, and fast manner. The inputs to this system are scanning electron microscopy images acquired from the integrated circuits under examination. The output is a determination, at the cell level, of whether the integrated circuit contains any defect and/or hardware Trojan introduced through the addition, deletion, or change of design cells. This article provides an overview of the design, development, implementation, and analysis of our defense system.
Title: EVHA: Explainable Vision System for Hardware Testing and Assurance—An Overview. Journal: ACM Journal on Emerging Technologies in Computing Systems.
Ikenna Okafor, Akshay Krishna Ramanathan, Nagadastagiri Reddy Challapalle, Zheyu Li, Vijaykrishnan Narayanan
Video analytics has a wide range of applications and has attracted much interest over the years. While it can be both computationally and energy intensive, video analytics can benefit greatly from in/near-memory compute. The practice of moving compute closer to memory has continued to show improvements in performance and energy consumption and is seeing increasing adoption. Recent solid-state drives (SSDs) have incorporated near-memory Field Programmable Gate Arrays (FPGAs) with shared access to the drive's storage cells. These near-memory FPGAs are capable of running operations required by video analytics pipelines, such as object detection and template matching. These operations are typically executed using Convolutional Neural Networks (CNNs). A CNN is composed of multiple individually processed layers that perform various image processing tasks. When resources are limited, a layer may be partitioned into more manageable sub-layers. These sub-layers are then processed sequentially, although some can be processed simultaneously. Moreover, the storage cells within FPGA-equipped SSDs can be augmented with in-storage compute to accelerate CNN workloads and exploit the parallelism within a CNN layer. To this end, we present our work, which leverages heterogeneous architectures to create an in/near-storage acceleration solution for video analytics. We designed a NAND flash accelerator and an FPGA accelerator, then mapped and evaluated several CNN benchmarks. We show how to utilize FPGAs, local DRAMs, and in-memory SSD compute to accelerate CNN workloads. Our work also demonstrates how to remove unnecessary memory transfers to save latency and energy.
Title: Fusing In-Storage and Near-Storage Acceleration of Convolutional Neural Networks. Journal: ACM Journal on Emerging Technologies in Computing Systems.
Pub Date: 2023-06-17 DOI: https://doi.org/10.1145/3597496
Pub Date: 2023-05-30 DOI: https://dl.acm.org/doi/10.1145/3597024
B. M. S. Bahar Talukder, Farah Ferdaus, Md Tauhidur Rahman
Many commercially available memory chips are fabricated worldwide in untrusted facilities. As a result, counterfeit memory chips can easily enter the supply chain in various forms. Deploying these counterfeit chips in an electronic system can severely affect its security and reliability because of their substandard quality, poor performance, and shorter lifespan. A proper solution is therefore required to identify counterfeit memory chips before deploying them in mission-, safety-, and security-critical systems. However, a single solution to prevent counterfeiting is challenging due to the diversity of counterfeit types, sources, and refinement techniques. Besides, chips can pass initial testing and still fail while being used in the system. Furthermore, existing solutions focus on detecting a single counterfeit type (e.g., recycled memory chips). This work proposes a framework that detects major counterfeit static random-access memory (SRAM) types by attesting/identifying the origin of the manufacturer. The proposed technique generates a single signature for a manufacturer and does not require any exhaustive registration/authentication process. We validate the technique using 345 SRAM chips produced by major manufacturers. Silicon results show that the technique achieves F1 scores of 93% and 71% for identifying the memory manufacturer and the part number, respectively.
Title: A Noninvasive Technique to Detect Authentic/Counterfeit SRAM Chips. Journal: ACM Journal on Emerging Technologies in Computing Systems.
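For readers unfamiliar with the metric reported above, the F1 score is the harmonic mean of precision and recall. The sketch below computes it from raw counts; the counts are made-up numbers for illustration and are not taken from the paper, which does not state in its abstract how scores are averaged across manufacturers or part numbers.

```python
def f1_score(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall).

    tp, fp, fn are true-positive, false-positive, and false-negative counts.
    Returns 0.0 when the score is undefined (no positives at all).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: 8 chips correctly attributed to a manufacturer,
# 2 wrongly attributed to it, and 2 of its chips missed.
example = f1_score(tp=8, fp=2, fn=2)
print(round(example, 3))  # 0.8
```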
Pub Date: 2023-05-19 DOI: https://dl.acm.org/doi/10.1145/3597497
Anindan Mondal, Debasish Kalita, Archisman Ghosh, Suchismita Roy, Bibhash Sen
Hardware Trojans (HTs) are small circuits intentionally designed by an adversary for harmful purposes. These circuits are extremely difficult to detect. An HT often requires specific signals to activate, and these are almost impossible to discover. For this reason, test generation for side-channel analysis, which does not require HT activation, has gained significant attention in recent times. Such test generation techniques aim to generate a large amount of switching activity inside the HT circuit, increasing the measured transient current. However, such methods suffer from either long runtime or unreliable results. In this work, a test generation technique based on the relative switching activity of the circuit is proposed to overcome the limitations of existing works. Initially, the proposed technique measures the impact of each input on rare nets individually using random vector simulation. Potent inputs are then selected to obtain a new set of test vectors that provides high relative switching inside a circuit. The proposed method is applied to 11 different ISCAS and 3 ITC 99 benchmark circuits. Experimental results endorse the efficacy of the proposed method, which outperforms traditional Hamming distance-based re-ordering techniques (by up to 20x) while requiring a small runtime.
Title: Towards the Generation of Test Vectors for the Detection of Hardware Trojan Targeting Effective Switching Activity. Journal: ACM Journal on Emerging Technologies in Computing Systems.
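The first step described in the abstract above, measuring the impact of each input on rare nets through random vector simulation, can be sketched on a toy circuit. The circuit, the net names, the rarity threshold, and the "flip one input and see whether a rare net toggles" measure are all assumptions made for illustration; the paper's actual metric and benchmarks are not reproduced here.

```python
import random

INPUTS = ["a", "b", "c", "d"]

def simulate(v):
    """Toy combinational circuit (hypothetical): returns every net's value."""
    a, b, c, d = v
    n1 = a & b
    r = n1 & c        # 1 only when a = b = c = 1, so it is rarely 1
    out = r | d
    return {"n1": n1, "r": r, "out": out}

def rare_nets(num_vectors=2000, threshold=0.2, seed=0):
    """Nets whose estimated probability of being 1 is below the threshold."""
    rng = random.Random(seed)
    ones = {}
    for _ in range(num_vectors):
        v = [rng.randint(0, 1) for _ in INPUTS]
        for net, val in simulate(v).items():
            ones[net] = ones.get(net, 0) + val
    return sorted(n for n, count in ones.items() if count / num_vectors < threshold)

def input_impact(rare, num_vectors=2000, seed=1):
    """Fraction of random vectors for which flipping one input toggles a rare net."""
    rng = random.Random(seed)
    toggles = dict.fromkeys(INPUTS, 0)
    for _ in range(num_vectors):
        v = [rng.randint(0, 1) for _ in INPUTS]
        base = simulate(v)
        for i, name in enumerate(INPUTS):
            flipped = v.copy()
            flipped[i] ^= 1
            after = simulate(flipped)
            if any(base[n] != after[n] for n in rare):
                toggles[name] += 1
    return {name: count / num_vectors for name, count in toggles.items()}

rare = rare_nets()
impact = input_impact(rare)
print(rare, impact)
```

In this toy circuit, input `d` never reaches the rare net `r`, so its impact is zero, while `a`, `b`, and `c` would be the "potent" inputs to favor when building test vectors.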
Pub Date: 2023-05-18 DOI: https://dl.acm.org/doi/10.1145/3577214
Siyuan Huang, Brian D. Hoskins, Matthew W. Daniels, Mark D. Stiles, Gina C. Adam
The movement of large quantities of data during the training of a deep neural network presents immense challenges for machine learning workloads, especially those based on future functional memories deployed to store network models. As the size of network models begins to vastly outstrip traditional silicon computing resources, functional memories based on flash, resistive switches, magnetic tunnel junctions, and other technologies can store these new ultra-large models. However, new approaches are then needed to minimize hardware overhead, especially on the movement and calculation of gradient information that cannot be efficiently contained in these new memory resources. To do this, we introduce streaming batch principal component analysis (SBPCA) as an update algorithm. Streaming batch principal component analysis uses stochastic power iterations to generate a stochastic rank-k approximation of the network gradient. We demonstrate that the low-rank updates produced by streaming batch principal component analysis can effectively train convolutional neural networks on a variety of common datasets, with performance comparable to standard mini-batch gradient descent. Our approximation is made in an expanded vector form that can efficiently be applied to the rows and columns of crossbars for array-level updates. These results promise improvements in the design of application-specific integrated circuits based around large vector-matrix multiplier memories.
Title: Low-Rank Gradient Descent for Memory-Efficient Training of Deep In-Memory Arrays
Journal: ACM Journal on Emerging Technologies in Computing Systems
Pub Date : 2023-05-18 DOI: https://dl.acm.org/doi/10.1145/3588032
Nathan Jessurun, Olivia P. Dizon-Paradis, Jacob Harrison, Shajib Ghosh, Mark M. Tehranipoor, Damon L. Woodard, Navid Asadizanjani
Outsourced PCB fabrication necessitates increased hardware assurance capabilities. Several assurance techniques based on AOI have been proposed that leverage PCB images acquired using digital cameras. We review state-of-the-art AOI techniques and observe a strong, rapid trend toward ML solutions. These require significant amounts of labeled ground truth data, which is lacking in the publicly available PCB data space. We contribute the FPIC dataset to address this need. Additionally, we outline new hardware security methodologies enabled by our dataset.
Title: FPIC: A Novel Semantic Dataset for Optical PCB Assurance
Journal: ACM Journal on Emerging Technologies in Computing Systems