Pub Date: 2024-10-09 DOI: 10.1109/TCSI.2024.3470335
Suraj Mandal;Debapriya Basu Roy
The Number Theoretic Transform (NTT) plays an important role in efficiently implementing lattice-based cryptographic algorithms such as CRYSTALS-Kyber, CRYSTALS-Dilithium, and FALCON. Existing NTT implementations for these algorithms are mostly based on radix-2 or radix-4 realizations of the Cooley-Tukey and Gentleman-Sande architectures. In this work, we explore an alternative method, Winograd’s NTT, which requires fewer modular multipliers than conventional Cooley-Tukey/Gentleman-Sande designs for higher-radix NTT. We propose three low-latency implementations of Winograd’s NTT, applicable to CRYSTALS-Dilithium, FALCON, and CRYSTALS-Kyber, respectively. The first is a radix-16 NTT multiplication unit for polynomials of length 256 that can be used directly for CRYSTALS-Dilithium; it also benefits from our proposed K-RED modular multiplication and outperforms existing Cooley-Tukey/Gentleman-Sande-based NTT multipliers for CRYSTALS-Dilithium. The second is based on a radix-8 Winograd structure with a novel modular multiplication method; it targets polynomials of length 512 and can be applied directly to FALCON. For CRYSTALS-Kyber, we design a radix-16 Winograd Butterfly Unit (BFU) that can be configured as two parallel radix-8 Winograd BFUs during mixed-radix computation. To the best of our knowledge, this is the first work to apply the Winograd technique to NTT multiplication for post-quantum secure lattice-based cryptographic algorithms.
Title: Winograd for NTT: A Case Study on Higher-Radix and Low-Latency Implementation of NTT for Post Quantum Cryptography on FPGA. Journal: IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 71, no. 12, pp. 6396-6409. URL: https://doi.org/10.1109/TCSI.2024.3470335
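For readers unfamiliar with the transform itself, the following is a minimal sketch of what an NTT computes and why it helps polynomial multiplication. It is a naive O(n^2) transform with toy parameters (q = 17, n = 8, root of unity 2), all of which are assumptions chosen for illustration; it is not the Winograd architecture or the K-RED reduction described in the abstract, and it works modulo x^n - 1, whereas the schemes named above work modulo x^n + 1 with much larger n.

```python
# Toy parameters (assumptions for illustration, not taken from the paper):
# q = 17 is prime, n = 8 divides q - 1, and 2 is a primitive 8th root of unity mod 17.
Q = 17
N = 8
OMEGA = 2


def ntt(a, omega=OMEGA, q=Q):
    """Naive O(n^2) forward transform: A[k] = sum_j a[j] * omega^(j*k) mod q."""
    n = len(a)
    return [sum(a[j] * pow(omega, j * k, q) for j in range(n)) % q for k in range(n)]


def intt(A, omega=OMEGA, q=Q):
    """Inverse transform: use omega^-1 and scale by n^-1 mod q."""
    n = len(A)
    omega_inv = pow(omega, -1, q)  # modular inverse, needs Python 3.8+
    n_inv = pow(n, -1, q)
    return [(n_inv * x) % q for x in ntt(A, omega_inv, q)]


def cyclic_mul_schoolbook(a, b, q=Q):
    """Reference product in Z_q[x] / (x^n - 1)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] = (c[(i + j) % n] + a[i] * b[j]) % q
    return c


def cyclic_mul_ntt(a, b, q=Q):
    """Same product via pointwise multiplication in the NTT domain."""
    A, B = ntt(a), ntt(b)
    return intt([(x * y) % q for x, y in zip(A, B)])
```

Calling `cyclic_mul_ntt(a, b)` on two length-8 coefficient lists should agree with `cyclic_mul_schoolbook(a, b)`. Hardware designs such as the one summarized above replace the naive double loop with butterfly networks to cut the multiplier count.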
Pub Date: 2024-10-08 DOI: 10.1109/TCSI.2024.3466972
Yao Wang;Guangyang Zhang;Xue Mei;Chongyan Gu
As a lightweight hardware security primitive, physical unclonable functions (PUFs) can provide reliable identity authentication for resource-limited Internet of Things (IoT) devices. The Arbiter PUF (APUF) is one of the best-known PUF circuits, but its hardware implementation has poor reliability on field programmable gate arrays (FPGAs). This paper proposes a highly reliable APUF that uses a delay difference quantization strategy (DDQ-APUF). Multiple configurable delay units are added to the two symmetrical paths of the conventional APUF, and the delay difference between the paths is obtained by collecting the APUF output under different delay configurations. Unlike conventional APUFs, DDQ-APUF does not use the arbitration result of signal transmission along the two symmetric paths as its response; it uses the quantized delay difference between the two paths instead. A tolerance threshold is adopted during authentication to accommodate variations in delay difference caused by environmental changes. Moreover, the modeling attack resistance of DDQ-APUF is evaluated, and a strategy for improving this resistance by incorporating a pseudo-XOR technique is proposed. The circuit was implemented on Xilinx Artix-7 FPGAs, and the experimental results show that the reliability reaches 99.95% without discarding any CRPs.
Title: A High-Reliability, Non-CRP-Discard Arbiter PUF Based on Delay Difference Quantization. Journal: IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 72, no. 2, pp. 573-585. URL: https://doi.org/10.1109/TCSI.2024.3466972
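One way to picture the quantization-and-tolerance idea from this abstract is the toy model below. Everything in it is hypothetical: the idealized arbiter, the 5 ps step, the 32 delay configurations, and the tolerance of 1 are invented for illustration, and the real circuit's quantization scheme may well differ.

```python
def arbiter(delay_diff_ps):
    """Idealized arbiter: 1 if the top path wins (positive difference), else 0."""
    return 1 if delay_diff_ps > 0 else 0


def quantize_delay_difference(intrinsic_diff_ps, step_ps=5.0, n_configs=32):
    """Sweep a configurable offset across both signs and count arbiter 1s.

    The count is a coarse digital estimate of the intrinsic delay difference.
    """
    half = n_configs // 2
    offsets = [(k - half) * step_ps for k in range(n_configs)]
    return sum(arbiter(intrinsic_diff_ps + off) for off in offsets)


def authenticate(enrolled_code, measured_code, tolerance=1):
    """Accept if the re-measured code is within the tolerance of the enrolled one."""
    return abs(enrolled_code - measured_code) <= tolerance


# Hypothetical enrollment at 12 ps, re-measurement after a drift to 17 ps.
enrolled = quantize_delay_difference(12.0)
drifted = quantize_delay_difference(17.0)
accepted = authenticate(enrolled, drifted)
```

In this model a small environmental drift moves the code by at most one step, so the tolerance check still accepts the device, while a clearly different delay difference is rejected.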
Pub Date: 2024-10-08 DOI: 10.1109/TCSI.2024.3466219
Bomin Joo;Minkyu Ko;Jieun Kim;Bai-Sun Kong
This paper proposes an energy- and area-efficient sound localization neural network mimicking the auditory brainstem cognitive function. By adopting the bio-plausible Jeffress model, the proposed neural network locates the sound based on the interaural time difference (ITD) in an energy- and hardware-efficient manner. The proposed network modifies the original structure of the Jeffress model, which has a pair of long axon lines, to provide performance gain. It reduces power consumption and area by using a single axon line, and further improves power and area efficiency by shortening the axon line used for pulse propagation. Since only the leading pulse is allowed to propagate through the shortened single axon delay line, the number of delay elements and corresponding network components is reduced. Moreover, it can accurately detect the location of the sound source thanks to the axon line composed of synchronized delay elements. Power consumption is further reduced by eliminating redundant pulse propagation through the axon line after the output neuron fires. The proposed sound localization neural network was fabricated in a 28-nm CMOS process. The performance evaluation results indicate that it can detect the location of a sound source with one-degree resolution at a given robot head size of 3.0125 cm, regardless of process corners. They also indicate that the network achieves up to 86.6% energy reduction and 97.2% area reduction relative to conventional sound localization networks when operating at a 0.305-V supply voltage.
Title: A Bio-Inspired Energy- and Area-Efficient Sound Localization Neural Network. Journal: IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 72, no. 2, pp. 719-729. URL: https://doi.org/10.1109/TCSI.2024.3466219
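To give a feel for the time scales involved in ITD-based localization, here is a small sketch using the common far-field approximation ITD = d·sin(θ)/c. The formula, the speed of sound (343 m/s), and the reading of "head size" as the spacing d between the two sensors are assumptions made here, not details confirmed by the abstract.

```python
import math

SPEED_OF_SOUND_M_S = 343.0   # assumed value for air at about 20 °C
HEAD_SIZE_M = 0.030125       # 3.0125 cm, the head size quoted in the abstract


def itd_seconds(angle_deg, d=HEAD_SIZE_M, c=SPEED_OF_SOUND_M_S):
    """Far-field ITD for a source at angle_deg from straight ahead."""
    return d * math.sin(math.radians(angle_deg)) / c


def angle_from_itd(itd, d=HEAD_SIZE_M, c=SPEED_OF_SOUND_M_S):
    """Invert the model; clamp to the valid asin range to absorb rounding."""
    x = max(-1.0, min(1.0, itd * c / d))
    return math.degrees(math.asin(x))


# ITD for a source 30 degrees off-axis, and the angle recovered from it.
itd_30 = itd_seconds(30.0)
recovered = angle_from_itd(itd_30)
```

Under these assumptions the largest ITD (source at 90 degrees) is a little under 90 microseconds, and a one-degree step near the front corresponds to roughly 1.5 microseconds, which indicates how fine the delay elements in such a network need to be.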
Pub Date: 2024-10-04 DOI: 10.1109/TCSI.2024.3466563
Yishuo Meng;Junfeng Wu;Siwei Xiang;Jianfei Wang;Jia Hou;Zhijie Lin;Chen Yang
CNN acceleration algorithms, including Winograd, Fast Fourier Transform (FFT), and Number Theoretic Transform (NTT), have demonstrated their potential for efficiently running current Convolutional Neural Networks (CNNs). However, deploying the FFT algorithm for CNN acceleration introduces significant invalid elements, unnecessary computations, and unacceptable transformation overhead. To address these issues, this paper proposes a series of improved methods along with an FFT-based architecture for efficient and simplified CNN acceleration. First, a novel mixed-radix FFT algorithm is proposed to reduce invalid elements. Moreover, Hermitian symmetry is utilized to further reduce the scale of the FFT transformation and the number of multiplications. Furthermore, an efficient FFT-based CNN accelerator with a resource-efficient transformation component and a multiplication-reduced PE array is designed. The proposed accelerator is implemented on a Xilinx XCVU440 with a running frequency of 238 MHz, achieving actual performance of 2109-2797 GOPS and DSP efficiency of 1.37-1.82 GOPS/DSP. Compared to previous works based on Winograd, FFT, and NTT, the proposed accelerator realizes up to $9.42\times$ speedup in actual performance and $1.11\times$ to $6.41\times$ speedup in DSP efficiency.
Title: A High-Throughput and Flexible CNN Accelerator Based on Mixed-Radix FFT Method. Journal: IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 72, no. 2, pp. 816-829. URL: https://doi.org/10.1109/TCSI.2024.3466563
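The Hermitian symmetry mentioned in this abstract is a general property of the DFT of real-valued data: X[n-k] equals the complex conjugate of X[k], so only about half of the output bins need to be stored or multiplied. The sketch below demonstrates the property with a naive DFT; it is an illustration written for this listing and says nothing about how the paper's mixed-radix hardware actually exploits it.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT, enough to show the symmetry on a small input."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def dft_real_half(x):
    """For real input, keep only bins 0..n/2; the rest are redundant.

    A real design would avoid computing the discarded bins at all;
    this sketch slices a full DFT purely for clarity.
    """
    n = len(x)
    return dft(x)[: n // 2 + 1]


def rebuild_full_spectrum(half, n):
    """Recover bins n/2+1..n-1 from X[n-k] = conj(X[k])."""
    return half + [half[n - k].conjugate() for k in range(n // 2 + 1, n)]


# Real-valued sample input (arbitrary numbers chosen for the demonstration).
samples = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.5, 4.0]
full = dft(samples)
rebuilt = rebuild_full_spectrum(dft_real_half(samples), len(samples))
```

For the eight-sample input, five bins carry all the information and the remaining three are rebuilt by conjugation, which is the saving in transform size and multiplications that the abstract refers to.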
Pub Date: 2024-10-04 DOI: 10.1109/TCSI.2024.3470135
Matthias Probst;Manuel Brosch;Georg Sigl
Spiking neural networks are gaining increasing attention in constrained edge devices due to their event-based low-power operation and low resource usage. Such edge devices often allow physical access, opening the door to Side-Channel Analysis. In this work, we introduce a novel, robust attack strategy at the neuron level to retrieve the trained parameters of an implemented spiking neural network. Utilizing horizontal correlation power analysis, we demonstrate how to recover the weights and thresholds of a feed-forward spiking neural network implementation. We verify our methodology with real-world measurements of localized electromagnetic emanations from an FPGA design. Additionally, we evaluate shuffling and masking as countermeasures to protect the implementation against the proposed attack and demonstrate their effectiveness and limitations.
Title: Side-Channel Analysis of Integrate-and-Fire Neurons Within Spiking Neural Networks. Journal: IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 72, no. 2, pp. 548-560. URL: https://doi.org/10.1109/TCSI.2024.3470135 Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10705320
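As background on correlation power analysis in general, the sketch below runs a textbook (vertical) CPA against simulated data. It is not the horizontal attack from the paper: the leakage model (Hamming weight of an 8-bit membrane register after a weight is added), the noise level, and all function names are assumptions invented for this example, and no real measurements are involved.

```python
import random


def hamming_weight(v):
    """Number of set bits in a non-negative integer."""
    return bin(v).count("1")


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def simulate_traces(secret_weight, n_traces=1000, noise_sigma=0.5, seed=1):
    """Hypothetical leakage: HW of an 8-bit register after adding the weight."""
    rng = random.Random(seed)
    states = [rng.randrange(256) for _ in range(n_traces)]
    traces = [
        hamming_weight((s + secret_weight) & 0xFF) + rng.gauss(0.0, noise_sigma)
        for s in states
    ]
    return states, traces


def recover_weight(states, traces, candidates=range(256)):
    """Return the candidate whose predicted leakage correlates best with the traces."""
    def score(w):
        predicted = [hamming_weight((s + w) & 0xFF) for s in states]
        return pearson(predicted, traces)
    return max(candidates, key=score)


# Simulated attack on an invented secret weight.
states, traces = simulate_traces(secret_weight=0x5A)
guess = recover_weight(states, traces)
```

The attacker knows the register states but not the weight, predicts the leakage for every candidate weight, and keeps the candidate with the highest correlation. Countermeasures such as the shuffling and masking evaluated in the paper aim to break exactly this link between predictions and measurements.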
Pub Date: 2024-10-04 DOI: 10.1109/TCSI.2024.3458057
F. de Los Santos-Prieto;F. J. Rubio-Barbero;R. Castro-Lopez;E. Roca;F. V. Fernandez
Silicon Physical Unclonable Functions (PUFs) have emerged as a promising solution for generating cryptographic keys in low-cost, resource-constrained devices. A PUF is expected to be reliable, meaning that its response bits should remain consistent each time the corresponding challenges are queried. Unfortunately, the stability of these challenge-response pairs (CRPs) can be seriously eroded by environmental factors such as temperature variations and by the aging of the integrated circuits implementing the PUF. Several approaches, including bit masking, bit selection techniques, and error-correcting codes, have been proposed to obtain reliable PUF operation in the face of temperature variations. As for aging, a new kind of aging-resilient silicon PUF has been reported that uses the time-varying phenomenon known as Random Telegraph Noise (RTN) as the underlying entropy source. Although this type of PUF preserves its reliability well when aged, it is not immune to the impact of temperature variations. The work presented here shows that it is possible to improve the thermal reliability of RTN-based PUFs with a proper combination of (a) a novel optimization-based bit selection technique that is also applicable to other types of PUFs based on differential measurements; and (b) a temperature-aware tuning of the entropy-harvesting function.
Title: A Comprehensive Approach to Improving the Thermal Reliability of RTN-Based PUFs. Journal: IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 72, no. 2, pp. 661-670. URL: https://doi.org/10.1109/TCSI.2024.3458057
Pub Date: 2024-10-04 DOI: 10.1109/TCSI.2024.3468372
Yu-En Wu;Sin-Cheng Huang;Che-Ming Chang
This paper proposes a novel high step-up DC-DC converter comprising a single switch and a three-winding coupled inductor. By using only one switch, the proposed converter simplifies control, since it requires only one set of PWM signals, and it eliminates the need to operate at extremely high duty cycles or high turns ratios to achieve the desired gain ratio. Moreover, the converter achieves a high voltage gain through a voltage multiplier cell and the three-winding coupled inductor. A 500 W high step-up converter is used to confirm the correctness and feasibility of the proposed design through steady-state analysis, software simulations, and hardware implementation. The measured maximum efficiency reached 95.8% when operated under 150 W.
Title: A Novel Single-Switch High Step-Up DC-DC Converter With High-Voltage Conversion Ratio. Journal: IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 72, no. 2, pp. 953-962. URL: https://doi.org/10.1109/TCSI.2024.3468372
Massive MIMO systems are promising for wireless communications beyond 5G, but scalable Direction-of-Arrival (DOA) estimation in these systems is challenging due to the increasing number of required antennas. Existing solutions, whether model-based or data-driven (typically using neural networks), face scalability issues as the antenna array grows. To address this issue, we propose a hybrid system that makes the overall approach scalable. In the front-end, we employ a modular distributed approach, namely the sparse linear inverse method, to compute a proxy spectrum from the sampled covariance matrix of the antenna subarrays. The proxy drives a fixed lightweight back-end consisting of a 1-dimensional Convolutional Neural Network (1D-CNN) and a simplified peak extraction. Because the proxy dimension is independent of the antenna count, the neural network input is invariant to the array size, so the network can handle multiple array sizes without any modification to its structure. To reduce the computation of the covariance matrix and proxy spectrum, we employ a system of subarrays with Nearest-Neighbor communication. The proposed approach was implemented on a Xilinx ZCU102 FPGA targeting a 100 MHz clock frequency for 8- to 256-element arrays. We achieve a processing time below 1 ms for an array of 256 antennas while requiring significantly less computation than both model-based and data-driven approaches for large antenna arrays.
Title: MDS-DOA: Fusing Model-Based and Data-Driven Approaches for Modular, Distributed, and Scalable Direction-of-Arrival Estimation. Authors: Adou Sangbone Assoa;Ashwin Bhat;Sigang Ryu;Arijit Raychowdhury. Pub Date: 2024-10-04 DOI: 10.1109/TCSI.2024.3469928. Journal: IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 72, no. 2, pp. 941-952. URL: https://doi.org/10.1109/TCSI.2024.3469928
Pub Date: 2024-10-03 DOI: 10.1109/TCSI.2024.3470318
Adil Malik;Christos Papavassiliou
In this paper, we study the stochastic state trajectory and conductance distributions of memristors under periodic pulse excitation. Our results, backed by experimental evidence, reveal that practical memristors exhibit a $1/f^{2}$