Yang Jiao, Changxiong Xia, Jingyu Zhai, Tianyi Xing, Qun Wan
In the field of signal processing, modulation signals, including phase shift keying (PSK) and quadrature amplitude modulation (QAM), can significantly enhance the signal-to-noise ratio (SNR) through aliasing transmission following clustering and sorting. This article presents two novel approaches to compressed time difference of arrival (TDOA) estimation, leveraging amplitude-phase clustering signals. A carefully designed compression matrix is constructed based on the unique amplitude and phase characteristics of the signals. The study then analyzes the Cramér–Rao lower bound (CRLB) under full-sampling conditions. Finally, TDOA estimation is performed using the approximate maximum likelihood (AML) method. Simulation results demonstrate that the proposed compressed sampling TDOA estimation methods, based on amplitude-phase clustering, achieve accuracy within an order of magnitude of full-sampling performance. Additionally, this article explores the application of OFDM-QAM signals, which exhibit amplitude-phase convergence in the frequency domain, for time difference estimation in compressed sampling. A novel frequency-domain aliasing time difference estimation algorithm based on amplitude-phase convergence is proposed. Experimental results indicate that under high SNR conditions, the algorithm incurs only a minor SNR degradation of ~4 dB compared to time difference estimation in uncompressed transmission.
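The core TDOA idea the abstract builds on can be illustrated with a toy cross-correlation estimator. This is not the paper's AML method or its amplitude-phase compression matrix — just a generic sketch, with a hypothetical sample rate, delay, and PSK-like waveform:

```python
import numpy as np

def estimate_tdoa(x, y, fs):
    """Estimate the delay of y relative to x (in samples and seconds)
    from the peak of the full cross-correlation."""
    corr = np.correlate(y, x, mode="full")
    lag = int(np.argmax(np.abs(corr))) - (len(x) - 1)
    return lag, lag / fs

# Hypothetical setup: a pseudo-random +/-1 chip sequence and a noisy, delayed copy.
rng = np.random.default_rng(0)
fs = 1_000                                   # sample rate in Hz (illustrative)
s = np.sign(rng.standard_normal(256))        # +/-1 chips, PSK-like
delay = 7                                    # true delay in samples
x = np.concatenate([s, np.zeros(32)])
y = np.concatenate([np.zeros(delay), s, np.zeros(32 - delay)])
y = y + 0.05 * rng.standard_normal(len(y))   # mild additive noise

lag, tdoa = estimate_tdoa(x, y, fs)          # lag = 7, tdoa = 0.007 s
```

The correlation peak recovers the integer-sample delay; the paper's contribution lies in doing this from compressed samples, which this sketch does not attempt.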
"Compressive TDOA Estimation Method Based on Amplitude Phase Clustering in Time-Frequency Domain," IET Signal Processing, vol. 2026, no. 1, published 20 January 2026. DOI: 10.1049/sil2/3642027. Open-access PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/sil2/3642027
Xiangqun Zhang, Zhizhou Ge, Kai Lu, Genyuan Du, Jiawen Shen, Xiangqian Gao
Hand gesture recognition using mmWave radar has emerged as a promising technology for human–computer interaction (HCI), smart home systems, and the Internet of Things (IoT). However, the practical application of this technology is often constrained by the high computational complexity and significant storage demands of contemporary deep neural networks, which impede their deployment on resource-limited embedded devices. To address this limitation, we present a novel approach that combines an improved MobileViT model with a knowledge distillation (KD) framework. The proposed method consists of three main stages. First, raw radar signals are captured, restructured into a three-dimensional tensor (Chirps × Samples × Frames), and processed to generate range-time maps (RTMs) and Doppler-time maps (DTMs). Second, an improved MobileViT network is designed, incorporating fewer redundant blocks, a lower input resolution, and a dual-branch input structure to effectively fuse features from the RTM and DTM. This enhanced architecture serves as a robust teacher model, excelling at extracting both local and global spatiotemporal features for accurate gesture recognition. Finally, KD is applied to transfer knowledge from the teacher model to a compact student network, thereby achieving model compression. Experimental results demonstrate that the final distilled student model, evaluated on the test set, has only 0.018 M parameters (roughly 10% of the teacher model's size) while still achieving a high recognition accuracy of 99.16%. Consequently, the resulting model is highly compact and accurate, demonstrating its suitability for real-world embedded deployment.
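Since the abstract hinges on knowledge distillation, a minimal NumPy sketch of a standard soft-target KD loss may help. The temperature T, weight alpha, and logits below are hypothetical; the paper's exact loss formulation is not specified in the abstract:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax along the last axis."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)    # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """alpha * T^2 * KL(teacher || student at temperature T)
       + (1 - alpha) * cross-entropy(student, hard labels)."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    probs = softmax(student_logits)
    ce = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    return float(np.mean(alpha * (T ** 2) * kl + (1 - alpha) * ce))
```

When the student matches the teacher exactly, the KL term vanishes; the T² factor keeps soft-target gradients on a comparable scale as T grows, which is the usual motivation for that scaling.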
"Joint MobileViT and Knowledge Distillation Network for Hand Gesture Recognition via mmWave Radar," IET Signal Processing, vol. 2026, no. 1, published 17 January 2026. DOI: 10.1049/sil2/9971257. Open-access PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/sil2/9971257
Brain–computer interfaces (BCIs) play an important role in fields such as neuroscience, rehabilitation, and machine learning. Silent BCIs, which can reconstruct inner speech from neural activity, hold great promise for patients with aphasia. In this paper, we design an imagined Chinese speech experimental paradigm based on initials and finals and collect raw signals from eight healthy participants using 64-channel scalp electroencephalography. Linear predictive coding (LPC) and mel-frequency cepstral coefficients (MFCC), classical algorithms in the field of speech recognition, are used to extract distinguishing features for speech classification and reconstruction. In addition, the phase-locking value (PLV) is introduced to enrich the feature information. We choose support vector machine (SVM), linear discriminant analysis (LDA), decision tree (DT), and LogitBoost (LB) classifiers for binary classification in several different cases. Channel selection (CS) based on two brain regions, Broca's area and Wernicke's area, is also introduced. The highest imagined-speech decoding accuracy reaches 84.38%, which demonstrates the effectiveness of the feature engineering. A comparative analysis is also conducted against deep learning methods specifically designed for small-sample scenarios. This study offers a novel systematic approach for research on imagined Chinese speech BCIs.
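The phase-locking value used as a feature above can be sketched in a few lines of NumPy. This is a generic PLV computation (instantaneous phase via an FFT-based Hilbert transform), not the paper's full preprocessing pipeline, and the test signal is hypothetical:

```python
import numpy as np

def plv(x, y):
    """Phase-locking value between two equal-length channels:
    |mean(exp(j * (phase_x - phase_y)))| over time."""
    def inst_phase(sig):
        n = len(sig)
        spec = np.fft.fft(sig)
        h = np.zeros(n)                  # one-sided step filter for the analytic signal
        h[0] = 1.0
        if n % 2 == 0:
            h[n // 2] = 1.0
            h[1:n // 2] = 2.0
        else:
            h[1:(n + 1) // 2] = 2.0
        return np.angle(np.fft.ifft(spec * h))
    dphi = inst_phase(x) - inst_phase(y)
    return float(np.abs(np.mean(np.exp(1j * dphi))))

# A channel is perfectly phase-locked with itself, giving PLV = 1;
# independent noise channels yield a PLV near 0.
t = np.linspace(0, 1, 512, endpoint=False)
s = np.sin(2 * np.pi * 10 * t)
```

PLV is bounded in [0, 1]: a constant phase difference between channels gives 1, while uniformly scattered phase differences average toward 0.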
Jingyu Gu, Jiuchuan Jiang, Qian Cai, Haixian Wang, "Imagined Chinese Speech Decoding Based on Initials and Finals From EEG Activity," IET Signal Processing, vol. 2026, no. 1, published 15 January 2026. DOI: 10.1049/sil2/5451362. Open-access PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/sil2/5451362