MCL-3WDA: Cross-Domain Fault Diagnosis for Rotating Machine via Multichannel Vibration Data Based on Contrastive Learning and Fine-Grained Domain Alignment
Pub Date: 2025-10-31 | DOI: 10.1109/JSEN.2025.3625562 | IEEE Sensors Journal 25(24): 44994-45008
Ziyao Geng;Shihua Zhou;Tianzhuang Yu;Yulin Liu;Jianbo Ye;Ye Zhang;Zhaohui Ren
Rotating machinery fault diagnosis under varying operating conditions is challenged not only by domain shift and data scarcity but, more critically, by intrinsic algorithmic limitations in existing methods. Most current unsupervised domain adaptation (UDA) approaches rely on single-channel vibration signals, which cannot capture interchannel dependencies and thus produce suboptimal feature representations. Furthermore, existing domain alignment strategies are typically coarse-grained, aligning only global distributions while neglecting channel-wise, hierarchical, and class-specific discrepancies. To overcome these challenges, this article proposes a method, named MCL-3WDA, that integrates contrastive learning (CL) with fine-grained domain alignment. First, a multiscale attention fusion feature extraction (MAFFE) layer is devised to construct more expressive and generalized feature representations through cross-scale interactions and hierarchical attention refinement. Second, drawing inspiration from CL, a multichannel contrastive learning (MCL) strategy is introduced to uncover latent associative dependencies embedded within multichannel signals, substantially improving the model's discriminative capacity for fault pattern recognition. Finally, a channel-wise, layer-wise, and class-wise domain alignment (3WDA) strategy is developed, which achieves precise cross-domain distribution alignment based on multikernel maximum mean discrepancy (MKMMD). Extensive experiments on two public datasets and one private dataset demonstrate that the proposed MCL-3WDA achieves superior performance, with an average accuracy of 98.95% (ranging from 97.13% to 100.00%) across multiple cross-domain tasks, significantly outperforming existing methods.
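The alignment term in 3WDA is built on MKMMD. As a rough illustration of that criterion, the NumPy sketch below estimates the squared maximum mean discrepancy between a source and a target feature batch under a bank of RBF kernels; the bandwidth set and feature shapes are illustrative assumptions, not the authors' settings.

```python
# A minimal NumPy sketch of multikernel MMD (MKMMD). Kernel bandwidths
# and feature shapes are assumptions for illustration.
import numpy as np

def mkmmd(source, target, bandwidths=(0.5, 1.0, 2.0, 4.0)):
    """Squared MMD between two feature batches, summed over RBF kernels."""
    def kernel_mean(a, b, gamma):
        # Pairwise squared Euclidean distances via the expansion trick.
        d2 = (np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :]
              - 2.0 * a @ b.T)
        return np.exp(-gamma * d2).mean()
    total = 0.0
    for bw in bandwidths:
        gamma = 1.0 / (2.0 * bw**2)
        total += (kernel_mean(source, source, gamma)
                  + kernel_mean(target, target, gamma)
                  - 2.0 * kernel_mean(source, target, gamma))
    return total

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, (64, 128))   # source-domain feature batch
tgt = rng.normal(0.5, 1.0, (64, 128))   # shifted target-domain batch
print(mkmmd(src, tgt))                  # larger value = larger discrepancy
```

In 3WDA this kind of term would be evaluated per channel, per layer, and per class rather than once globally.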
{"title":"MCL-3WDA: Cross-Domain Fault Diagnosis for Rotating Machine via Multichannel Vibration Data Based on Contrastive Learning and Fine-Grained Domain Alignment","authors":"Ziyao Geng;Shihua Zhou;Tianzhuang Yu;Yulin Liu;Jianbo Ye;Ye Zhang;Zhaohui Ren","doi":"10.1109/JSEN.2025.3625562","DOIUrl":"https://doi.org/10.1109/JSEN.2025.3625562","url":null,"abstract":"Rotating machinery fault diagnosis under varying operating conditions is challenged not only by domain shift and data scarcity but more critically by intrinsic algorithmic limitations in existing methods. Most current unsupervised domain adaptation (UDA) approaches rely on single-channel vibration signals, which lack the ability to capture interchannel dependencies and thus produce suboptimal feature representations. Furthermore, existing domain alignment strategies are typically coarse-grained, aligning only global distributions while neglecting channel-wise, hierarchical, and class-specific discrepancies. To overcome these challenges, this article proposes a novel method, named MCL-3WDA, which innovatively integrates contrastive learning (CL) with fine-grained domain alignment. First, a multiscale attention fusion feature extraction (MAFFE) layer is devised to construct more expressive and generalized feature representations through cross-scale interactions and hierarchical attention refinement. Second, drawing inspiration from CL, a multichannel contrastive learning strategy (MCL) is introduced to uncover latent associative dependencies embedded within multichannel signals, thereby substantially augmenting the model’s discriminative capacity for fault pattern recognition. Finally, a channel-wise, layer-wise, and class-wise domain alignment strategy (3WDA) is developed, which achieves precise cross-domain distribution alignment based on multikernel maximum mean discrepancy (MKMMD). Extensive experiments using two public datasets and one private dataset demonstrate that the proposed MCL-3WDA achieves superior performance with an average accuracy of 98.95% (ranging from 97.13% to 100.00%) across multiple cross-domain tasks, significantly outperforming existing methods.","PeriodicalId":447,"journal":{"name":"IEEE Sensors Journal","volume":"25 24","pages":"44994-45008"},"PeriodicalIF":4.3,"publicationDate":"2025-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145729438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CREN-RLC: Clustering-Based Adaptive Security With Regression Learning for IoT-WSNs
Pub Date: 2025-10-27 | DOI: 10.1109/JSEN.2025.3620211 | IEEE Sensors Journal 25(24): 44984-44993
Nishant Chaurasia;Prashant Kumar
The rapid growth of Internet of Things–wireless sensor networks (IoT-WSNs) brings numerous security challenges, particularly in environments where devices have limited resources and cannot sustain heavy or complex security methods. This article introduces clustering with residual energy and neighbor analysis–regression learning classifier (CREN-RLC), a lightweight, adaptive security framework explicitly designed for IoT-WSNs. The framework integrates CREN—which organizes sensor nodes into energy-aware clusters based on their residual energy and communication patterns—with an RLC that detects and adapts to intrusions in real time. While CREN ensures balanced energy utilization and efficient anomaly detection, RLC leverages historical data to recognize evolving attack types, thereby improving resilience against diverse threats. Implemented in Python 3.12 and evaluated on benchmark datasets, CREN-RLC achieved strong results, including a classification accuracy of 94.38%, precision of 93.41%, recall of 92.86%, and an F1-score of 92.27%, outperforming conventional neural and deep learning (DL) approaches. Moreover, the framework maintained high network efficiency, achieving low packet drop rates, forwarding ratios of up to 0.982, and over 95.6% attack prevention accuracy even under heavy attack conditions. By combining energy-aware clustering with intelligent, lightweight detection, CREN-RLC delivers a scalable, energy-efficient, and robust security solution suitable for real-world IoT-WSN applications, including smart cities, healthcare, industrial automation, and intelligent transportation.
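As a toy illustration of energy-aware cluster formation in the spirit of CREN (not the authors' exact algorithm), the sketch below scores nodes by residual energy and local connectivity, elects the top scorers as cluster heads, and attaches remaining nodes to their nearest head; the weights, node count, and radio range are assumptions.

```python
# Toy energy-aware cluster-head election. Scoring weights, node count,
# and radio range are illustrative assumptions.
import math, random

random.seed(1)
nodes = [{"id": i,
          "x": random.uniform(0, 100), "y": random.uniform(0, 100),
          "energy": random.uniform(0.2, 1.0)} for i in range(30)]

RANGE = 25.0  # assumed communication radius in metres

def neighbors(n):
    return [m for m in nodes if m is not n
            and math.dist((n["x"], n["y"]), (m["x"], m["y"])) <= RANGE]

def score(n):
    # Weighted blend of residual energy and local connectivity.
    return 0.7 * n["energy"] + 0.3 * len(neighbors(n)) / len(nodes)

heads = sorted(nodes, key=score, reverse=True)[:5]   # elect 5 CHs
for n in nodes:
    # Each node joins its nearest cluster head.
    n["ch"] = min(heads, key=lambda h: math.dist((n["x"], n["y"]),
                                                 (h["x"], h["y"])))["id"]
print([h["id"] for h in heads])
```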
{"title":"CREN-RLC: Clustering-Based Adaptive Security With Regression Learning for IoT-WSNs","authors":"Nishant Chaurasia;Prashant Kumar","doi":"10.1109/JSEN.2025.3620211","DOIUrl":"https://doi.org/10.1109/JSEN.2025.3620211","url":null,"abstract":"The rapid growth of Internet of Things–wireless sensor networks (IoT-WSNs) brings numerous security challenges, particularly in environments where devices have limited resources and cannot sustain heavy or complex security methods. This article introduces clustering with residual energy and neighbor analysis-regression learning classifier (CREN-RLC), a lightweight, adaptive security framework explicitly designed for IoT-WSNs. The framework integrates CREN—which organizes sensor nodes into energy-aware clusters based on their residual energy and communication patterns—with a RLC that detects and adapts to intrusions in real time. While CREN ensures balanced energy utilization and efficient anomaly detection, RLC leverages historical data to recognize evolving attack types, thereby improving resilience against diverse threats. Implemented in Python 3.12 and evaluated on benchmark datasets, CREN-RLC achieved strong results, including a classification accuracy of 94.38%, precision of 93.41%, recall of 92.86%, and an F 1-score of 92.27%, outperforming conventional neural and deep learning (DL) approaches. Moreover, the framework maintained high network efficiency, achieving low packet drop rates, forwarding ratios of up to 0.982, and over 95.6% attack prevention accuracy even under heavy attack conditions. By combining energy-aware clustering with intelligent, lightweight detection, CREN-RLC delivers a scalable, energyefficient, and robust security solution suitable for real-world IoT-WSN applications, including smart cities, healthcare, industrial automation, and intelligent transportation.","PeriodicalId":447,"journal":{"name":"IEEE Sensors Journal","volume":"25 24","pages":"44984-44993"},"PeriodicalIF":4.3,"publicationDate":"2025-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145729293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Hybrid CNN–BiLSTM Approach for Wildlife Detection Nearby Railway Track in a Forest
Pub Date: 2025-10-23 | DOI: 10.1109/JSEN.2025.3622306 | IEEE Sensors Journal 25(23): 43507-43515
D. S. Parihar;Ripul Ghosh
Wildlife conflict has become a serious concern due to increasing animal mortality from rail-induced accidents on railway tracks passing through forest regions. Monitoring the movement of wild animals near railway tracks remains challenging due to complex terrain, varied landscapes, and diverse biodiversity. This article presents an optimized hybrid 1-D convolutional neural network–bidirectional long short-term memory (CNN–BiLSTM) architecture to classify wildlife and other ground activities from seismic data generated in a forest environment. The proposed method sequentially learns high-level patterns from multidomain features extracted from the principal modes of the variational mode decomposition (VMD) of seismic signals. The classification results are compared with standalone CNN and BiLSTM models, both of which the proposed method outperforms, achieving an average accuracy of 78.11 ± 4.28% and the lowest false-detection rate.
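The sketch below shows, in PyTorch, the general shape of a hybrid 1-D CNN–BiLSTM classifier of this kind: convolutions extract local patterns from the VMD-derived feature channels and a BiLSTM models their sequential structure. Layer sizes, channel counts, and the class count are illustrative assumptions, not the paper's configuration.

```python
# Minimal hybrid 1-D CNN + BiLSTM classifier sketch. All dimensions are
# illustrative assumptions.
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    def __init__(self, in_channels=4, n_classes=5):
        super().__init__()
        # 1-D convolutions extract local patterns per feature channel.
        self.cnn = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # The BiLSTM models temporal dependencies across CNN features.
        self.lstm = nn.LSTM(64, 64, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * 64, n_classes)

    def forward(self, x):              # x: (batch, channels, time)
        z = self.cnn(x)                # (batch, 64, time/4)
        z = z.transpose(1, 2)          # (batch, time/4, 64) for the LSTM
        out, _ = self.lstm(z)
        return self.head(out[:, -1])   # classify from the last time step

model = CNNBiLSTM()
print(model(torch.randn(8, 4, 256)).shape)   # torch.Size([8, 5])
```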
{"title":"A Hybrid CNN–BiLSTM Approach for Wildlife Detection Nearby Railway Track in a Forest","authors":"D. S. Parihar;Ripul Ghosh","doi":"10.1109/JSEN.2025.3622306","DOIUrl":"https://doi.org/10.1109/JSEN.2025.3622306","url":null,"abstract":"Wildlife conflict has become a serious concern due to increasing animal mortality from rail-induced accidents on railway tracks passing through the forest region. Monitoring the movement of wild animals near a railway track remains challenging due to the complex terrain, varied landscapes, and diverse biodiversity. This article presents an optimized hybrid 1-D convolutional neural network–bidirectional long short-term memory (CNN–BiLSTM) architecture to classify wildlife and other ground activities from seismic data generated in a forest environment. The proposed method automatically searches the high-level patterns sequentially from the multidomain features that are extracted from the principal modes of variational mode decomposition (VMD) of seismic signals. Furthermore, the classification results are compared with the standalone CNN and BiLSTM, where the proposed method outperforms with an average accuracy of 78.11 ± 4.28% and the lowest false detection rate.","PeriodicalId":447,"journal":{"name":"IEEE Sensors Journal","volume":"25 23","pages":"43507-43515"},"PeriodicalIF":4.3,"publicationDate":"2025-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145652175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Learning-Based SNAP Microresonator Displacement Sensing Technology
Pub Date: 2025-10-20 | DOI: 10.1109/JSEN.2025.3621436 | IEEE Sensors Journal 25(23): 43500-43506
Shuai Zhang;Yongchao Dong;Shihao Huang;Gaoping Xu;Ruizhou Wang;Han Wang;Mengyu Wang
Whispering gallery mode (WGM) microresonators have shown great potential for precise displacement measurement due to their compact size, ultrahigh sensitivity, and rapid response. However, traditional WGM-based displacement sensors are susceptible to environmental noise interference, resulting in reduced accuracy and long signal demodulation times. To address these limitations, this article proposes a multimodal displacement sensing method for surface nanoscale axial photonics (SNAP) resonators based on deep learning (DL) techniques. A 1-D convolutional neural network (1D-CNN) is used to extract features from the full spectrum, which significantly improves noise immunity and sensing accuracy while avoiding time-consuming spectral preprocessing. Experimental results show that the average prediction error is as low as 0.05 μm and the maximum error does not exceed 1.4 μm when the 1D-CNN is used for displacement measurements. This work provides an effective solution for fast, highly accurate, and robust displacement sensing.
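A minimal sketch of the spectrum-to-displacement idea follows: a small 1-D CNN regresses one displacement value directly from a full (here synthetic) spectrum, with no spectral preprocessing. The spectrum length, layer sizes, and training data are assumptions for illustration, not the paper's setup.

```python
# Toy spectrum-to-displacement regressor. Spectrum length, layer sizes,
# and data are illustrative assumptions.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=9, padding=4), nn.ReLU(), nn.MaxPool1d(4),
    nn.Conv1d(16, 32, kernel_size=9, padding=4), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),        # pool over the wavelength axis
    nn.Flatten(),
    nn.Linear(32, 1),               # regress displacement in micrometres
)

spectra = torch.randn(32, 1, 1024)          # synthetic stand-in spectra
displacement = torch.rand(32, 1) * 100.0    # stand-in labels, 0-100 um
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(5):                          # a few illustrative steps
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(spectra), displacement)
    loss.backward()
    opt.step()
print(float(loss))
```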
{"title":"Deep Learning-Based SNAP Microresonator Displacement Sensing Technology","authors":"Shuai Zhang;Yongchao Dong;Shihao Huang;Gaoping Xu;Ruizhou Wang;Han Wang;Mengyu Wang","doi":"10.1109/JSEN.2025.3621436","DOIUrl":"https://doi.org/10.1109/JSEN.2025.3621436","url":null,"abstract":"Whispering gallery mode (WGM) microresonators have shown great potential for precise displacement measurement due to their compact size, ultrahigh sensitivity, and rapid response. However, traditional WGM-based displacement sensors are susceptible to environmental noise interference, resulting in reduced accuracy and too long signal demodulation time. To address these limitations, this article proposes a multimodal displacement sensing method for surface nanoscale axial photonics (SNAPs) resonators based on deep learning (DL) techniques. A 1-D convolutional neural network (1D-CNN) is used to extract features from the full spectrum, which significantly improves the noise immunity and sensing accuracy while avoiding the time-consuming spectral preprocessing. Experimental results show that the average prediction error is as low as 0.05 μm and the maximum error does not exceed 1.4 μm when using the 1D-CNN network for displacement measurements. This work provides an effective solution for fast, highly accurate and robust displacement sensing.","PeriodicalId":447,"journal":{"name":"IEEE Sensors Journal","volume":"25 23","pages":"43500-43506"},"PeriodicalIF":4.3,"publicationDate":"2025-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145674754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MonoICT: A Monocular 3-D Object Detection Model Integrating CNN and Transformer
Pub Date: 2025-10-17 | DOI: 10.1109/JSEN.2025.3578608 | IEEE Sensors Journal 25(21): 40763-40774
Xingqi Na;Zhijia Zhang;Huaici Zhao;Shujun Jia
In the field of autonomous driving, 3-D object detection is a crucial technology. Visual sensors are essential in this area and are widely used for 3-D object detection tasks. Recent advancements in monocular 3-D object detection have introduced depth estimation branches within the network architecture. This integration leverages predicted depth information to address the depth perception limitations inherent in monocular sensors, thereby improving detection accuracy. However, many existing methods prioritize lightweight designs at the expense of depth estimation accuracy. To enhance this accuracy, we propose the pseudo depth feature extraction (PDFE) module. This module extracts features by fusing adaptive scale information and simulating disparity, leading to more precise depth predictions. Additionally, we present a hybrid model that combines convolutional neural networks (CNNs) and Transformer architectures. The model employs diverse feature fusion strategies, including depth-guided fusion (DGF) and a Transformer decoder. It also utilizes a convolutional mixture transformer (CMT) encoder to enhance the representation of both local and global features. Building on these innovations, we developed the MonoICT network model and evaluated its performance using the KITTI dataset. Our experimental results indicate that our approach is competitive with recent state-of-the-art methods, outperforming them in the pedestrian and cyclist categories.
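The sketch below illustrates the generic pattern behind hybrid CNN-Transformer encoders of this kind: a convolution captures local structure on a feature map, which is then flattened into tokens and passed through self-attention to relate distant regions. All dimensions are illustrative assumptions; this is not MonoICT's actual CMT encoder.

```python
# Generic conv + self-attention encoder block. Dimensions are
# illustrative assumptions.
import torch
import torch.nn as nn

class ConvTransformerBlock(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        # Convolution captures local structure on the feature map...
        self.conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1)
        # ...while self-attention relates distant image regions.
        self.attn = nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, dim_feedforward=128, batch_first=True)

    def forward(self, x):                      # x: (B, C, H, W)
        x = torch.relu(self.conv(x))
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C) token sequence
        tokens = self.attn(tokens)
        return tokens.transpose(1, 2).reshape(b, c, h, w)

feat = torch.randn(2, 64, 16, 16)              # a backbone feature map
print(ConvTransformerBlock()(feat).shape)      # torch.Size([2, 64, 16, 16])
```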
{"title":"MonoICT: A Monocular 3-D Object Detection Model Integrating CNN and Transformer","authors":"Xingqi Na;Zhijia Zhang;Huaici Zhao;Shujun Jia","doi":"10.1109/JSEN.2025.3578608","DOIUrl":"https://doi.org/10.1109/JSEN.2025.3578608","url":null,"abstract":"In the field of autonomous driving, 3-D object detection is a crucial technology. Visual sensors are essential in this area and are widely used for 3-D object detection tasks. Recent advancements in monocular 3-D object detection have introduced depth estimation branches within the network architecture. This integration leverages predicted depth information to address the depth perception limitations inherent in monocular sensors, thereby improving detection accuracy. However, many existing methods prioritize lightweight designs at the expense of depth estimation accuracy. To enhance this accuracy, we propose the pseudo depth feature extraction (PDFE) module. This module extracts features by fusing adaptive scale information and simulating disparity, leading to more precise depth predictions. Additionally, we present a hybrid model that combines convolutional neural networks (CNNs) and Transformer architectures. The model employs diverse feature fusion strategies, including depth-guided fusion (DGF) and a Transformer decoder. It also utilizes a convolutional mixture transformer (CMT) encoder to enhance the representation of both local and global features. Building on these innovations, we developed the MonoICT network model and evaluated its performance using the KITTI dataset. Our experimental results indicate that our approach is competitive with recent state-of-the-art methods, outperforming them in the pedestrian and cyclist categories.","PeriodicalId":447,"journal":{"name":"IEEE Sensors Journal","volume":"25 21","pages":"40763-40774"},"PeriodicalIF":4.3,"publicationDate":"2025-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145455897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluation of Fiber Optic Shape Sensing Models for Minimally Invasive Prostate Needle Procedures Using OFDR Data
Pub Date: 2025-10-16 | DOI: 10.1109/jsen.2025.3620154 | IEEE Sensors Journal
Jacynthe Francoeur, Raman Kashyap, Samuel Kadoury, Jin Seob Kim, Iulian Iordachita
This paper presents a systematic evaluation of fiber optic shape sensing models for prostate needle interventions using a single needle embedded with a three-fiber optical frequency domain reflectometry (OFDR) sensor. Two reconstruction algorithms were evaluated: (1) Linear Interpolation Models (LIM), a geometric method that directly estimates local curvature and orientation from distributed strain measurements, and (2) the Lie-Group Theoretic Model (LGTM), a physics-informed elastic-rod model that globally fits curvature profiles while accounting for tissue-needle interaction. Using software-defined strain-point selection, both sparse and quasi-distributed sensing configurations were emulated from the same OFDR data. Experiments were conducted in homogeneous and two-layer gel phantoms, ex vivo tissue, and a whole-body cadaveric pig model. While the repeated-measures ANOVA did not detect any significant differences, the Friedman test analysis revealed statistically significant differences in RMSEs between LIM and LGTM (p < 0.05), with LIM outperforming LGTM in the ex vivo tissue scenario. LIM also achieved over 50-fold faster computation (< 1 ms vs. > 40 ms per shape), enabling real-time use. These findings highlight the trade-offs between model complexity, sensing density, computational load, and tissue variability, providing guidance for selecting shape-sensing strategies in clinical and robotic needle interventions.
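As a toy version of the geometric (LIM-style) reconstruction idea, the NumPy sketch below fits a sinusoidal strain model across three fibers at 0°/120°/240° to recover local curvature, then integrates curvature along arclength into a planar centerline. The fiber offset, sample spacing, sign convention, and synthetic strains are all assumptions, not the paper's calibration.

```python
# Toy curvature recovery from three fiber strains, then planar shape
# integration. All values are synthetic assumptions.
import numpy as np

r = 70e-6                          # assumed fiber offset from needle axis, m
phis = np.deg2rad([0.0, 120.0, 240.0])
ds = 0.005                         # arclength between strain points, m

def curvature(strains):
    # Fit strain(phi) = a + b*cos(phi) + c*sin(phi) by solving exactly.
    A = np.column_stack([np.ones(3), np.cos(phis), np.sin(phis)])
    a, b, c = np.linalg.solve(A, strains)
    kappa = np.hypot(b, c) / r     # bend magnitude, 1/m
    theta = np.arctan2(c, b)       # bend direction in the cross section
    return kappa, theta

strain = np.array([2.0, -1.0, -1.0]) * r     # synthetic 2 1/m planar bend
kap, theta = curvature(strain)
heading = np.cumsum(np.full(40, kap) * ds)   # integrate curvature -> angle
x = np.cumsum(np.cos(heading) * ds)          # integrate angle -> centerline
y = np.cumsum(np.sin(heading) * ds)
print(round(kap, 3), x[-1], y[-1])
```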
{"title":"Evaluation of Fiber Optic Shape Sensing Models for Minimally Invasive Prostate Needle Procedures Using OFDR Data.","authors":"Jacynthe Francoeur, Raman Kashyap, Samuel Kadoury, Jin Seob Kim, Iulian Iordachita","doi":"10.1109/jsen.2025.3620154","DOIUrl":"10.1109/jsen.2025.3620154","url":null,"abstract":"<p><p>This paper presents a systematic evaluation of fiber optic shape sensing models for prostate needle interventions using a single needle embedded with a three-fiber optical frequency domain reflectometry (OFDR) sensor. Two reconstruction algorithms were evaluated: (1) Linear Interpolation Models (LIM), a geometric method that directly estimates local curvature and orientation from distributed strain measurements, and (2) the Lie-Group Theoretic Model (LGTM), a physics-informed elastic-rod model that globally fits curvature profiles while accounting for tissue-needle interaction. Using software-defined strain-point selection, both sparse and quasi-distributed sensing configurations were emulated from the same OFDR data. Experiments were conducted in homogeneous and two-layer gel phantoms, <i>ex vivo</i> tissue, and a whole-body cadaveric pig model. While the repeated-measures ANOVA did not detect any significant differences, the Friedman test analysis revealed statistically significant differences in RMSEs between LIM and LGTM (p < 0.05), with LIM outperforming LGTM in the <i>ex vivo</i> tissue scenario. LIM also achieved over 50-fold faster computation (< 1 ms vs. > 40 ms per shape), enabling real-time use. These findings highlight the trade-offs between model complexity, sensing density, computational load, and tissue variability, providing guidance for selecting shape-sensing strategies in clinical and robotic needle interventions.</p>","PeriodicalId":447,"journal":{"name":"IEEE Sensors Journal","volume":" ","pages":""},"PeriodicalIF":4.3,"publicationDate":"2025-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12588074/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145457292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Relay Cluster Head Based Traffic and Energy-Aware Routing Protocol for Heterogeneous WSNs
Pub Date: 2025-10-16 | DOI: 10.1109/JSEN.2025.3620015 | IEEE Sensors Journal 25(22): 42350-42363
Simanta Das;Ripudaman Singh
Distributed clustering routing protocols are acknowledged as effective methods for minimizing and balancing energy consumption in wireless sensor networks (WSNs). In these protocols, the random distribution of cluster heads (CHs) results in several isolated sensor nodes (ISNs). In general, an ISN consumes more energy than a cluster member (CM) sensor node (SN); therefore, ISNs located far from the sink can significantly reduce the network lifetime. In this article, we propose a relay cluster head based traffic and energy-aware routing (RCHBTEAR) protocol for heterogeneous WSNs. The RCHBTEAR protocol improves the network lifetime by reducing the energy consumption of SNs. To this end, we consider both the energy and traffic heterogeneities of SNs during the election of CHs. We then select relay CHs (RCHs) from the existing CHs to reduce the energy consumption of ISNs located far from the sink. Finally, we propose an optimized super round (SR) technique that eliminates the need for reclustering in every round. Simulation results show that the RCHBTEAR protocol significantly improves the network lifetime.
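The energy argument for relay CHs can be made concrete with the standard first-order radio model commonly used in WSN clustering studies (textbook constants, not values from this article): because amplifier energy grows with the square of distance, a distant isolated node spends less transmit energy on two short hops through a relay than on one long hop to the sink.

```python
# First-order radio model: direct transmission vs. relaying through a
# relay cluster head. Constants are standard textbook assumptions.
E_ELEC = 50e-9      # J/bit, electronics energy
EPS_FS = 10e-12     # J/bit/m^2, free-space amplifier energy
K = 4000            # packet size in bits

def tx_energy(d, k=K):
    return E_ELEC * k + EPS_FS * k * d * d

def rx_energy(k=K):
    return E_ELEC * k

d_direct = 180.0                      # isolated node -> sink, metres
d_hop1, d_hop2 = 60.0, 120.0          # node -> relay CH -> sink

direct = tx_energy(d_direct)
relayed = tx_energy(d_hop1) + rx_energy() + tx_energy(d_hop2)
print(f"direct: {direct*1e6:.1f} uJ, via relay CH: {relayed*1e6:.1f} uJ")
```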
{"title":"A Relay Cluster Head Based Traffic and Energy-Aware Routing Protocol for Heterogeneous WSNs","authors":"Simanta Das;Ripudaman Singh","doi":"10.1109/JSEN.2025.3620015","DOIUrl":"https://doi.org/10.1109/JSEN.2025.3620015","url":null,"abstract":"Distributed clustering routing protocols are acknowledged as effective methods for minimizing and balancing energy consumption in wireless sensor networks (WSNs). In these protocols, the random distribution of cluster heads (CHs) results in the presence of several isolated sensor nodes (ISNs). In general, an ISN consumes more energy than a cluster member (CM) sensor node (SN). Therefore, ISNs located far from the sink can significantly reduce the network lifetime. In this article, we propose a relay cluster head based traffic and energy-aware routing (RCHBTEAR) protocol for heterogeneous WSNs. The RCHBTEAR protocol improves the network lifetime by reducing the energy consumption of SNs. For this, we consider both the energy and traffic heterogeneities of SNs during the election of CHs. Furthermore, we select relay CHs (RCHs) from the existing CHs to reduce the energy consumption of ISNs located far from the sink. Furthermore, we propose an optimized super round (SR) technique that eliminates the need for reclustering in every round. Simulation results show that the RCHBTEAR protocol significantly improves the network lifetime.","PeriodicalId":447,"journal":{"name":"IEEE Sensors Journal","volume":"25 22","pages":"42350-42363"},"PeriodicalIF":4.3,"publicationDate":"2025-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145500463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Human Motion Recognition Based on Videos and Radar Spectrograms in Cross-Target Scenarios
Pub Date: 2025-10-15 | DOI: 10.1109/JSEN.2025.3619651 | IEEE Sensors Journal 25(22): 42400-42412
Yang Yang;Yue Song;Xiaochun Shang;Qingshuang Mu;Beichen Li;Yue Lang
Multisensor fusion combines the benefits of each sensor, resulting in thorough and reliable motion recognition even in challenging measurement environments. However, even with the environmental robustness attained through sensor integration, recognition models continue to face challenges in cross-target scenarios: a model trained on measurements from a fixed set of subjects may perform poorly when applied to unfamiliar subjects. This article highlights this issue and presents a cross-target human motion recognition model for the radar–camera measurement system. We develop a modal-specific semantic interaction mechanism that allows the feature extractor to recognize different individuals, thereby removing identity information during the feature extraction process. We also put forward a meta-prototype learning scheme that suitably adjusts the probability distribution to enhance the generalization capability of the recognition model. Notably, the proposed model is implemented without altering the primary network architecture, so it adds no computational burden during testing. Compared with five multimodal learning algorithms, the proposed model validates its effectiveness, surpassing previous radar–video-based methods by more than 5% in recognition accuracy. Experiments on public datasets under different dataset conditions verify its generalization ability, and ablation and parameter studies provide a thorough examination of each design choice.
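As a generic illustration of prototype-based classification, the family that meta-prototype schemes build on (this sketch is not the authors' method), the snippet below summarizes each motion class by a mean embedding and scores new-subject queries by distance to those prototypes. Feature dimensions and the temperature are illustrative assumptions.

```python
# Prototype-based classification sketch: class means as prototypes,
# softmax over negative distances. Dimensions are assumptions.
import torch

def prototypes(features, labels, n_classes):
    # Mean embedding per class over a support batch.
    return torch.stack([features[labels == c].mean(0)
                        for c in range(n_classes)])

def classify(features, protos, temperature=10.0):
    # Softmax over negative Euclidean distances to each prototype.
    d = torch.cdist(features, protos)           # (N, n_classes)
    return torch.softmax(-d / temperature, dim=1)

torch.manual_seed(0)
support = torch.randn(60, 32)                   # embeddings, seen subjects
labels = torch.arange(6).repeat_interleave(10)  # 6 motion classes
query = torch.randn(5, 32)                      # embeddings, new subject
probs = classify(query, prototypes(support, labels, 6))
print(probs.argmax(1))
```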
{"title":"Human Motion Recognition Based on Videos and Radar Spectrograms in Cross-Target Scenarios","authors":"Yang Yang;Yue Song;Xiaochun Shang;Qingshuang Mu;Beichen Li;Yue Lang","doi":"10.1109/JSEN.2025.3619651","DOIUrl":"https://doi.org/10.1109/JSEN.2025.3619651","url":null,"abstract":"Multisensor fusion combines the benefits of each sensor, resulting in a thorough and reliable motion recognition even in challenging measurement environments. Meanwhile, even with the environmental robustness attained through sensor integration, the recognition model continues to face challenges in cross-target scenarios. In summary, the recognition model is consistently trained using the measurement dataset; however, its performance may decline when applied to unfamiliar subjects. This article highlights this issue and presents a cross-target human motion recognition model for the radar–camera measurement system. We have developed a modal-specific semantic interaction mechanism that allows the feature extractor to recognize different individuals, thereby removing identity information during the feature extraction process. Furthermore, we have also put forward a meta-prototype learning scheme that suitably adjusts the probability distribution to enhance the generalization capability of the recognition model. To emphasize, the proposed model is implemented without altering the primary designed network architecture, indicating that there is no additional computational burden during testing. In comparison with five multimodal learning algorithms, we have validated the effectiveness of our model, highlighting that it surpasses previous radar–video-based methods by more than 5% in recognition accuracy. Through experiments using public datasets under different dataset conditions, we verified the generalization ability of our model. Ablation studies and additional parameter studies have been conducted, enabling a thorough examination of each design.","PeriodicalId":447,"journal":{"name":"IEEE Sensors Journal","volume":"25 22","pages":"42400-42412"},"PeriodicalIF":4.3,"publicationDate":"2025-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145500454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}