Pub Date : 2025-04-01 DOI: 10.1109/JSTSP.2025.3554136
Kevin Arias;Pablo Gomez;Carlos Hinojosa;Juan Carlos Niebles;Henry Arguello
Due to advances in deep image generation models, ensuring digital image authenticity, integrity, and confidentiality has become challenging. While many active image manipulation detection methods embed digital signatures after image acquisition, vulnerabilities persist if unauthorized access occurs before this embedding or if the embedding software is compromised. This work introduces an optics-based active image manipulation detection approach that learns the structure of a color-coded aperture (CCA), which encodes the light within the camera and embeds a highly reliable and imperceptible optical signature before image acquisition. We optimize our camera model together with our proposed image manipulation detection network via end-to-end training. We validate our approach with extensive simulations and a proof-of-concept optical system. The results show that our method outperforms state-of-the-art active image manipulation detection techniques.
{"title":"Protecting Images From Manipulations With Deep Optical Signatures","authors":"Kevin Arias;Pablo Gomez;Carlos Hinojosa;Juan Carlos Niebles;Henry Arguello","doi":"10.1109/JSTSP.2025.3554136","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3554136","url":null,"abstract":"Due to the advancements in deep image generation models, ensuring digital image authenticity, integrity, and confidentiality becomes challenging. While many active image manipulation detection methods embed digital signatures post-image acquisition, the vulnerabilities persist if unauthorized access occurs before this embedding or the embedding software is compromised. This work introduces an optics-based active image manipulation detection approach that learns the structure of a color-coded aperture (CCA), which encodes the light within the camera and embeds a highly reliable and imperceptible optical signature before image acquisition. We optimize our camera model with our proposed image manipulation detection network via end-to-end training. We validate our approach with extensive simulations and a proof-of-concept optical system. The results show that our method outperforms the state-of-the-art active image manipulation detection techniques.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 3","pages":"549-558"},"PeriodicalIF":8.7,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144073333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-03-31 DOI: 10.1109/JSTSP.2025.3555067
Manuel Castillo-Cara;Jesus Martínez-Gómez;Javier Ballesteros-Jerez;Ismael García-Varea;Raúl García-Castro;Luis Orozco-Barbosa
Indoor localization determines an object's position within enclosed spaces, with applications in navigation, asset tracking, robotics, and context-aware computing. Technologies range from WiFi and Bluetooth to advanced systems such as Massive Multiple-Input Multiple-Output (MIMO). MIMO, initially designed to enhance wireless communication, is now key in indoor positioning due to its spatial diversity and multipath propagation. This study integrates MIMO-based indoor localization with Hybrid Neural Networks (HyNNs), converting structured datasets into synthetic images using TINTO. This research marks the first application of HyNNs using synthetic images for MIMO-based indoor localization. Our key contributions include: (i) adapting TINTO for regression problems; (ii) using synthetic images as input data for our model; (iii) designing a novel HyNN with a Convolutional Neural Network branch for synthetic images and a MultiLayer Perceptron branch for tidy data; and (iv) demonstrating improved results and metrics compared to the prior literature. These advancements highlight the potential of HyNNs in enhancing the accuracy and efficiency of indoor localization systems.
{"title":"MIMO-Based Indoor Localisation With Hybrid Neural Networks: Leveraging Synthetic Images From Tidy Data for Enhanced Deep Learning","authors":"Manuel Castillo-Cara;Jesus Martínez-Gómez;Javier Ballesteros-Jerez;Ismael García-Varea;Raúl García-Castro;Luis Orozco-Barbosa","doi":"10.1109/JSTSP.2025.3555067","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3555067","url":null,"abstract":"Indoor localization determines an object's position within enclosed spaces, with applications in navigation, asset tracking, robotics, and context-aware computing. Technologies range from WiFi and Bluetooth to advanced systems like Massive Multiple Input-Multiple Output (MIMO). MIMO, initially designed to enhance wireless communication, is now key in indoor positioning due to its spatial diversity and multipath propagation. This study integrates MIMO-based indoor localization with Hybrid Neural Networks (HyNN), converting structured datasets into synthetic images using TINTO. This research marks the first application of HyNNs using synthetic images for MIMO-based indoor localization. Our key contributions include: (i) adapting TINTO for regression problems; (ii) using synthetic images as input data for our model; (iii) designing a novel HyNN with a Convolutional Neural Network branch for synthetic images and an MultiLayer Percetron branch for tidy data; and (iv) demonstrating improved results and metrics compared to prior literature. These advancements highlight the potential of HyNNs in enhancing the accuracy and efficiency of indoor localization systems.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 3","pages":"559-571"},"PeriodicalIF":8.7,"publicationDate":"2025-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10946146","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144073334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-03-29 DOI: 10.1109/JSTSP.2025.3570103
Xilin Jiang;Cong Han;Yinghao Aaron Li;Nima Mesgarani
In daily life, we encounter a variety of sounds, both desirable and undesirable, with limited control over their presence and volume. Our work introduces “Listen, Chat, and Remix” (LCR), a novel multimodal sound remixer that controls each sound source in a mixture based on user-provided text instructions. LCR distinguishes itself with a user-friendly text interface and its unique ability to remix multiple sound sources simultaneously within a mixture, without needing to separate them. Users input open-vocabulary text prompts, which are interpreted by a large language model to create a semantic filter for remixing the sound mixture. The system then decomposes the mixture into its components, applies the semantic filter, and reassembles the filtered components into the desired output. We developed a 160-hour dataset with over 100k mixtures, including speech and various audio sources, along with text prompts for diverse remixing tasks including extraction, removal, and volume control of single or multiple sources. Our experiments demonstrate significant improvements in signal quality across all remixing tasks and robust performance in zero-shot scenarios with varying numbers and types of sound sources.
"Listen, Chat, and Remix: Text-Guided Soundscape Remixing for Enhanced Auditory Experience," IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 4, pp. 635-645.
Pub Date : 2025-03-19 DOI: 10.1109/JSTSP.2025.3552918
Dahu Wang;Chang Liu
Independent component analysis (ICA) is widely applied in remote sensing signal processing. Among various ICA algorithms, the modified semidefinite programming (MSDP) algorithm stands out. However, the efficacy and safety of MSDP depend on the distribution of the data. Our research found that MSDP is better suited to data with a super-Gaussian distribution. As real-world data usually exhibit a combination of sub-Gaussian and super-Gaussian distributions, MSDP faces challenges in accurately extracting all independent components (ICs). To solve this problem, we conducted a comprehensive analysis of the MSDP algorithm and introduced an enhanced version, the sign-enhanced MSDP (SMSDP) algorithm. By incorporating the sign function into the projected Hessian matrix, SMSDP can effectively extract ICs from data characterized by a mixture of sub-Gaussian and super-Gaussian distributions. Furthermore, we provide a detailed comparison with MSDP to illustrate why SMSDP achieves more accurate eigenpairs. Experiments on blind image/sound separation, radar clutter removal, and real hyperspectral feature extraction demonstrate the effectiveness of SMSDP and its superiority in improving the accuracy of IC extraction.
{"title":"Sign-Enhanced Semidefinite Programming Algorithm and its Application to Independent Component Analysis","authors":"Dahu Wang;Chang Liu","doi":"10.1109/JSTSP.2025.3552918","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3552918","url":null,"abstract":"Independent component analysis (ICA) is widely applied in remote sensing signal processing. Among various ICA algorithms, the modified semidefinite programming (MSDP) algorithm stands out. However, the efficacy and safety of MSDP depend on the distribution of data. Our research found that MSDP is better suited for handling data with a super-Gaussian distribution. As real-world data usually exhibit a combination of sub-Gaussian and super-Gaussian distributions, MSDP faces challenges in accurately extracting all independent components (ICs). To solve this problem, we conducted a comprehensive analysis of the MSDP algorithm and introduced an enhanced version, the sign-enhanced MSDP (SMSDP) algorithm. By incorporating the sign function into the projected Hessian matrix, SMSDP enables the algorithm to effectively extract ICs from data characterized by a mixture of sub-Gaussian and super-Gaussian distributions. Furthermore, we provided a detailed comparison with MSDP to illustrate why SMSDP can achieve more accurate eigenpairs. Some experiments have demonstrated the effectiveness of SMSDP. The experiments in blind separation of image/sound, radar clutter removal, and real hyperspectral feature extraction also show the superiority of SMSDP in improving the accuracy of IC extraction.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 3","pages":"536-548"},"PeriodicalIF":8.7,"publicationDate":"2025-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144073332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-03-16 DOI: 10.1109/JSTSP.2025.3566919
{"title":"IEEE Signal Processing Society Information","authors":"","doi":"10.1109/JSTSP.2025.3566919","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3566919","url":null,"abstract":"","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 3","pages":"C3-C3"},"PeriodicalIF":8.7,"publicationDate":"2025-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11006306","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144072872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-03-16 DOI: 10.1109/JSTSP.2025.3569430
Md Sahidullah;Hye-jin Shim;Rosa Gonzalez Hautamäki;Tomi H. Kinnunen
The widespread adoption of deep-learning models in data-driven applications has drawn attention to the potential risks associated with biased datasets and models. Neglected or hidden biases within datasets and models can lead to unexpected results. This study addresses the challenges of dataset bias and explores “shortcut learning”, or the “Clever Hans effect”, in binary classifiers. We propose a novel framework for analyzing black-box classifiers and for examining the impact of both training and test data on classifier scores. Our framework incorporates interventional and observational perspectives, employing a linear mixed-effects model for post-hoc analysis. By evaluating classifier performance beyond error rates, we aim to provide insights into biased datasets and offer a comprehensive understanding of their influence on classifier behavior. The effectiveness of our approach is demonstrated through experiments on audio anti-spoofing and speaker verification tasks using both statistical models and deep neural networks. The insights gained from this study have broader implications for tackling biases in other domains and advancing the field of explainable artificial intelligence. The open-source implementation of the proposed method, along with demonstrations of interventional and observational case analyses, is made publicly available.
{"title":"Shortcut Learning in Binary Classifier Black Boxes: Applications to Voice Anti-Spoofing and Biometrics","authors":"Md Sahidullah;Hye-jin Shim;Rosa Gonzalez Hautamäki;Tomi H. Kinnunen","doi":"10.1109/JSTSP.2025.3569430","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3569430","url":null,"abstract":"The widespread adoption of deep-learning models in data-driven applications has drawn attention to thepotential risks associated with biased datasets and models. Neglected or hidden biases within datasets and models can lead to unexpected results. This study addresses the challenges of dataset bias and explores “shortcut learning” or “Clever Hans effect” in binary classifiers. We propose a novel framework for analyzing the black-box classifiers and for examining the impact of both training and test data on classifier scores. Our framework incorporates intervention and observational perspectives, employing a linear mixed-effects model for post-hoc analysis. By evaluating classifier performance beyond error rates, we aim to provide insights into biased datasets and offer a comprehensive understanding of their influence on classifier behavior. The effectiveness of our approach is demonstrated through experiments on audio anti-spoofing and speaker verification tasks using both statistical models and deep neural networks. The insights gained from this study have broader implications for tackling biases in other domains and advancing the field of explainable artificial intelligence. The open-source implementation of the proposed method, along with demonstrations of interventional and observational case analyses.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 7","pages":"1542-1557"},"PeriodicalIF":13.7,"publicationDate":"2025-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145860189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-03-16 DOI: 10.1109/JSTSP.2025.3566895
{"title":"IEEE Signal Processing Society Publication Information","authors":"","doi":"10.1109/JSTSP.2025.3566895","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3566895","url":null,"abstract":"","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 3","pages":"C2-C2"},"PeriodicalIF":8.7,"publicationDate":"2025-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11006281","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144073057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-03-13 DOI: 10.1109/JSTSP.2025.3549952
Baptiste Chatelier;Vincent Corlay;Matthieu Crussière;Luc Le Magoarou
Years of study of the propagation channel have shown a close relation between a location and the associated communication channel response. The use of a neural network to learn the location-to-channel mapping can therefore be envisioned. The Implicit Neural Representation (INR) literature has shown that classical neural architectures are biased towards learning low-frequency content, which makes learning the location-to-channel mapping a non-trivial problem. Indeed, it is well known that this mapping is a function that varies rapidly with location, on the scale of the wavelength. This paper leverages the model-based machine learning paradigm to derive a problem-specific neural architecture from a propagation channel model. The resulting architecture efficiently overcomes the spectral-bias issue: it only learns low-frequency sparse correction terms that activate a dictionary of high-frequency components. The proposed architecture is evaluated against classical INR architectures on realistic synthetic data and shows much better accuracy. Its mapping learning performance is explained through the approximated channel model, highlighting the explainability of the model-based machine learning paradigm.
{"title":"Model-Based Learning for Multi-Antenna Multi-Frequency Location-to-Channel Mapping","authors":"Baptiste Chatelier;Vincent Corlay;Matthieu Crussière;Luc Le Magoarou","doi":"10.1109/JSTSP.2025.3549952","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3549952","url":null,"abstract":"Years of study of the propagation channel showed a close relation between a location and the associated communication channel response. The use of a neural network to learn the location-to-channel mapping can therefore be envisioned. The Implicit Neural Representation (INR) literature showed that classical neural architecture are biased towards learning low-frequency content, making the location-to-channel mapping learning a non-trivial problem. Indeed, it is well known that this mapping is a function rapidly varying with the location, on the order of the wavelength. This paper leverages the model-based machine learning paradigm to derive a problem-specific neural architecture from a propagation channel model. The resulting architecture efficiently overcomes the spectral-bias issue. It only learns low-frequency sparse correction terms activating a dictionary of high-frequency components. The proposed architecture is evaluated against classical INR architectures on realistic synthetic data, showing much better accuracy. Its mapping learning performance is explained based on the approximated channel model, highlighting the explainability of the model-based machine learning paradigm.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 3","pages":"520-535"},"PeriodicalIF":8.7,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144073141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-03-12 DOI: 10.1109/JSTSP.2025.3569446
Senthil Murugan Nagarajan;Ganesh Gopal Devarajan;Asha Jerlin M;Daniel Arockiam;Ali Kashif Bashir;Maryam M. Al Dabel
As YouTube content continues to grow, advanced filtering systems are crucial to ensuring a safe and enjoyable user experience. We present MFusTSVD, a multi-modal model for classifying YouTube video content by analyzing text, audio, and video images. MFusTSVD uses specialized methods to extract features from audio and video images, while processing text data with BERT Transformers. Our key innovations include two new BERT-based multi-modal fusion methods: B-SMTLMF and B-CMTLRMF. These methods combine features from different data types and improve the model's ability to understand each type of data, including detailed audio patterns, leading to better content classification and speech-related separation. MFusTSVD is designed to outperform existing models in terms of accuracy, precision, recall, and F-measure. Tests show that MFusTSVD consistently outperforms popular models such as Memory Fusion Network, Early Fusion LSTM, Late Fusion LSTM, and the multi-modal Transformer across different content types and evaluation measures. In particular, MFusTSVD effectively balances precision and recall, which makes it especially useful for identifying inappropriate speech and audio content, as well as broader categories, ensuring reliable and robust content moderation.
{"title":"Deep Multi-Source Visual Fusion With Transformer Model for Video Content Filtering","authors":"Senthil Murugan Nagarajan;Ganesh Gopal Devarajan;Asha Jerlin M;Daniel Arockiam;Ali Kashif Bashir;Maryam M. Al Dabel","doi":"10.1109/JSTSP.2025.3569446","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3569446","url":null,"abstract":"As YouTube content continues to grow, advanced filtering systems are crucial to ensuring a safe and enjoyable user experience. We present MFusTSVD, a multi-modal model for classifying YouTube video content by analyzing text, audio, and video images. MFusTSVD uses specialized methods to extract features from audio and video images, while processing text data with BERT Transformers. Our key innovation includes two new BERT-based multi-modal fusion methods: B-SMTLMF and B-CMTLRMF. These methods combine features from different data types and improve the model's ability to understand each type of data, including detailed audio patterns, leading to better content classification and speech-related separation. MFusTSVD is designed to perform better than existing models in terms of accuracy, precision, recall, and F-measure. Tests show that MFusTSVD consistently outperforms popular models like Memory Fusion Network, Early Fusion LSTM, Late Fusion LSTM, and multi-modal Transformer across different content types and evaluation measures. In particular, MFusTSVD effectively balances precision and recall, which makes it especially useful for identifying inappropriate speech and audio content, as well as broader categories, ensuring reliable and robust content moderation.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 4","pages":"613-622"},"PeriodicalIF":8.7,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144502886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-03-10 DOI: 10.1109/JSTSP.2025.3549950
Hanxiao Lu;Zeyu Huang;Ren Wang
Convolutional neural networks (CNNs), one of the key architectures of deep learning models, have achieved superior performance on many machine learning tasks such as image classification, video recognition, and power systems applications. Despite their success, CNNs can be easily contaminated by natural noise and artificially injected noise such as backdoor attacks. In this paper, we propose a robust recovery method to remove the noise from potentially contaminated CNNs and provide an exact recovery guarantee for one-hidden-layer non-overlapping CNNs with the rectified linear unit (ReLU) activation function. Our theoretical results show that both the CNN's weights and biases can be exactly recovered in the overparameterized setting under mild assumptions. The experimental results demonstrate the correctness of the proofs and the effectiveness of the method in both a synthetic environment and a practical neural network setting. Our results also indicate that the proposed method can be extended to multi-layer CNNs and can potentially serve as a defense strategy against backdoor attacks.
{"title":"Purification of Contaminated Convolutional Neural Networks via Robust Recovery: An Approach With Theoretical Guarantee in One-Hidden-Layer Case","authors":"Hanxiao Lu;Zeyu Huang;Ren Wang","doi":"10.1109/JSTSP.2025.3549950","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3549950","url":null,"abstract":"Convolutional neural networks (CNNs), one of the key architectures of deep learning models, have achieved superior performance on many machine learning tasks such as image classification, video recognition, and power systems. Despite their success, CNNs can be easily contaminated by natural noises and artificially injected noises such as backdoor attacks. In this paper, we propose a robust recovery method to remove the noise from the potentially contaminated CNNs and provide an exact recovery guarantee on one-hidden-layer non-overlapping CNNs with the rectified linear unit (ReLU) activation function. Our theoretical results show that both CNNs' weights and biases can be exactly recovered under the overparameterization setting with some mild assumptions. The experimental results demonstrate the correctness of the proofs and the effectiveness of the method in both the synthetic environment and the practical neural network setting. Our results also indicate that the proposed method can be extended to multiple-layer CNNs and potentially serve as a defense strategy against backdoor attacks.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 3","pages":"507-519"},"PeriodicalIF":8.7,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144073338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}