IEEE Journal of Selected Topics in Signal Processing: Latest Articles

Protecting Images From Manipulations With Deep Optical Signatures
IF 8.7; Zone 1, Engineering & Technology; Q1 ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-04-01; DOI: 10.1109/JSTSP.2025.3554136
Kevin Arias;Pablo Gomez;Carlos Hinojosa;Juan Carlos Niebles;Henry Arguello
Due to advances in deep image generation models, ensuring the authenticity, integrity, and confidentiality of digital images has become challenging. While many active image manipulation detection methods embed digital signatures after image acquisition, vulnerabilities persist if unauthorized access occurs before this embedding or if the embedding software is compromised. This work introduces an optics-based active image manipulation detection approach that learns the structure of a color-coded aperture (CCA), which encodes the light within the camera and embeds a highly reliable and imperceptible optical signature before image acquisition. We optimize our camera model jointly with our proposed image manipulation detection network via end-to-end training. We validate our approach with extensive simulations and a proof-of-concept optical system. The results show that our method outperforms state-of-the-art active image manipulation detection techniques.
IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 3, pp. 549-558
Citations: 0
MIMO-Based Indoor Localisation With Hybrid Neural Networks: Leveraging Synthetic Images From Tidy Data for Enhanced Deep Learning
IF 8.7; Zone 1, Engineering & Technology; Q1 ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-03-31; DOI: 10.1109/JSTSP.2025.3555067
Manuel Castillo-Cara;Jesus Martínez-Gómez;Javier Ballesteros-Jerez;Ismael García-Varea;Raúl García-Castro;Luis Orozco-Barbosa
Indoor localization determines an object's position within enclosed spaces, with applications in navigation, asset tracking, robotics, and context-aware computing. Technologies range from WiFi and Bluetooth to advanced systems like Massive Multiple-Input Multiple-Output (MIMO). MIMO, initially designed to enhance wireless communication, is now key in indoor positioning due to its spatial diversity and multipath propagation. This study integrates MIMO-based indoor localization with Hybrid Neural Networks (HyNN), converting structured datasets into synthetic images using TINTO. This research marks the first application of HyNNs using synthetic images for MIMO-based indoor localization. Our key contributions include: (i) adapting TINTO for regression problems; (ii) using synthetic images as input data for our model; (iii) designing a novel HyNN with a Convolutional Neural Network branch for synthetic images and a Multilayer Perceptron branch for tidy data; and (iv) demonstrating improved results and metrics compared to prior literature. These advancements highlight the potential of HyNNs in enhancing the accuracy and efficiency of indoor localization systems.
IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 3, pp. 559-571. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10946146
Citations: 0
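The two-branch design described in this abstract (a CNN branch for synthetic images, an MLP branch for tidy data, fused for regression) can be sketched in plain Python. This is only an illustrative forward pass under assumed shapes: the layer sizes, the single convolution with global average pooling, the concatenation-based fusion, and every name (`hybrid_forward`, `init_params`, and so on) are hypothetical and not taken from the paper. TINTO itself is not reproduced here.

```python
import random


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def dense(x, weights, bias):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def conv2d_valid(image, kernel):
    """Single-channel 2-D cross-correlation without padding."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(kernel[a][b] * image[i + a][j + b]
                 for a in range(kh) for b in range(kw))
             for j in range(len(image[0]) - kw + 1)]
            for i in range(len(image) - kh + 1)]


def cnn_branch(image, kernels):
    """Conv -> ReLU -> global average pooling, one feature per kernel."""
    features = []
    for kernel in kernels:
        activations = relu([v for row in conv2d_valid(image, kernel) for v in row])
        features.append(sum(activations) / len(activations))
    return features


def init_params(n_tidy, n_kernels, n_hidden, seed=0):
    """Random weights and zero biases (hypothetical sizes, 3x3 kernels)."""
    rng = random.Random(seed)

    def rand(n):
        return [rng.uniform(-1.0, 1.0) for _ in range(n)]

    return {
        "kernels": [[rand(3) for _ in range(3)] for _ in range(n_kernels)],
        "mlp_w": [rand(n_tidy) for _ in range(n_hidden)],
        "mlp_b": [0.0] * n_hidden,
        "head_w": [rand(n_kernels + n_hidden) for _ in range(2)],
        "head_b": [0.0, 0.0],
    }


def hybrid_forward(image, tidy, params):
    """Fuse both branches by concatenation and regress a 2-D position."""
    image_features = cnn_branch(image, params["kernels"])
    tidy_features = relu(dense(tidy, params["mlp_w"], params["mlp_b"]))
    return dense(image_features + tidy_features,
                 params["head_w"], params["head_b"])


if __name__ == "__main__":
    params = init_params(n_tidy=4, n_kernels=2, n_hidden=3)
    image = [[(i * j) % 5 / 5.0 for j in range(6)] for i in range(6)]
    print(hybrid_forward(image, [0.1, 0.4, 0.2, 0.9], params))
```

The regression head has two outputs here because the sketch assumes a planar (x, y) position. A real model would be trained, which this forward-only example leaves out.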
Listen, Chat, and Remix: Text-Guided Soundscape Remixing for Enhanced Auditory Experience
IF 8.7; Zone 1, Engineering & Technology; Q1 ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-03-29; DOI: 10.1109/JSTSP.2025.3570103
Xilin Jiang;Cong Han;Yinghao Aaron Li;Nima Mesgarani
In daily life, we encounter a variety of sounds, both desirable and undesirable, with limited control over their presence and volume. Our work introduces “Listen, Chat, and Remix” (LCR), a novel multimodal sound remixer that controls each sound source in a mixture based on user-provided text instructions. LCR distinguishes itself with a user-friendly text interface and its unique ability to remix multiple sound sources simultaneously within a mixture, without needing to separate them. Users input open-vocabulary text prompts, which are interpreted by a large language model to create a semantic filter for remixing the sound mixture. The system then decomposes the mixture into its components, applies the semantic filter, and reassembles the filtered components into the desired output. We developed a 160-hour dataset with over 100k mixtures, including speech and various audio sources, along with text prompts for diverse remixing tasks including extraction, removal, and volume control of single or multiple sources. Our experiments demonstrate significant improvements in signal quality across all remixing tasks and robust performance in zero-shot scenarios with varying numbers and types of sound sources.
IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 4, pp. 635-645
Citations: 0
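The decompose, filter, and reassemble steps described in the LCR abstract can be illustrated with a toy remixer. This is a loose sketch under strong assumptions: time-domain sample lists stand in for whatever internal components the system uses, a per-source scalar gain stands in for the semantic filter, and the gain dictionary is written by hand rather than produced by a language model. The function name `remix` and the source labels are hypothetical.

```python
def remix(components, gains):
    """Scale each component by its gain and sum the results sample by sample.

    components: dict mapping a source label to a list of samples (equal lengths).
    gains: dict mapping a source label to a linear gain; 0.0 removes a source,
           1.0 keeps it unchanged, values above 1.0 make it louder.
    """
    length = len(next(iter(components.values())))
    output = [0.0] * length
    for name, signal in components.items():
        gain = gains.get(name, 1.0)  # sources not mentioned are left unchanged
        for i, sample in enumerate(signal):
            output[i] += gain * sample
    return output


if __name__ == "__main__":
    components = {
        "speech": [0.1, 0.2, -0.1],
        "traffic": [0.5, -0.5, 0.5],
    }
    # Hand-written stand-in for an instruction such as "remove the traffic noise".
    print(remix(components, {"traffic": 0.0}))
```

Extraction, removal, and volume control then differ only in the gain values, which is why a single filter can address several sources at once in this simplified picture.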
Sign-Enhanced Semidefinite Programming Algorithm and its Application to Independent Component Analysis
IF 8.7; Zone 1, Engineering & Technology; Q1 ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-03-19; DOI: 10.1109/JSTSP.2025.3552918
Dahu Wang;Chang Liu
Independent component analysis (ICA) is widely applied in remote sensing signal processing. Among various ICA algorithms, the modified semidefinite programming (MSDP) algorithm stands out. However, the efficacy and safety of MSDP depend on the distribution of the data. Our research found that MSDP is better suited to data with a super-Gaussian distribution. As real-world data usually exhibit a combination of sub-Gaussian and super-Gaussian distributions, MSDP faces challenges in accurately extracting all independent components (ICs). To solve this problem, we conducted a comprehensive analysis of the MSDP algorithm and introduced an enhanced version, the sign-enhanced MSDP (SMSDP) algorithm. By incorporating the sign function into the projected Hessian matrix, SMSDP can effectively extract ICs from data characterized by a mixture of sub-Gaussian and super-Gaussian distributions. Furthermore, we provide a detailed comparison with MSDP to illustrate why SMSDP can achieve more accurate eigenpairs. Experiments demonstrate the effectiveness of SMSDP, and experiments on blind image/sound separation, radar clutter removal, and real hyperspectral feature extraction further show its superiority in improving the accuracy of IC extraction.
IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 3, pp. 536-548
Citations: 0
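The sub-Gaussian versus super-Gaussian distinction that this abstract turns on is commonly checked with the sign of the excess kurtosis: negative for sub-Gaussian data (for example, uniform), positive for super-Gaussian data (for example, Laplacian). The sketch below shows that standard statistic only. Whether SMSDP applies its sign function to this exact quantity is not stated in the abstract, and the function names are hypothetical.

```python
def excess_kurtosis(samples):
    """Population excess kurtosis: fourth central moment / variance^2 - 3."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((s - mean) ** 2 for s in samples) / n
    m4 = sum((s - mean) ** 4 for s in samples) / n
    return m4 / (m2 ** 2) - 3.0


def gaussianity(samples):
    """Label data by the sign of its excess kurtosis."""
    k = excess_kurtosis(samples)
    if k > 0:
        return "super-Gaussian"
    if k < 0:
        return "sub-Gaussian"
    return "Gaussian-like"


if __name__ == "__main__":
    flat = [-1.0, 1.0, -1.0, 1.0]          # two-point data, very light tails
    peaky = [0.0] * 8 + [-3.0, 3.0]        # mostly zeros with rare large values
    print(gaussianity(flat), gaussianity(peaky))
```

An algorithm that implicitly assumes one sign of this statistic would be expected to struggle on sources of the opposite sign, which is the kind of limitation the abstract attributes to MSDP.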
IEEE Signal Processing Society Information
IF 8.7; Zone 1, Engineering & Technology; Q1 ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-03-16; DOI: 10.1109/JSTSP.2025.3566919
IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 3, p. C3. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11006306
Citations: 0
Shortcut Learning in Binary Classifier Black Boxes: Applications to Voice Anti-Spoofing and Biometrics
IF 13.7; Zone 1, Engineering & Technology; Q1 ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-03-16; DOI: 10.1109/JSTSP.2025.3569430
Md Sahidullah;Hye-jin Shim;Rosa Gonzalez Hautamäki;Tomi H. Kinnunen
The widespread adoption of deep-learning models in data-driven applications has drawn attention to the potential risks associated with biased datasets and models. Neglected or hidden biases within datasets and models can lead to unexpected results. This study addresses the challenges of dataset bias and explores “shortcut learning” or the “Clever Hans effect” in binary classifiers. We propose a novel framework for analyzing black-box classifiers and for examining the impact of both training and test data on classifier scores. Our framework incorporates interventional and observational perspectives, employing a linear mixed-effects model for post-hoc analysis. By evaluating classifier performance beyond error rates, we aim to provide insights into biased datasets and offer a comprehensive understanding of their influence on classifier behavior. The effectiveness of our approach is demonstrated through experiments on audio anti-spoofing and speaker verification tasks using both statistical models and deep neural networks. The insights gained from this study have broader implications for tackling biases in other domains and advancing the field of explainable artificial intelligence. An open-source implementation of the proposed method is available, along with demonstrations of interventional and observational case analyses.
IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 7, pp. 1542-1557
Citations: 0
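For readers unfamiliar with the post-hoc tool named in this abstract, the generic textbook form of a linear mixed-effects model with a single random intercept is given below. The symbols are illustrative assumptions: $s_{ij}$ stands for the classifier score of trial $i$ in group $j$, $\mathbf{x}_{ij}$ for the fixed-effect covariates of that trial, $u_j$ for a group-level random intercept, and $\varepsilon_{ij}$ for residual noise. The abstract does not say which factors the authors treat as fixed or random, so this is not their specific model.

```latex
s_{ij} = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
```

In this form, a fixed-effect coefficient that is clearly non-zero for a nuisance attribute of the data would suggest that scores depend on that attribute, which is the kind of dependence a shortcut analysis looks for.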
IEEE Signal Processing Society Publication Information
IF 8.7; Zone 1, Engineering & Technology; Q1 ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-03-16; DOI: 10.1109/JSTSP.2025.3566895
IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 3, p. C2. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11006281
Citations: 0
Model-Based Learning for Multi-Antenna Multi-Frequency Location-to-Channel Mapping
IF 8.7; Zone 1, Engineering & Technology; Q1 ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-03-13; DOI: 10.1109/JSTSP.2025.3549952
Baptiste Chatelier;Vincent Corlay;Matthieu Crussière;Luc Le Magoarou
Years of study of the propagation channel have shown a close relation between a location and the associated communication channel response. The use of a neural network to learn the location-to-channel mapping can therefore be envisioned. The Implicit Neural Representation (INR) literature has shown that classical neural architectures are biased towards learning low-frequency content, making learning the location-to-channel mapping a non-trivial problem. Indeed, it is well known that this mapping is a function that varies rapidly with the location, on the order of the wavelength. This paper leverages the model-based machine learning paradigm to derive a problem-specific neural architecture from a propagation channel model. The resulting architecture efficiently overcomes the spectral-bias issue. It only learns low-frequency sparse correction terms activating a dictionary of high-frequency components. The proposed architecture is evaluated against classical INR architectures on realistic synthetic data, showing much better accuracy. Its mapping learning performance is explained based on the approximated channel model, highlighting the explainability of the model-based machine learning paradigm.
IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 3, pp. 520-535
Citations: 0
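The claim that the location-to-channel mapping varies on the order of the wavelength can be seen in the simplest possible case. The sketch below assumes a single line-of-sight path in free space, with amplitude proportional to 1/d (constant factors dropped) and phase -2*pi*d/lambda. It is not the channel model used in the paper, and the function name `los_channel` is hypothetical.

```python
import cmath
import math


def los_channel(distance_m, wavelength_m):
    """Single-path free-space coefficient: 1/d attenuation, phase -2*pi*d/lambda."""
    return cmath.exp(-2j * math.pi * distance_m / wavelength_m) / distance_m


if __name__ == "__main__":
    wavelength = 0.1  # metres, roughly a 3 GHz carrier
    here = los_channel(10.0, wavelength)
    half_wavelength_away = los_channel(10.0 + wavelength / 2, wavelength)
    # The magnitude barely changes, but the sign of the coefficient flips.
    print(here, half_wavelength_away)
```

Moving the receiver by half a wavelength (5 cm in this example) rotates the phase by pi, so a network that favours smooth, low-frequency functions of position has to represent a very rapidly oscillating target.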
Deep Multi-Source Visual Fusion With Transformer Model for Video Content Filtering
IF 8.7; Zone 1, Engineering & Technology; Q1 ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-03-12; DOI: 10.1109/JSTSP.2025.3569446
Senthil Murugan Nagarajan;Ganesh Gopal Devarajan;Asha Jerlin M;Daniel Arockiam;Ali Kashif Bashir;Maryam M. Al Dabel
As YouTube content continues to grow, advanced filtering systems are crucial to ensuring a safe and enjoyable user experience. We present MFusTSVD, a multi-modal model for classifying YouTube video content by analyzing text, audio, and video images. MFusTSVD uses specialized methods to extract features from audio and video images, while processing text data with BERT Transformers. Our key innovation includes two new BERT-based multi-modal fusion methods: B-SMTLMF and B-CMTLRMF. These methods combine features from different data types and improve the model's ability to understand each type of data, including detailed audio patterns, leading to better content classification and speech-related separation. MFusTSVD is designed to perform better than existing models in terms of accuracy, precision, recall, and F-measure. Tests show that MFusTSVD consistently outperforms popular models like Memory Fusion Network, Early Fusion LSTM, Late Fusion LSTM, and multi-modal Transformer across different content types and evaluation measures. In particular, MFusTSVD effectively balances precision and recall, which makes it especially useful for identifying inappropriate speech and audio content, as well as broader categories, ensuring reliable and robust content moderation.
IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 4, pp. 613-622
Citations: 0
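The evaluation measures named in this abstract (precision, recall, and F-measure) have standard definitions for a binary decision such as "inappropriate" versus "acceptable". The sketch below computes them from raw counts; the counts in the usage example are made up for illustration and are not results from the paper.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) from confusion-matrix counts.

    Each ratio falls back to 0.0 when its denominator is zero.
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Illustrative counts only: 8 correct flags, 2 false alarms, 2 misses.
    print(precision_recall_f1(true_pos=8, false_pos=2, false_neg=2))
```

Because F1 is the harmonic mean of precision and recall, it is high only when both are high, which is one way to quantify the balance the abstract refers to.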
Purification of Contaminated Convolutional Neural Networks via Robust Recovery: An Approach With Theoretical Guarantee in One-Hidden-Layer Case
IF 8.7; Zone 1, Engineering & Technology; Q1 ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-03-10; DOI: 10.1109/JSTSP.2025.3549950
Hanxiao Lu;Zeyu Huang;Ren Wang
Convolutional neural networks (CNNs), one of the key architectures of deep learning models, have achieved superior performance on many machine learning tasks such as image classification, video recognition, and power systems. Despite their success, CNNs can be easily contaminated by natural noise and by artificially injected noise such as backdoor attacks. In this paper, we propose a robust recovery method to remove the noise from potentially contaminated CNNs and provide an exact recovery guarantee on one-hidden-layer non-overlapping CNNs with the rectified linear unit (ReLU) activation function. Our theoretical results show that both the weights and the biases of CNNs can be exactly recovered under the overparameterization setting with some mild assumptions. The experimental results demonstrate the correctness of the proofs and the effectiveness of the method in both the synthetic environment and the practical neural network setting. Our results also indicate that the proposed method can be extended to multiple-layer CNNs and potentially serve as a defense strategy against backdoor attacks.
卷积神经网络(cnn)是深度学习模型的关键架构之一,在图像分类、视频识别和电力系统等许多机器学习任务上取得了优异的性能。尽管取得了成功,但cnn很容易受到自然噪音和后门攻击等人为注入的噪音的污染。在本文中,我们提出了一种鲁棒恢复方法来去除潜在污染cnn的噪声,并利用整流线性单元(ReLU)激活函数对单隐层非重叠cnn提供精确的恢复保证。我们的理论结果表明,在一些温和的假设下,在过参数化设置下,cnn的权值和偏差都可以精确地恢复。实验结果表明,该方法在综合环境和实际神经网络环境下都是有效的。我们的结果还表明,所提出的方法可以扩展到多层cnn,并有可能作为针对后门攻击的防御策略。
IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 3, pp. 507-519. Journal Article. https://doi.org/10.1109/JSTSP.2025.3549950
Citations: 0