
Latest Publications in IEEE Signal Processing Letters

DWW: Robust Deep Wavelet-Domain Watermarking With Enhanced Frequency Mask
IF 3.2 CAS Tier 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-11-01 DOI: 10.1109/LSP.2024.3490399
Shiyuan Tang;Jiangqun Ni;Wenkang Su;Yulin Zhang
This letter concentrates on the challenges of deep learning-based robust image watermarking against print-scanning, print-camera, and screen-shooting attacks arising in physical channel transmission. Given the excellent performance demonstrated by wavelet-domain watermarking, in this paper, we incorporate wavelet-integrated convolutional neural networks (CNNs) and propose a Deep Wavelet-domain Watermarking (DWW) model, which embeds watermarks in the wavelet domain rather than the spatial domain used in prior art. In addition, a frequency-domain enhanced mask loss is developed to increase the loss weight in the high-frequency regions of the image during back-propagation, thereby encouraging the model to embed the message preferentially in low-frequency components so as to improve robustness. Experimental results show that the proposed DWW consistently outperforms other state-of-the-art (SOTA) schemes by a clear margin in terms of embedding capacity, imperceptibility, and robustness.
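The frequency-enhanced mask idea lends itself to a compact illustration. Below is a minimal NumPy/PyWavelets sketch that builds a weight map from the level-1 wavelet detail subbands and uses it to amplify a distortion loss in high-frequency regions; the mask construction, the (1 + alpha * mask) weighting, and the Haar wavelet choice are illustrative assumptions, not the authors' exact formulation.

```python
# A minimal sketch of a frequency-enhanced mask loss in the spirit of DWW:
# pixels in high-frequency (textured) regions get a larger loss weight,
# pushing an embedder to hide the message in low-frequency content.
import numpy as np
import pywt

def frequency_mask(img, wavelet="haar", eps=1e-8):
    """Weight map from the energy of the level-1 detail subbands."""
    cA, (cH, cV, cD) = pywt.dwt2(img, wavelet)
    detail = np.sqrt(cH**2 + cV**2 + cD**2)
    detail = detail / (detail.max() + eps)           # normalize to [0, 1]
    # upsample back to image resolution (nearest-neighbor, factor 2)
    return np.kron(detail, np.ones((2, 2)))[: img.shape[0], : img.shape[1]]

def masked_embedding_loss(cover, stego, alpha=2.0):
    """L2 distortion, amplified by (1 + alpha * mask) in textured regions."""
    mask = frequency_mask(cover)
    return float(np.mean((1.0 + alpha * mask) * (stego - cover) ** 2))

cover = np.random.rand(128, 128)
stego = cover + 0.01 * np.random.randn(128, 128)
print(masked_embedding_loss(cover, stego))
```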
Citations: 0
New Constructions of 2-D Golay Complementary Array Sets With Highly Flexible Array Sizes for Massive MIMO Omni-Directional Transmission
IF 3.2 CAS Tier 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-10-31 DOI: 10.1109/LSP.2024.3490403
Xiuping Peng;Yu Wang;Zilong Liu
This letter is concerned with the efficient design of two-dimensional (2-D) Golay complementary array sets (GCASs) with ideal aperiodic autocorrelation sums in both correlation directions. Two new direct constructions of 2-D GCASs with highly flexible array sizes are proposed. The core idea is to truncate certain columns from large arrays generated by 2-D extended generalized Boolean functions (EGBFs). We show that these 2-D GCASs lead to highly flexible uniform rectangular array (URA) configurations for precoding matrices in omni-directional massive multi-input multi-output (MIMO) transmission.
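The defining property of a GCAS, namely that the 2-D aperiodic autocorrelations of all member arrays sum to zero at every nonzero shift, can be checked directly. The sketch below verifies it for a classical example set built from outer products of the length-2 Golay pair [1, 1] and [1, -1]; this toy set is not one of the letter's new EGBF-based constructions.

```python
# Verify the ideal aperiodic sum property of a small 2-D GCAS.
import numpy as np

def aperiodic_acf_2d(A, u, v):
    """Sum_{i,j} A[i,j] * conj(A[i+u, j+v]) with zero padding outside."""
    m, n = A.shape
    total = 0.0
    for i in range(m):
        for j in range(n):
            if 0 <= i + u < m and 0 <= j + v < n:
                total += A[i, j] * np.conj(A[i + u, j + v])
    return total

a, b = np.array([1, 1]), np.array([1, -1])
gcas = [np.outer(x, y) for x in (a, b) for y in (a, b)]   # set of 4 arrays

for u in range(-1, 2):
    for v in range(-1, 2):
        s = sum(aperiodic_acf_2d(A, u, v) for A in gcas)
        assert (u, v) == (0, 0) or abs(s) < 1e-12
print("ideal aperiodic sums confirmed for all nonzero shifts")
```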
Citations: 0
Source-Free Image-Text Matching via Uncertainty-Aware Learning
IF 3.2 CAS Tier 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-10-31 DOI: 10.1109/LSP.2024.3488521
Mengxiao Tian;Shuo Yang;Xinxiao Wu;Yunde Jia
When a trained image-text matching model is applied to a new scenario, its performance may degrade substantially due to domain shift, which makes it impractical in real-world applications. In this paper, we make the first attempt at adapting an image-text matching model well-trained on a labeled source domain to an unlabeled target domain in the absence of source data, namely, source-free image-text matching. This task is challenging since the model has no direct access to the source data when learning to reduce the domain shift. To address this challenge, we propose a simple yet effective method that introduces uncertainty-aware learning to generate high-quality pseudo-pairs of image and text for target adaptation. Specifically, starting with using the pre-trained source model to retrieve several top-ranked image-text pairs from the target domain as pseudo-pairs, we then model the uncertainty of each pseudo-pair by calculating the variance of the retrieved texts (resp. images) given the paired image (resp. text) as query, and finally incorporate the uncertainty into an objective function to down-weight noisy pseudo-pairs for better training, thereby enhancing adaptation. This uncertainty-aware training approach can be generally applied to all existing models. Extensive experiments on the COCO and Flickr30K datasets demonstrate the effectiveness of the proposed method.
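The uncertainty weighting can be sketched in a few lines: the variance of the top-k retrieved text embeddings for each query image becomes a weight that down-weights noisy pseudo-pairs. The embedding setup, the value of k, and the exp(-variance) mapping below are illustrative assumptions rather than the paper's exact objective.

```python
# A hedged sketch of uncertainty-aware pseudo-pair weighting.
import numpy as np

def pair_uncertainty(query_img, text_bank, k=5):
    """Variance of the k texts most similar to the query image."""
    sims = text_bank @ query_img                 # similarity scores
    topk = text_bank[np.argsort(-sims)[:k]]
    return float(np.mean(np.var(topk, axis=0)))

def weighted_matching_loss(img_embs, txt_embs, text_bank):
    losses, weights = [], []
    for v, t in zip(img_embs, txt_embs):
        w = np.exp(-pair_uncertainty(v, text_bank))  # confident pair -> w ~ 1
        losses.append(1.0 - v @ t)                   # simple cosine distance
        weights.append(w)
    weights = np.array(weights)
    return float(np.sum(weights * np.array(losses)) / np.sum(weights))

rng = np.random.default_rng(0)
bank = rng.normal(size=(100, 16))
bank /= np.linalg.norm(bank, axis=1, keepdims=True)      # unit-norm texts
imgs = bank[:8] + 0.05 * rng.normal(size=(8, 16))
imgs /= np.linalg.norm(imgs, axis=1, keepdims=True)      # unit-norm images
print(weighted_matching_loss(imgs, bank[:8], bank))
```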
Citations: 0
MonoBooster: Semi-Dense Skip Connection With Cross-Level Attention for Boosting Self-Supervised Monocular Depth Estimation
IF 3.2 CAS Tier 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-10-30 DOI: 10.1109/LSP.2024.3488499
Changhao Wang;Guanwen Zhang;Zhengyun Cheng;Wei Zhou
Accurate depth estimation is crucial for various applications that require precise 3D information about the surrounding environment. In this paper, we propose MonoBooster, a feature aggregation architecture that enhances the performance of self-supervised monocular depth estimation. Specifically, we introduce a semi-dense skip connection scheme to aggregate multi-level features extracted from the backbone network. Additionally, we present a novel Cross-Level Attention (CLA) module to fuse the connected features. The CLA module captures spatial correlation using pyramid depth-wise convolution and adaptively extracts channel information from both low-level and high-level features, facilitating the translation from the input RGB image to the estimated depth map. Experimental results on the KITTI and Make3D datasets validate the effectiveness of the proposed MonoBooster. Notably, the MonoBooster architecture is flexible and can be seamlessly integrated into popular backbones, resulting in enhanced depth estimation performance.
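A hedged PyTorch sketch of a cross-level attention block in this spirit follows: pyramid depth-wise convolutions capture spatial correlation on the low-level features, and a channel gate decides how to mix them with the high-level features. The kernel sizes, gating form, and fusion rule are assumptions, not the authors' exact CLA design.

```python
# A minimal cross-level attention block: depth-wise pyramid + channel gate.
import torch
import torch.nn as nn

class CrossLevelAttention(nn.Module):
    def __init__(self, channels, kernels=(3, 5, 7)):
        super().__init__()
        self.pyramid = nn.ModuleList([
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernels
        ])
        self.gate = nn.Sequential(                  # channel attention weights
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, 1),
            nn.Sigmoid(),
        )
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, low, high):
        # spatial correlation from the depth-wise pyramid on low-level features
        spatial = sum(conv(low) for conv in self.pyramid) / len(self.pyramid)
        # channel weights from both levels decide how much of each to keep
        g = self.gate(torch.cat([low, high], dim=1))
        return self.proj(g * spatial + (1 - g) * high)

cla = CrossLevelAttention(32)
low, high = torch.randn(1, 32, 40, 40), torch.randn(1, 32, 40, 40)
print(cla(low, high).shape)        # torch.Size([1, 32, 40, 40])
```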
Citations: 0
Unsupervised Domain Adaptation on End-to-End Multi-Talker Overlapped Speech Recognition
IF 3.2 CAS Tier 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-10-29 DOI: 10.1109/LSP.2024.3487795
Lin Zheng;Han Zhu;Sanli Tian;Qingwei Zhao;Ta Li
Serialized Output Training (SOT) has emerged as the mainstream approach for addressing the multi-talker overlapped speech recognition challenge due to its simplicity. However, SOT suffers cross-domain performance degradation, which hinders its application. Meanwhile, traditional domain adaptation methods may harm the accuracy of speaker change point prediction as evaluated by UD-CER, an important metric in SOT. To solve these issues, we propose Pseudo-Labeling based SOT (PL-SOT) for domain adaptation, which treats the speaker change token (<sc>) specially during training to increase the accuracy of speaker change point prediction. Firstly, we improve CTC loss by proposing a Weakening and Enhancing CTC (WE-CTC) loss that weakens the learning of error-prone labels surrounding <sc> while enhancing the emission probability of <sc> through modifying the posteriors of the pseudo-labels. Secondly, we introduce a Weighted Confidence Filter (WCF) that assigns higher scores to <sc> in order to exclude low-quality pseudo-labels without hurting the <sc> prediction. Experimental results show that PL-SOT achieves 17.7%/12.8% average relative reduction of CER/UD-CER, with AliMeeting as the source domain and AISHELL-4 along with MagicData-RAMC as the target domains.
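The Weighted Confidence Filter is easy to illustrate: utterance confidence is computed as a weighted average of per-token posteriors in which the <sc> token counts more, so pseudo-labels with shaky speaker-change predictions are discarded. The weight, threshold, and scoring below are illustrative assumptions, not the letter's exact settings.

```python
# A hedged sketch of WCF-style pseudo-label filtering.
import numpy as np

SC = "<sc>"

def wcf_confidence(tokens, posteriors, sc_weight=3.0):
    """Weighted mean posterior; the speaker-change token weighs more."""
    w = np.array([sc_weight if t == SC else 1.0 for t in tokens])
    return float(np.sum(w * np.asarray(posteriors)) / np.sum(w))

def filter_pseudo_labels(batch, threshold=0.85):
    return [(toks, post) for toks, post in batch
            if wcf_confidence(toks, post) >= threshold]

batch = [
    (["hi", SC, "hello"], [0.95, 0.90, 0.93]),   # confident <sc> -> kept
    (["hi", SC, "hello"], [0.95, 0.40, 0.93]),   # weak <sc> -> filtered out
]
print(len(filter_pseudo_labels(batch)))           # 1
```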
Citations: 0
Quantile Learn-Then-Test: Quantile-Based Risk Control for Hyperparameter Optimization
IF 3.2 CAS Tier 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-10-24 DOI: 10.1109/LSP.2024.3486238
Amirmohammad Farzaneh;Sangwoo Park;Osvaldo Simeone
The increasing adoption of Artificial Intelligence (AI) in engineering problems calls for the development of calibration methods capable of offering robust statistical reliability guarantees. The calibration of black box AI models is carried out via the optimization of hyperparameters dictating architecture, optimization, and/or inference configuration. Prior work has introduced learn-then-test (LTT), a calibration procedure for hyperparameter optimization (HPO) that provides statistical guarantees on average performance measures. Recognizing the importance of controlling risk-aware objectives in engineering contexts, this work introduces a variant of LTT that is designed to provide statistical guarantees on quantiles of a risk measure. We illustrate the practical advantages of this approach by applying the proposed algorithm to a radio access scheduling problem.
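A minimal sketch of the quantile-based testing loop: for each candidate hyperparameter, a binomial tail p-value tests the null hypothesis that the q-quantile of the risk exceeds the target level alpha, and Bonferroni-corrected rejections form the reliable set. The toy risk model and candidate grid are assumptions; only the testing logic follows the general LTT recipe.

```python
# Quantile-based learn-then-test, sketched with a binomial tail test.
import numpy as np
from scipy.stats import binom

def quantile_pvalue(risks, alpha, q):
    """P-value for H0: q-quantile(risk) > alpha, from i.i.d. risk samples."""
    k = int(np.sum(np.asarray(risks) <= alpha))  # samples with risk <= alpha
    # under H0, k is stochastically dominated by Binomial(n, q)
    return float(binom.sf(k - 1, len(risks), q))

def quantile_ltt(candidates, risk_fn, alpha=0.1, q=0.9, delta=0.05, n=200):
    reliable = []
    for lam in candidates:                       # Bonferroni over candidates
        risks = [risk_fn(lam) for _ in range(n)]
        if quantile_pvalue(risks, alpha, q) <= delta / len(candidates):
            reliable.append(lam)
    return reliable

rng = np.random.default_rng(1)
risk_fn = lambda lam: rng.uniform(0, lam)        # toy risk model
print(quantile_ltt([0.05, 0.1, 0.5, 1.0], risk_fn))  # small lam pass the test
```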
Citations: 0
HF2TNet: A Hierarchical Fusion Two-Stage Training Network for Infrared and Visible Image Fusion
IF 3.2 CAS Tier 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-10-24 DOI: 10.1109/LSP.2024.3486113
Ting Lv;Chuanming Ji;Hong Jiang;Yu Liu
In the field of infrared and visible image fusion, current algorithms often focus on complex feature extraction and sophisticated fusion mechanisms while ignoring information redundancy and feature imbalance, which limit effective information aggregation. To address these issues, this paper proposes a hierarchical fusion strategy with a two-stage training network, abbreviated as HF2TNet, which achieves effective information aggregation in a staged manner. In the initial training stage, a three-stream encoder-decoder architecture is proposed, seamlessly integrating CNN and transformer modules. This architecture extracts both global and local features from visible and infrared images, capturing their shared attributes before the fusion process. Moreover, a multi-shared attention module (MSAM) is proposed to deeply reconstruct and augment the visible and infrared features, ensuring the preservation and enhancement of details across modalities. In the subsequent stage, HF2TNet utilizes the pre-integrated features as query inputs for the dual MSAMs. These modules interact with the previously reconstructed infrared and visible features to enhance complementary information and ensure a balanced feature fusion. Experimental results indicate HF2TNet's superior performance on standard datasets such as MSRS and TNO, especially in complex scenes, demonstrating its potential in multimodal image fusion.
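The second-stage use of pre-integrated features as queries can be sketched with standard cross-attention: the fused features query the reconstructed infrared and visible features in two parallel attention blocks. Using nn.MultiheadAttention here is an assumption; the letter's MSAM is a custom multi-shared attention design.

```python
# A hedged sketch of dual-query fusion over flattened feature maps.
import torch
import torch.nn as nn

class DualQueryFusion(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn_ir = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_vis = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.out = nn.Linear(2 * dim, dim)

    def forward(self, fused, ir_feat, vis_feat):
        a, _ = self.attn_ir(fused, ir_feat, ir_feat)     # query IR details
        b, _ = self.attn_vis(fused, vis_feat, vis_feat)  # query visible details
        return self.out(torch.cat([a, b], dim=-1))

f = DualQueryFusion(64)
x = torch.randn(2, 196, 64)      # (batch, tokens, dim) flattened feature maps
print(f(x, torch.randn(2, 196, 64), torch.randn(2, 196, 64)).shape)
```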
Citations: 0
KFA: Keyword Feature Augmentation for Open Set Keyword Spotting
IF 3.2 CAS Tier 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-10-22 DOI: 10.1109/LSP.2024.3484932
Kyungdeuk Ko;Bokyeung Lee;Jonghwan Hong;Hanseok Ko
In recent years, with the advancement of deep learning technology and the emergence of smart devices, there has been growing interest in keyword spotting (KWS), which is used to activate AI systems equipped with automatic speech recognition and text-to-speech. However, smart devices with KWS often raise false alarms when unexpected words are input. To address this issue, existing KWS methods typically train non-target words as an unknown class. Despite these efforts, unseen words not trained as part of the unknown class may still be misclassified as one of the target words. To overcome this limitation, we propose a new method named Keyword Feature Augmentation (KFA) for open-set KWS. KFA performs feature augmentation through adversarial learning to increase the loss. The augmented features are constrained within a limited space using label smoothing. Unlike other generative-model-based open set recognition (OSR) methods, KFA requires no additional training parameters or repeated inference operations. As a result, KFA achieves a 0.955 AUROC score and 97.34% target class accuracy on Google Speech Commands V1, and a 0.959 AUROC score and 98.17% target class accuracy on Google Speech Commands V2, the highest performance compared to various OSR methods.
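The core KFA loop admits a short sketch: keyword features are perturbed along the loss gradient (a single FGSM-style step) to increase the loss, and the classifier is trained on both clean and augmented features with label smoothing, which keeps the augmented features from drifting arbitrarily far. The step size and the single-step form are assumptions, not the exact method.

```python
# A hedged PyTorch sketch of adversarial feature augmentation + smoothing.
import torch
import torch.nn.functional as F

def augment_features(features, labels, classifier, eps=0.1):
    feats = features.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(classifier(feats), labels)
    loss.backward()
    return (feats + eps * feats.grad.sign()).detach()  # move to raise the loss

def kfa_step(features, labels, classifier, optimizer, smoothing=0.1):
    aug = augment_features(features, labels, classifier)
    logits = classifier(torch.cat([features, aug]))
    targets = torch.cat([labels, labels])
    loss = F.cross_entropy(logits, targets, label_smoothing=smoothing)
    optimizer.zero_grad()      # clears grads accumulated while augmenting
    loss.backward()
    optimizer.step()
    return loss.item()

clf = torch.nn.Linear(32, 10)
opt = torch.optim.SGD(clf.parameters(), lr=0.01)
print(kfa_step(torch.randn(16, 32), torch.randint(0, 10, (16,)), clf, opt))
```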
Citations: 0
Efficient Training Acceleration via Sample-Wise Dynamic Probabilistic Pruning
IF 3.2 CAS Tier 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-10-21 DOI: 10.1109/LSP.2024.3484289
Feicheng Huang;Wenbo Zhou;Yue Huang;Xinghao Ding
Data pruning is observed to substantially reduce the computation and memory costs of model training. Previous studies have primarily focused on constructing a series of coresets with representative samples by leveraging predefined rules for evaluating sample importance; learning dynamics and selection bias, however, are rarely considered. In this letter, a novel Sample-wise Dynamic Probabilistic Pruning (SwDPP) method is proposed for efficient training. Specifically, instead of hard-pruning the samples that are considered easy or well-learned, we formulate the pruning process as a probabilistic sampling problem. This is achieved by a carefully-designed soft-selection mechanism, which continuously reflects learning dynamics and relaxes selection bias. Moreover, to alleviate the accuracy drop under high pruning rates, we introduce a probabilistic Mixup strategy to maintain information diversity. Extensive experiments conducted on CIFAR-10, CIFAR-100 and Tiny-ImageNet show that the proposed SwDPP outperforms current state-of-the-art methods across various pruning settings. Notably, on CIFAR-10 and CIFAR-100, SwDPP achieves lossless training acceleration using only 70% of the data per epoch.
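The soft-selection mechanism can be illustrated as loss-dependent keep probabilities followed by Mixup on the survivors: easy (low-loss) samples are dropped more often, but never deterministically. The keep-probability mapping and the Mixup parameters below are illustrative guesses, not the paper's exact scheme.

```python
# A hedged sketch of sample-wise probabilistic pruning with Mixup.
import numpy as np

rng = np.random.default_rng(0)

def probabilistic_prune(losses, keep_floor=0.3):
    """Per-example keep mask; low-loss samples have lower keep probability."""
    l = np.asarray(losses)
    p_keep = keep_floor + (1 - keep_floor) * (l - l.min()) / (np.ptp(l) + 1e-8)
    return rng.random(len(l)) < p_keep

def mixup(x, y, alpha=0.2):
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    return lam * x + (1 - lam) * x[perm], lam * y + (1 - lam) * y[perm]

losses = rng.uniform(0, 2, size=128)
mask = probabilistic_prune(losses)
x = rng.normal(size=(128, 8))
y = np.eye(10)[rng.integers(0, 10, 128)]         # one-hot labels
x_mix, y_mix = mixup(x[mask], y[mask])
print(mask.mean(), x_mix.shape)
```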
Citations: 0
Meta-Prompt: Boosting Whisper's Performance in Low-Resource Speech Recognition
IF 3.2 CAS Tier 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-10-21 DOI: 10.1109/LSP.2024.3484328
Yaqi Chen;Tong Niu;Hao Zhang;Wenlin Zhang;Dan Qu
Recent advancements in large-scale pre-trained automatic speech recognition (ASR) foundation models (e.g., Whisper) have exhibited remarkable performance in speech processing tasks. A recently emerging paradigm, prompt tuning, offers a parameter-efficient approach to fine-tuning, which has proven effective in enhancing the adaptation of pre-trained models to downstream tasks. In this paper, we first explore the prompting method for low-resource speech recognition based on Whisper. Although effective, it poses a challenge in the few-shot scenario due to its high sensitivity to initialization. To address this problem, we propose a novel meta-prompt for low-resource speech recognition that leverages the benefits of meta-learning for fast learning. We further present a lightweight version of meta-prompt that omits the learning of the encoder prompt, reducing computational and storage costs. Extensive experiments on the FLEURS dataset demonstrate consistent improvements across eleven target languages, showing better generalizability. Notably, meta-prompt with a 20%-shot setting achieves performance similar to prompt tuning with a 50%-shot setting, suggesting excellent few-shot learning ability.
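The meta-learning component can be sketched as first-order MAML on a soft prompt in front of a frozen model: one inner gradient step adapts the prompt on a support set, and the query loss at the adapted point drives the meta-update, so the learned initialization is robust in few-shot adaptation. The toy linear "model" and MSE loss below stand in for Whisper and its ASR objective; they are assumptions for illustration only.

```python
# A first-order MAML-style sketch of meta-prompt learning.
import torch

torch.manual_seed(0)
frozen = torch.nn.Linear(8, 8)                    # stand-in for frozen Whisper
for p in frozen.parameters():
    p.requires_grad_(False)

prompt = torch.zeros(8, requires_grad=True)       # shared soft prompt
meta_opt = torch.optim.Adam([prompt], lr=1e-2)

def task_loss(prompt_vec, x, y):
    return torch.nn.functional.mse_loss(frozen(x + prompt_vec), y)

for step in range(100):                           # one task per meta-step
    xs, ys = torch.randn(4, 8), torch.randn(4, 8)     # support set
    xq, yq = torch.randn(4, 8), torch.randn(4, 8)     # query set
    grad, = torch.autograd.grad(task_loss(prompt, xs, ys), prompt)
    adapted = prompt - 0.1 * grad                     # inner-loop step
    meta_opt.zero_grad()
    task_loss(adapted, xq, yq).backward()             # first-order meta-grad
    meta_opt.step()
print(task_loss(prompt, torch.randn(4, 8), torch.randn(4, 8)).item())
```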
Citations: 0