
Latest Publications in IEEE Signal Processing Letters

Frequency Decoupled Masked Auto-Encoder for Self-Supervised Skeleton-Based Action Recognition
IF 3.2 | CAS Zone 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-01-03 | DOI: 10.1109/LSP.2024.3525398
Ye Liu;Tianhao Shi;Mingliang Zhai;Jun Liu
In 3D skeleton-based action recognition, the limited availability of supervised data has driven interest in self-supervised learning methods. The reconstruction paradigm using a masked auto-encoder (MAE) is an effective and mainstream self-supervised learning approach. However, recent studies indicate that MAE models tend to focus on features within a certain frequency range, which may result in the loss of important information. To address this issue, we propose a frequency decoupled MAE. Specifically, by incorporating a scale-specific frequency feature reconstruction module, we leverage frequency information as a direct and explicit reconstruction target, which augments the MAE's capability to discern and accurately reproduce diverse frequency attributes within the data. Moreover, to address the unstable gradient updates caused by the more complex optimization objective introduced by frequency reconstruction, we introduce a dual-path network combined with an exponential moving average (EMA) parameter-updating strategy to stabilize the training process. Extensive experiments demonstrate the effectiveness of the proposed method.
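As a reference for the dual-path idea, the following is a minimal PyTorch sketch of EMA parameter updating: an online network is trained by gradients while a target network is updated as an exponential moving average of the online weights. The module sizes and the momentum value are illustrative assumptions, not taken from the paper.

```python
import copy
import torch

def ema_update(target_net, online_net, momentum=0.999):
    # theta_target <- m * theta_target + (1 - m) * theta_online
    with torch.no_grad():
        for p_t, p_o in zip(target_net.parameters(), online_net.parameters()):
            p_t.mul_(momentum).add_(p_o, alpha=1.0 - momentum)

online = torch.nn.Linear(64, 64)      # stand-in for the online encoder (assumed)
target = copy.deepcopy(online)        # target path starts as a copy
for p in target.parameters():
    p.requires_grad_(False)           # the target path receives no gradients

# ... after each optimizer step on `online`:
ema_update(target, online)
```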
Citations: 0
Transformer-Prompted Network: Efficient Audio–Visual Segmentation via Transformer and Prompt Learning
IF 3.2 | CAS Zone 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-01-03 | DOI: 10.1109/LSP.2024.3524120
Yusen Wang;Xiaohong Qian;Wujie Zhou
Audio–visual segmentation (AVS) is a challenging task that focuses on segmenting sound-producing objects within video frames by leveraging audio signals. Existing convolutional neural networks (CNNs) and Transformer-based methods extract features separately from modality-specific encoders and then use fusion modules to integrate the visual and auditory features. We propose an effective Transformer-prompted network, TPNet, which utilizes prompt learning with a Transformer to guide the CNN in addressing AVS tasks. Specifically, during feature encoding, we incorporate a frequency-based prompt-supplement module to fine-tune and enhance the encoded features through frequency-domain methods. Furthermore, during audio–visual fusion, we integrate a self-supplementing cross-fusion module that uses self-attention, two-dimensional selective scanning, and cross-attention mechanisms to merge and enhance audio–visual features effectively. The prompt features undergo the same processing in cross-modal fusion, further refining the fused features to achieve more accurate segmentation results. Finally, we apply self-knowledge distillation to the network, further enhancing the model performance. Extensive experiments on the AVSBench dataset validate the effectiveness of TPNet.
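For illustration, here is a minimal PyTorch sketch of frequency-domain feature enhancement in the spirit of the prompt-supplement module: features are mapped to the frequency domain with a 2D FFT, re-weighted with learnable gains, and mapped back. The per-channel gain parameterization is an assumption, not the paper's design.

```python
import torch

class FrequencyEnhance(torch.nn.Module):
    def __init__(self, channels):
        super().__init__()
        # one learnable gain per channel (illustrative assumption)
        self.gain = torch.nn.Parameter(torch.ones(channels, 1, 1))

    def forward(self, x):                          # x: (B, C, H, W)
        spec = torch.fft.rfft2(x, norm="ortho")    # complex spectrum (B, C, H, W//2+1)
        spec = spec * self.gain                    # re-weight frequency components
        return torch.fft.irfft2(spec, s=x.shape[-2:], norm="ortho")

feat = torch.randn(2, 32, 56, 56)
enhanced = FrequencyEnhance(32)(feat)              # same shape as `feat`
```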
Citations: 0
Secure Degree of Freedom Bound of Secret-Key Capacity for Two-Way Wiretap Channel
IF 3.2 | CAS Zone 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-01-03 | DOI: 10.1109/LSP.2024.3525406
Qingpeng Liang;Linsong Du;Yanzhi Wu;Zheng Ma
This letter focuses on the optimal sum secure degree of freedom (SDoF) in a two-way wiretap channel (TW-WC), wherein two legitimate full-duplex multiple-antenna nodes cooperate with each other and are simultaneously wiretapped by a multiple-antenna eavesdropper. It aims to find the optimal sum SDoF pertaining to the secret-key capacity of the TW-WC. First, we analyze the upper and lower bounds of the optimal sum SDoF by establishing their equivalence to the optimal SDoF expression corresponding to the secrecy rate of the TW-WC. Subsequently, in scenarios where the legitimate nodes are configured with an equal number of transmit and receive antennas, we show that the upper and lower bounds of the optimal SDoF converge. Furthermore, the findings suggest that a higher SDoF can be achieved than in existing works, thereby heralding an enhancement in secure spectral efficiency.
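For reference, the secure degree of freedom is the high-SNR pre-log of the corresponding capacity; a standard definition, with notation assumed rather than taken from the letter, is

```latex
\mathrm{SDoF} \;=\; \lim_{P \to \infty} \frac{C_s(P)}{\log_2 P},
```

where $C_s(P)$ denotes the sum secret-key capacity at transmit power $P$.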
Citations: 0
A Study on the Optimality of Downlink Hybrid NOMA
IF 3.2 | CAS Zone 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-01-03 | DOI: 10.1109/LSP.2024.3524096
Zhiguo Ding
The key idea of hybrid non-orthogonal multiple access (NOMA) is to allow users to use bandwidth resources to which they would not have access in orthogonal multiple access (OMA) based legacy networks, while still guaranteeing compatibility with the legacy network. However, in a conventional hybrid NOMA downlink network, some users have access to more bandwidth resources than others, which leads to a potential performance loss. So what if the users could access the same amount of bandwidth resources? This letter focuses on a simple two-user scenario and develops analytical and simulation results revealing that, in the considered scenario, conventional hybrid NOMA is still an optimal transmission strategy.
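For context, below is a minimal numerical sketch of a textbook two-user downlink NOMA rate computation with successive interference cancellation (SIC); the power split, channel gains, and decoding order are illustrative assumptions, not the letter's exact hybrid-NOMA model.

```python
import numpy as np

P, N0 = 10.0, 1.0    # transmit power and noise power (assumed)
g1, g2 = 0.3, 1.5    # channel gains |h|^2; user 1 is the weaker (far) user
a1, a2 = 0.8, 0.2    # power allocation favouring the weak user, a1 + a2 = 1

# user 1 decodes its signal treating user 2's signal as noise
R1 = np.log2(1 + a1 * P * g1 / (a2 * P * g1 + N0))
# user 2 cancels user 1's signal via SIC, then decodes interference-free
R2 = np.log2(1 + a2 * P * g2 / N0)
print(f"R1 = {R1:.3f} bps/Hz, R2 = {R2:.3f} bps/Hz, sum = {R1 + R2:.3f}")
```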
Citations: 0
Detecting Pedestrian With Incomplete Head Feature in Crowded Situation Based on Transformer
IF 3.2 | CAS Zone 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-01-02 | DOI: 10.1109/LSP.2024.3525397
Zefei Chen;Yongjie Lin;Jianmin Xu;Kai Lu;Yanfang Shou
Pedestrian detection in crowded scenes is a challenging task. This study presents a straightforward and effective method, Det RCNN, that detects pedestrians in crowded scenes while also pairing each pedestrian's body and head. On the one hand, pedestrians' heads have stable shapes and distinctive features. On the other hand, heads are usually positioned higher in the image, so even in crowded scenes they are rarely completely occluded. Therefore, this study equips the DETR model with a Head Decoder (HDecoder) parallel to the Decoder. The HDecoder takes the head knowledge generated in the Decoder phase as head queries. Simultaneously, the HDecoder uses a key-query mechanism to search the entire image for the body bounding boxes corresponding to the head queries. Lastly, the proposed method conducts a straightforward IOU (Intersection over Union) matching between the body bounding boxes produced in the Decoder and HDecoder phases. The HDecoder resembles the second stage of the Faster RCNN model, hence we term the method Det RCNN (DETR RCNN). Compared with Deformable DETR, experimental results on the CrowdHuman dataset show that the proposed model increases AP$_{m}$ from 53.02 to 53.87. Furthermore, mMR$^{-2}$ decreases from 52.46 to 42.32 compared with the existing BFJ.
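A minimal sketch of the final IoU matching step is given below: each body box predicted by the HDecoder is paired with the best-overlapping body box from the Decoder. The (x1, y1, x2, y2) box format and the 0.5 threshold are assumptions for illustration.

```python
import numpy as np

def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def match(decoder_boxes, hdecoder_boxes, thr=0.5):
    pairs = []
    for i, hb in enumerate(hdecoder_boxes):
        ious = [iou(hb, db) for db in decoder_boxes]
        j = int(np.argmax(ious))
        if ious[j] >= thr:
            pairs.append((i, j))   # (head-branch box index, body-branch box index)
    return pairs

dec = [(10, 10, 50, 120), (60, 15, 100, 130)]
hdec = [(12, 8, 52, 118)]
print(match(dec, hdec))            # -> [(0, 0)]
```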
Citations: 0
Non-Decreasing Concave Regularized Minimization for Principal Component Analysis
IF 3.2 | CAS Zone 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-12-26 | DOI: 10.1109/LSP.2024.3523223
Qinghai Zheng;Yixin Zhuang
As a widely used method in signal processing, Principal Component Analysis (PCA) performs both compression and recovery of high-dimensional data by leveraging linear transformations. Regarding the robustness of PCA, how to discriminate between correct samples and outliers is a crucial and challenging issue. In this paper, we present a general model that conducts PCA via a non-decreasing concave regularized minimization, termed PCA-NCRM for short. Unlike most existing PCA methods, which learn the linear transformations by minimizing the recovery errors between the recovered and original data in the least-squares sense, our model adopts a monotonically non-decreasing concave function to enhance the model's ability to distinguish correct samples from outliers. Specifically, PCA-NCRM pays more attention to samples with smaller recovery errors while simultaneously down-weighting samples with larger recovery errors. The proposed minimization problem can be efficiently addressed by an iterative re-weighting optimization. Experimental results on several datasets show the effectiveness of our model.
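As an illustration of the iterative re-weighting idea, here is a minimal NumPy sketch of robust PCA that down-weights samples with large recovery errors. The concrete weighting rule, w_i = 1/(e_i + eps), corresponding to a log-like concave penalty, is an assumption, not the paper's exact formulation.

```python
import numpy as np

def reweighted_pca(X, k, iters=20, eps=1e-6):
    """X: (n, d) data rows; returns a d x k orthonormal basis W."""
    X = X - X.mean(axis=0)                 # center the data
    w = np.ones(X.shape[0])
    for _ in range(iters):
        # weighted covariance -> principal subspace
        C = (X * w[:, None]).T @ X / w.sum()
        vals, vecs = np.linalg.eigh(C)
        W = vecs[:, -k:]                   # top-k eigenvectors
        # per-sample squared recovery error e_i = ||x_i - W W^T x_i||^2
        R = X - X @ W @ W.T
        e = (R ** 2).sum(axis=1)
        # concave penalty => large-error samples get small weights
        w = 1.0 / (e + eps)
    return W

X = np.random.randn(200, 10)
X[:5] += 20.0                              # inject a few outliers
W = reweighted_pca(X, k=3)
print(W.shape)                             # (10, 3)
```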
Citations: 0
Trinity Detector: Text-Assisted and Attention Mechanisms Based Spectral Fusion for Diffusion Generation Image Detection
IF 3.2 | CAS Zone 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-12-26 | DOI: 10.1109/LSP.2024.3522851
Jiawei Song;Dengpan Ye;Yunming Zhang
Artificial Intelligence Generated Content (AIGC) techniques, represented by text-to-image generation, have enabled the malicious use of deep forgeries, raising concerns about the trustworthiness of multimedia content. Experimental results demonstrate that traditional forgery detection methods adapt poorly to diffusion model-generated scenarios, while existing diffusion-specific techniques lack robustness against post-processed images. In response, we propose the Trinity Detector, which integrates coarse-grained text features from a Contrastive Language-Image Pretraining (CLIP) encoder with fine-grained artifacts in the pixel domain to achieve semantic-level image detection, significantly enhancing model robustness. To enhance sensitivity to the features of diffusion-generated images, a Multi-spectral Channel Attention Fusion Unit (MCAF) is designed. It adaptively fuses multiple preset frequency bands, dynamically adjusting the weight of each band, and then integrates the fused frequency-domain information with the spatial co-occurrence of the two modalities. Extensive experiments validate that our Trinity Detector improves transfer detection performance across black-box datasets by an average of 14.3% compared with previous diffusion detection models and demonstrates superior performance on post-processed image datasets.
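Here is a minimal PyTorch sketch of multi-band channel attention in the spirit of the MCAF unit: FFT magnitudes are pooled within preset radial frequency bands, the bands are fused with learnable weights, and the result gates the channels. The radial band layout, softmax band weighting, and gating MLP are assumptions for illustration, not the paper's exact design.

```python
import torch

class BandChannelAttention(torch.nn.Module):
    def __init__(self, channels, bands=((0.0, 0.25), (0.25, 0.5), (0.5, 1.0))):
        super().__init__()
        self.bands = bands
        self.band_w = torch.nn.Parameter(torch.ones(len(bands)))  # dynamic band weights
        self.fc = torch.nn.Sequential(
            torch.nn.Linear(channels, channels // 4),
            torch.nn.ReLU(),
            torch.nn.Linear(channels // 4, channels),
            torch.nn.Sigmoid(),
        )

    def forward(self, x):                                 # x: (B, C, H, W)
        mag = torch.fft.rfft2(x, norm="ortho").abs()      # (B, C, H, W//2+1)
        B, C, H, Wf = mag.shape
        fy = torch.fft.fftfreq(H, device=x.device).abs().view(H, 1)
        fx = torch.fft.rfftfreq(x.shape[-1], device=x.device).view(1, Wf)
        r = torch.sqrt(fy ** 2 + fx ** 2) / 0.7072        # radius normalized to [0, 1)
        w = torch.softmax(self.band_w, dim=0)
        # per-band channel descriptors, fused with the learned band weights
        desc = sum(w[i] * mag[..., (r >= lo) & (r < hi)].mean(dim=-1)
                   for i, (lo, hi) in enumerate(self.bands))   # (B, C)
        return x * self.fc(desc).view(B, C, 1, 1)         # channel-gated features

y = BandChannelAttention(32)(torch.randn(2, 32, 56, 56))
```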
Citations: 0
IrisFormer: A Dedicated Transformer Framework for Iris Recognition
IF 3.2 | CAS Zone 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-12-26 | DOI: 10.1109/LSP.2024.3522856
Xianyun Sun;Caiyong Wang;Yunlong Wang;Jianze Wei;Zhenan Sun
While Vision Transformer (ViT)-based methods have significantly improved the performance of various vision tasks in natural scenes, progress in iris recognition remains limited. In addition, the human iris exhibits unique characteristics that are distinct from those of natural scenes. To remedy this, this paper investigates a dedicated Transformer framework, termed IrisFormer, for iris recognition, and attempts to improve accuracy by combining the contextual modeling ability of ViT with iris-specific optimization to learn robust, fine-grained, and discriminative features. Specifically, to achieve rotation invariance in iris recognition, we employ relative position encoding instead of regular absolute position encoding for each iris image token, and a horizontal pixel-shifting strategy is utilized during training for data augmentation. Then, to enhance the model's robustness against local distortions such as occlusions and reflections, we randomly mask some tokens during training to force the model to learn representative identity features from only part of the image. Finally, considering that fine-grained features are more discriminative in iris recognition, we retain the entire token sequence for patch-wise feature matching instead of using the standard single classification token. Experiments on three popular datasets demonstrate that the proposed framework achieves competitive performance under both intra- and inter-dataset testing protocols.
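A minimal sketch of the horizontal pixel-shifting augmentation follows: circularly shifting a normalized (rubber-sheet unwrapped) iris image along its width corresponds to an in-plane rotation of the eye. The shift range is an illustrative assumption.

```python
import numpy as np

def horizontal_shift(img, max_shift=16, rng=np.random.default_rng()):
    """img: (H, W) or (H, W, C) unwrapped iris; circular shift along width."""
    s = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(img, shift=s, axis=1)

iris = np.random.rand(64, 512)       # stand-in for a normalized iris image
augmented = horizontal_shift(iris)
print(augmented.shape)                # (64, 512)
```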
Citations: 0
Synchronous and Asynchronous HARQ-CC Assisted SCMA Schemes
IF 3.2 | CAS Zone 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-12-26 | DOI: 10.1109/LSP.2024.3523227
Man Wang;Zheng Shi;Yunfei Li;Xianda Wu;Weiqiang Tan
This letter proposes a novel hybrid automatic repeat request with chase combining assisted sparse code multiple access (HARQ-CC-SCMA) scheme. Depending on whether the same superimposed packet is retransmitted, synchronous and asynchronous modes are considered for retransmissions. Moreover, a factor graph aggregation (FGA) method is used for multi-user detection. Specifically, a large-scale factor graph is constructed by combining all the received superimposed signals, and the message passing algorithm (MPA) is applied to calculate the log-likelihood ratio (LLR). Monte Carlo simulations are performed to show that FGA surpasses bit-level combining (BLC) and HARQ with incremental redundancy (HARQ-IR) in synchronous mode. Moreover, FGA performs better than BLC in the high signal-to-noise ratio (SNR) region in asynchronous mode. However, FGA in asynchronous mode is worse than BLC at low SNR, because significant error propagation is induced by the presence of failed messages after the maximum allowable number of HARQ rounds.
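For reference, the bit-level LLR produced by an MPA detector takes the standard form (notation assumed, not taken from the letter):

```latex
\mathrm{LLR}(b_k) \;=\;
\ln \frac{\sum_{\mathbf{x}\,:\,b_k(\mathbf{x})=0} p(\mathbf{y}\mid\mathbf{x})}
         {\sum_{\mathbf{x}\,:\,b_k(\mathbf{x})=1} p(\mathbf{y}\mid\mathbf{x})},
```

where $\mathbf{y}$ is the received superimposed signal and the sums run over codeword hypotheses $\mathbf{x}$ with the $k$-th bit fixed to 0 or 1.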
Citations: 0
Learning Dynamic Distractor-Repressed Correlation Filter for Real-Time UAV Tracking
IF 3.2 | CAS Zone 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-12-26 | DOI: 10.1109/LSP.2024.3522850
Zhi Chen;Lijun Liu;Zhen Yu
With their high computational efficiency and desirable tracking accuracy, discriminative correlation filters (DCFs) have been widely utilized in UAV tracking, leading to substantial progress. However, in some intricate scenarios (e.g., similar objects or backgrounds, background clutter), DCF-based trackers are prone to generating low-reliability response maps influenced by surrounding response distractors, thereby reducing tracking robustness. Furthermore, the limited computational resources and endurance of UAV platforms require DCF-based trackers to deliver real-time, reliable tracking performance. To address these issues, a dynamic distractor-repressed correlation filter (DDRCF) is proposed. First, a dynamic distractor-repressed regularization is introduced into the DCF framework. Then, a new objective function is formulated to tune the penalty intensity of the distractor-repressed regularization module. Furthermore, a novel response-map variation evaluation mechanism dynamically tunes the distractor-repressed regularization coefficient to adapt to omnipresent appearance variations. Extensive experiments on four prevailing UAV benchmarks, i.e., UAV123@10fps, UAVTrack112, DTB70, and UAVDT, validate that the proposed DDRCF tracker is superior to other state-of-the-art trackers. Moreover, the proposed method achieves a tracking speed of 59 FPS on a CPU, meeting the requirements of real-time aerial tracking.
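Below is a minimal NumPy sketch of one possible response-map variation measure: consecutive response maps are aligned at their peaks and the normalized energy of their difference is taken, with larger variation raising the regularization penalty. The peak-aligned difference metric and the linear penalty schedule are illustrative assumptions, not the paper's exact mechanism.

```python
import numpy as np

def response_variation(prev, curr):
    """prev, curr: 2-D response maps of equal shape."""
    def peak_centered(R):
        py, px = np.unravel_index(np.argmax(R), R.shape)
        # circularly shift so that the peak sits at (0, 0)
        return np.roll(np.roll(R, -py, axis=0), -px, axis=1)
    P, C = peak_centered(prev), peak_centered(curr)
    return np.sum((C - P) ** 2) / (np.sum(P ** 2) + 1e-9)

def penalty(base, variation, gain=2.0):
    # hypothetical dynamic tuning rule: larger variation -> stronger penalty
    return base * (1.0 + gain * variation)

R_prev, R_curr = np.random.rand(50, 50), np.random.rand(50, 50)
lam = penalty(0.1, response_variation(R_prev, R_curr))
```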
Citations: 0