
IEEE Signal Processing Letters: Latest Articles

Harnessing Depth Gradients: A New Framework for Precise RGB-D Instance Segmentation
IF 3.9 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-11-12 | DOI: 10.1109/LSP.2025.3632238 | Vol. 32, pp. 4429-4433
Renjie Zhou;Qingsong Hu;Meiling Wang
To address the suboptimal fusion of depth data in RGB-D instance segmentation, we propose a novel framework with two synergistic modules. The Depth Gradient Guidance Module (DGGM) provides fine-grained boundary cues by processing an explicit depth gradient map. Concurrently, the Enhanced Depth-Sensitive Attention Module (E-DSAM) adaptively captures scene context using a lightweight predictor to make its attention mechanism dynamic. Extensive experiments on the NYUv2-48 dataset validate our approach, which achieves 26.8 mAP (a 4.1-point improvement over a strong baseline) and generates qualitatively superior masks. The code is available at https://github.com/TheoBald200814/RGB-D-Instance-Segmentation.
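As a concrete illustration of the explicit depth gradient map the DGGM is described as processing, the sketch below shows one common way to derive such a boundary cue from a raw depth map with Sobel filtering. This is a hedged, minimal example, not the authors' implementation (their code is in the repository linked above); the tensor shapes and normalization are assumptions.

```python
# Minimal sketch: an explicit depth gradient map as a fine-grained boundary cue.
# Assumption: depth arrives as a (B, 1, H, W) tensor; this is NOT the authors' code.
import torch
import torch.nn.functional as F

def depth_gradient_map(depth: torch.Tensor) -> torch.Tensor:
    """Return a per-pixel depth-gradient magnitude, normalized to [0, 1] per image."""
    sobel_x = torch.tensor([[-1., 0., 1.],
                            [-2., 0., 2.],
                            [-1., 0., 1.]]).view(1, 1, 3, 3)
    sobel_y = sobel_x.transpose(2, 3)
    gx = F.conv2d(depth, sobel_x, padding=1)
    gy = F.conv2d(depth, sobel_y, padding=1)
    grad = torch.sqrt(gx ** 2 + gy ** 2 + 1e-12)
    return grad / grad.amax(dim=(2, 3), keepdim=True).clamp(min=1e-6)

depth = torch.rand(2, 1, 480, 640)         # stand-in for NYUv2-sized depth maps
boundary_cues = depth_gradient_map(depth)  # strong responses where depth changes sharply
```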
Citations: 0
IEMFormer: Internal and External Multi-Fusion Transformer for Indoor RGB-D Semantic Segmentation
IF 3.9 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-11-11 | DOI: 10.1109/LSP.2025.3631433 | Vol. 32, pp. 4424-4428
Kaidi Hu;Wei Li;Guangwei Gao;Ruigang Yang
Effectively fusing and complementing RGB and depth modalities while mitigating image noise is a critical challenge in the RGB-D semantic segmentation task. In this letter, we propose a novel Internal and External Multi-fusion Transformer (IEMFormer) to address this issue. IEMFormer incorporates stage-specific fusion strategies to enhance modal complementarity. For internal fusion, we integrate a fusion unit within the traditional Transformer block, combining matching tokens from both modalities on a pixel-by-pixel basis. For external fusion, the proposed External Adaptive Cross-modal Fusion (EACF) module filters dual-modal features across both spatial and channel dimensions, adaptively weighting complementary channel information and robustly aggregating spatial patterns from both modalities to facilitate the integration of multimodal information. Additionally, the Global Self-attention Guided Fusion (GSGF) module in the decoder refines the fused features from earlier stages, effectively suppressing noise. This is achieved by leveraging high-level semantic features to guide the refinement and by incorporating an active noise-suppression mechanism that prevents overfitting to dominant, noisy features. Extensive experiments on the NYUv2 and SUN RGB-D datasets demonstrate that IEMFormer achieves highly competitive performance in accurately understanding indoor scenes.
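To make the "internal fusion" idea more tangible, here is a hypothetical sketch of a Transformer encoder block that merges matching RGB and depth tokens with a learned per-token gate before self-attention. The gating design, dimensions, and layer layout are illustrative assumptions and do not reproduce IEMFormer.

```python
# Hypothetical token-level fusion inside a Transformer block (not the IEMFormer code).
import torch
import torch.nn as nn

class GatedTokenFusionBlock(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, rgb_tokens, depth_tokens):
        # Per-token gate decides how much depth to blend into each RGB token.
        g = self.gate(torch.cat([rgb_tokens, depth_tokens], dim=-1))
        fused = g * rgb_tokens + (1.0 - g) * depth_tokens
        x = self.norm1(fused)
        x = fused + self.attn(x, x, x, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))

rgb = torch.randn(2, 196, 64)   # (batch, tokens, dim) from an RGB encoder stage
dep = torch.randn(2, 196, 64)   # matching depth tokens
out = GatedTokenFusionBlock()(rgb, dep)
```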
Citations: 0
Towards Greedy Iterative Adversarial Attack With Distortion Maps Against Deep Face Recognition
IF 3.9 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-11-11 | DOI: 10.1109/LSP.2025.3631427 | Vol. 32, pp. 4369-4373
Peng Gao;Jiu-Ao Zhu;Wen-Hua Qin
Existing deep learning-based face recognition models are vulnerable to adversarial attacks due to their inherent network fragility. However, current attack methods generate adversarial examples that often suffer from low visual quality and poor transferability. To address these issues, this paper proposes a novel adversarial attack method, G-FRadv, combining greedy iteration with multi-scale distortion maps to enhance both the attack performance and the visual quality of the adversarial examples. Specifically, G-FRadv first fuses images from different scales to obtain multiple distortion maps. These maps are then partitioned, and the disturbance weight map is coupled with the iteratively sorted gradient information. Finally, the adversarial perturbations generated by different distortion maps are fused and applied to the original image. Experimental results show that the proposed G-FRadv method achieves an average attack success rate 11.38% higher than noise-based methods, and 26.53% higher than makeup-based attack methods, while maintaining better visual quality.
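For context, the sketch below implements the generic recipe this letter builds on: an iterative, gradient-sign attack whose perturbation is re-weighted by a spatial distortion map before being applied. It is a simplified stand-in, not G-FRadv; the loss choice, hyper-parameters, and the stand-in face encoder are assumptions.

```python
# Illustrative only: iterative gradient-sign attack modulated by a distortion map.
# This is NOT the paper's G-FRadv; loss choice and hyper-parameters are assumptions.
import torch
import torch.nn.functional as F

def iterative_weighted_attack(model, x, identity_emb, weight_map,
                              eps=8 / 255, alpha=1 / 255, steps=10):
    """x: (B, 3, H, W) in [0, 1]; weight_map: (B, 1, H, W) in [0, 1]."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        emb = F.normalize(model(x_adv), dim=-1)
        # Dodging objective: reduce similarity to the true identity embedding.
        loss = F.cosine_similarity(emb, identity_emb, dim=-1).mean()
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            # The distortion map concentrates the perturbation where it is least visible.
            x_adv = x_adv - alpha * weight_map * grad.sign()
            x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0.0, 1.0)
    return x_adv.detach()

encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 112 * 112, 128))  # stand-in encoder
x = torch.rand(1, 3, 112, 112)
identity = F.normalize(torch.randn(1, 128), dim=-1)
wmap = torch.ones(1, 1, 112, 112)   # hypothetical distortion map (the paper fuses multi-scale maps)
x_adv = iterative_weighted_attack(encoder, x, identity, wmap)
```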
Citations: 0
RIS-Aided Channel Estimation for Multi-User MIMO mmWave Systems Under Practical Hybrid Architecture With Direct Path
IF 3.9 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-11-11 | DOI: 10.1109/LSP.2025.3631381 | Vol. 32, pp. 4364-4368
Qiuyuan Chen;Liuchang Zhuo;Taihao Zhang;Cunhua Pan;Hong Ren;Jiangzhou Wang;Ruidong Li;Changhong Wang
This paper proposes a novel channel estimation protocol for a reconfigurable intelligent surface (RIS) aided multi-user (MU) multi-input multi-output (MIMO) millimeter wave (mmWave) system under a hybrid architecture in which direct channels between the base station (BS) and user equipment (UE) exist. The protocol comprises two stages that estimate the direct and cascaded channels, respectively. In Stage I, besides the direct channels, the angles of arrival (AoA) and angles of departure (AoD) of the cascaded channels are also estimated. Stage II is divided into two sub-stages, in which the overall cascaded channels are estimated: in sub-stage I, the cascaded channel of a typical UE is estimated; in sub-stage II, the cascaded channels of all remaining UEs are estimated. Simulation results demonstrate that the proposed method incurs lower pilot overhead and achieves higher accuracy than existing benchmark approaches.
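As background for the two-stage protocol, the toy example below shows the elementary building block it refines: least-squares estimation of a direct BS-UE channel from known pilots, with the RIS effectively switched off. Dimensions and the noise level are assumptions; the actual protocol and its cascaded-channel stages are considerably more involved.

```python
# Toy least-squares estimate of a direct channel from known pilots (RIS off).
# Assumed dimensions and noise level; this is background, not the proposed protocol.
import numpy as np

rng = np.random.default_rng(0)
Nt, Np = 8, 16                                   # BS antennas, pilot symbols (assumed)
H = (rng.standard_normal((1, Nt)) + 1j * rng.standard_normal((1, Nt))) / np.sqrt(2)
X = (rng.standard_normal((Nt, Np)) + 1j * rng.standard_normal((Nt, Np))) / np.sqrt(2)
noise = 0.05 * (rng.standard_normal((1, Np)) + 1j * rng.standard_normal((1, Np)))
Y = H @ X + noise                                # received pilot observations at the UE

H_hat = Y @ X.conj().T @ np.linalg.inv(X @ X.conj().T)   # LS channel estimate
print(np.linalg.norm(H - H_hat) / np.linalg.norm(H))     # normalized estimation error
```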
Citations: 0
An Analysis of 2-D Signals With Fast Varying Instantaneous Frequencies: Extending Complex-Lag Time-Frequency Distribution
IF 3.9 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-11-11 | DOI: 10.1109/LSP.2025.3631428 | Vol. 32, pp. 4384-4388
Xinhang Zhu;Yicheng Jiang;Zitao Liu;Yong Wang;Yun Zhang;Qinglong Hua
The radar echo of a target can be modeled as a 2-D signal whose variables are the intra-pulse sampling time (fast-time) and the inter-pulse sampling time (slow-time). The fast-time instantaneous frequency (FIF) and slow-time instantaneous frequency (SIF) of the signal are modulated by the target's slant range. When the target undergoes complex motion, the radar echo becomes a 2-D signal with fast-varying instantaneous frequencies (IFs). IF analysis for such a signal is challenging. To solve this issue, an extending complex-lag time-frequency distribution (ECTD) is introduced. The ECTD is a 3-D distribution for 2-D signals based on the traditional complex-lag time-frequency distribution (CTD), and it inherits the CTD's good performance in handling fast-varying IFs. By introducing complex lags in both the fast-time and slow-time dimensions, the ECTD can accurately estimate the SIF and FIF of a 2-D signal with fast-varying IFs. Finally, a reduced-interference realization of the ECTD, achieved by introducing a frequency-domain filter, is given. Numerical examples validate the effectiveness of the ECTD.
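For readers less familiar with the baseline notion, the sketch below computes a plain discrete Wigner-Ville distribution of a 1-D chirp, the classical time-frequency distribution whose peak ridge tracks the IF; complex-lag distributions such as the CTD, and the ECTD proposed here, extend this idea to fast-varying IFs and to 2-D signals. The code is only that classical baseline, not the ECTD.

```python
# Background sketch: a discrete Wigner-Ville distribution of a 1-D chirp.
# The ridge of the distribution tracks the signal's IF; CTD/ECTD extend this
# idea to fast-varying IFs and (for ECTD) to 2-D signals. Not the ECTD itself.
import numpy as np

def wigner_ville(x: np.ndarray) -> np.ndarray:
    """x: complex analytic signal of length N -> (N, N) time-frequency map."""
    N = len(x)
    W = np.zeros((N, N))
    for n in range(N):
        taumax = min(n, N - 1 - n)
        tau = np.arange(-taumax, taumax + 1)
        kernel = np.zeros(N, dtype=complex)
        kernel[tau % N] = x[n + tau] * np.conj(x[n - tau])
        W[:, n] = np.real(np.fft.fft(kernel))
    return W

N = 256
t = np.arange(N) / N
x = np.exp(1j * 2 * np.pi * (20 * t + 40 * t ** 2))   # linear-FM chirp
W = wigner_ville(x)
ridge = np.argmax(W, axis=0)   # per-instant peak frequency bin follows the IF law
```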
Citations: 0
Learning-Based Geometric Tracking Control for Rigid Body Dynamics
IF 3.9 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-11-11 | DOI: 10.1109/LSP.2025.3631429 | Vol. 32, pp. 4419-4423
Jiawei Tang;Shilei Li;Lisheng Kuang;Ling Shi
This letter investigates learning-based geometric tracking control for rigid body dynamics without precise system model parameters. Our approach leverages recent advancements in geometric optimal control and data-driven techniques to develop a learning-based tracking solution. By adopting a Lie algebra formulation to transform the tracking dynamics into a vector space, we estimate unknown parameters from data, achieving robust and efficient learning. Compared to existing learning-based methods, our approach ensures geometric consistency and delivers superior tracking accuracy. Simulation results validate the effectiveness of our method.
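A small sketch of the Lie-algebra step such a formulation relies on: the SO(3) logarithm map takes a rotation error into the vector space so(3), where ordinary regression or least-squares learning can operate. The helper below is a standard log map, given as an assumed building block rather than the letter's actual controller or learning law.

```python
# Standard SO(3) logarithm map: rotation error -> vector in so(3) ≅ R^3.
# Shown as an assumed building block of the Lie-algebra formulation, not the
# letter's controller. (Valid for small/moderate angles; θ ≈ π needs special care.)
import numpy as np

def so3_log(R: np.ndarray) -> np.ndarray:
    """Rotation matrix (3, 3) -> rotation vector (axis * angle) in R^3."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-8:
        return np.zeros(3)
    w_hat = (R - R.T) * (theta / (2.0 * np.sin(theta)))
    return np.array([w_hat[2, 1], w_hat[0, 2], w_hat[1, 0]])

def rot_x(a: float) -> np.ndarray:
    return np.array([[1, 0, 0],
                     [0, np.cos(a), -np.sin(a)],
                     [0, np.sin(a),  np.cos(a)]])

R_current, R_desired = rot_x(0.30), rot_x(0.05)
e_R = so3_log(R_desired.T @ R_current)   # attitude tracking error as a plain vector
print(e_R)                                # ≈ [0.25, 0, 0]
```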
Citations: 0
Contrastive Attention-Based Network for Self-Supervised Point Cloud Completion
IF 3.9 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-11-11 | DOI: 10.1109/LSP.2025.3631424 | Vol. 32, pp. 4444-4448
Seema Kumari;Preyum Kumar;Srimanta Mandal;Shanmuganathan Raman
Point cloud completion aims to reconstruct complete 3D shapes from partial observations, often requiring multiple views or complete data for training. In this paper, we propose an attention-driven, self-supervised autoencoder network that completes 3D point clouds from a single partial observation. Multi-head self-attention captures robust contextual relationships, while residual connections in the autoencoder enhance geometric feature learning. In addition to this, we incorporate a contrastive learning-based loss, which encourages the network to better distinguish structural patterns even in highly incomplete observations. Experimental results on benchmark datasets demonstrate that the proposed approach achieves state-of-the-art performance in single-view point cloud completion.
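To illustrate the contrastive ingredient, the snippet below is a generic InfoNCE-style loss that pulls together embeddings of two partial views of the same shape and pushes apart other shapes in the batch. It is an assumption about the general form of such a loss, not the paper's exact formulation.

```python
# Generic InfoNCE-style contrastive loss on encoder embeddings (illustrative only).
import torch
import torch.nn.functional as F

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z_a, z_b: (B, D) embeddings of two partial views of the same shapes."""
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature        # (B, B) similarity matrix
    labels = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, labels)      # diagonal entries are the positives

z1 = torch.randn(8, 256)   # e.g. global features from the autoencoder's encoder
z2 = torch.randn(8, 256)
loss = info_nce(z1, z2)
```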
Citations: 0
RF-REN: RGB-Frequency Relation Exploration Network for Micro-Expression Recognition
IF 3.9 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-11-06 | DOI: 10.1109/LSP.2025.3630085 | Vol. 32, pp. 4439-4443
Jiateng Liu;Hengcan Shi;Yaonan Wang;Yuan Zong
Micro-expression recognition (MER) has drawn increasing attention in recent years due to its ability to reveal the true feelings people want to hide. The key challenge in MER is subtle motions, which are hard to capture but crucial for MER. Existing methods usually solve this problem by magnifying all motions in the whole face and temporal sequence. However, micro-expressions (MEs) only involve a few facial areas and several temporal snippets. The all-motion magnification in previous methods cannot precisely capture these local ME motion patterns, and can easily cause spatial as well as temporal distortions, which significantly decrease the MER accuracy. In this paper, we propose an RGB-Frequency Relation Exploration Network (RF-REN), which enhances the subtle motions in refined local ME cues by exploring spatial and temporal relations in both RGB and frequency domains. Specifically, we first decompose the ME video into RGB as well as frequency domains, and conduct temporal division according to different motion stages to cover various ME local patterns. Secondly, we construct an adaptive local-global relation exploration (LGRE) module to explore the local relation cues in the spatial appearance and temporal dynamics in both domains. Finally, we propose an RGB-Frequency routing strategy to fuse the RGB and frequency cues, aiming to aggregate spatial-temporal local-global information and enhance subtle motions for MER. Extensive experiments on three databases (CASME II, SAMM and SMIC) show that the proposed model outperforms other state-of-the-art methods.
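As a small, hedged illustration of the frequency-domain view that complements the RGB stream, the sketch below converts a clip into per-frame log-amplitude spectra with a 2-D FFT; the exact decomposition used by RF-REN may differ, and the clip shape is an assumption.

```python
# Illustrative frequency-domain view of a clip (per-frame 2-D FFT log-amplitude).
# An assumption about the kind of frequency cue used; not RF-REN's decomposition.
import torch

def frequency_view(frames: torch.Tensor) -> torch.Tensor:
    """frames: (T, C, H, W) -> log-amplitude spectra with the same shape."""
    spec = torch.fft.fft2(frames, dim=(-2, -1))
    spec = torch.fft.fftshift(spec, dim=(-2, -1))   # move low frequencies to the center
    return torch.log1p(spec.abs())

clip = torch.rand(16, 3, 128, 128)   # hypothetical micro-expression snippet
freq = frequency_view(clip)          # fed to a frequency branch alongside the RGB frames
```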
Citations: 0
ABFE-Net: Attention-Based Feature Enhancement Network for Few-Shot Point Cloud Classification
IF 3.9 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-11-03 | DOI: 10.1109/LSP.2025.3627853 | Vol. 32, pp. 4414-4418
Kaidi Hu;Mao Ye;Yi Wu;Wei Li;Ruigang Yang
Few-shot 3D point cloud classification has attracted significant attention due to the challenge of acquiring large-scale labeled data. Existing methods often employ network backbones tailored for fully-supervised learning, which can lead to suboptimal performance in few-shot settings. To tackle these limitations, we propose ABFE-Net, a novel method for point cloud classification with few-shot learning principles. We comprehensively summarize the drawbacks of existing network architectures into four aspects: contextual information loss, channel redundancy, overfitting, and insufficient hidden feature extraction. Accordingly, we design novel modules, such as the Attention-based Dilated Mix-up Module (ADMM) and Attention-based Comprehensive Feature Learning (ACFL), to enhance the network by addressing those issues effectively. Experiments on multiple public datasets demonstrate that ABFE-Net achieves state-of-the-art performance with superior generalization.
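For orientation, the snippet below shows the standard prototypical few-shot classification step that point-cloud backbones are typically evaluated with: class prototypes are the mean support embeddings, and queries are assigned to the nearest prototype. This protocol is an assumption for illustration and says nothing about the ADMM/ACFL modules themselves.

```python
# Standard prototypical few-shot classification step (illustrative protocol only).
import torch

def prototypical_logits(support, support_labels, query, n_way):
    """support: (S, D), query: (Q, D) embeddings from a point-cloud backbone."""
    prototypes = torch.stack([support[support_labels == c].mean(0) for c in range(n_way)])
    return -torch.cdist(query, prototypes)   # higher logit = closer prototype

support = torch.randn(5, 128)                # 5-way 1-shot support embeddings
labels = torch.arange(5)
query = torch.randn(10, 128)
pred = prototypical_logits(support, labels, query, n_way=5).argmax(dim=1)
```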
Citations: 0
ISTD-DLA: Industrial Scene Text Detection Method Based on Dynamic Local-Aware Aggregation Network
IF 3.9 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-10-31 | DOI: 10.1109/LSP.2025.3627114 | Vol. 32, pp. 4264-4268
Mingdi Hu;Yize Yang;Helin Yu;Bingyi Jing
Industrial scene text detection is challenging due to cluttered backgrounds, rust occlusions, and arbitrary orientations. We introduce ISTD-DLA, a dynamic local-aware aggregation network for industrial text detection. The framework integrates two synergistic components: (i) a dynamic local-aware feature learner that fuses shape-aware and Bayar convolutions to enrich fine-grained structural cues; and (ii) a local feature aggregation module that forms superpixel-based proposals and uses cross-attention to iteratively exchange context between pixels and superpixels, enabling more precise localization in complex scenes. To maintain efficiency, we prune implicit-mapping submodules at inference, reducing complexity without degrading accuracy. On the MPSC benchmark, ISTD-DLA attains an F-measure of 86.2% at 32 FPS, demonstrating a favorable accuracy–efficiency trade-off and robust practicality for industrial applications.
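The sketch below is a hedged illustration of the pixel-superpixel cross-attention exchange described above: flattened pixel features attend to a small set of superpixel descriptors to pull in region-level context. Dimensions and the single-layer design are assumptions, not the released model.

```python
# Hypothetical pixel-to-superpixel cross-attention (not the ISTD-DLA implementation).
import torch
import torch.nn as nn

class PixelSuperpixelCrossAttention(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, pixel_feats, superpixel_feats):
        # pixel_feats: (B, H*W, D) queries; superpixel_feats: (B, S, D) keys/values.
        ctx, _ = self.attn(pixel_feats, superpixel_feats, superpixel_feats)
        return self.norm(pixel_feats + ctx)

pixels = torch.randn(2, 64 * 64, 64)   # flattened feature map
supers = torch.randn(2, 100, 64)       # hypothetical 100 superpixel descriptors
refined = PixelSuperpixelCrossAttention()(pixels, supers)
```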
Citations: 0