
Signal Processing-Image Communication: Latest Publications

UIQA-MSST: Multi-Scale Staircase-Transformer Fusion for Underwater Image Quality Assessment
IF 2.7 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-03-01 | Epub Date: 2026-01-06 | DOI: 10.1016/j.image.2026.117479
Tianhai Chen, Xichen Yang, Tianshu Wang, Shun Zhu, Yan Zhang, Zhongyuan Mao, Nengxin Li
Underwater images play a crucial role in underwater exploration and resource development, but their quality often degrades in complex underwater scenarios. Existing assessment methods mainly target specific scenarios and exhibit limited generalization when confronted with complex underwater scenes, so improving their applicability is essential for accurate quality assessment across diverse scenarios. This paper proposes an Underwater Image Quality Assessment (UIQA) method that combines the advantages of a staircase network and a Transformer, focusing on efficiently capturing and integrating image features at different scales. First, multi-scale feature extraction obtains information from images at various levels. A Staircase Feature (SF) module then progressively integrates features from shallow to deep layers, fusing cross-scale information, while a Cross-Scale Transformer (CST) module merges information from multiple scales using self-attention. By concatenating the output features of both modules, the model captures image content at both global and local ranges. A regression module then generates quality scores. Finally, meta-learning optimizes the model's learning process, enabling adaptation to new data for accurate quality prediction across diverse scenarios. Experiments show superior accuracy and stability on underwater datasets, and additional tests on natural scenes demonstrate broader applicability. Cross-dataset experiments validate the generalization capability of the proposed method. The source code will be made available at https://github.com/dart-into/UIQA-MSST.
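The CST step, flattening feature maps from several scales into one token sequence and letting self-attention mix them, can be sketched compactly. The following is an illustrative reading of the abstract rather than the authors' released code; the dimensions, 1x1 projections, and mean pooling are assumptions:

import torch
import torch.nn as nn

class CrossScaleFusion(nn.Module):
    def __init__(self, dims=(64, 128, 256), d_model=128, heads=4):
        super().__init__()
        # Project every scale to a shared width so their tokens can be mixed.
        self.proj = nn.ModuleList(nn.Conv2d(d, d_model, 1) for d in dims)
        self.attn = nn.MultiheadAttention(d_model, heads, batch_first=True)

    def forward(self, feats):  # feats: list of (B, C_i, H_i, W_i) maps
        tokens = []
        for proj, f in zip(self.proj, feats):
            f = proj(f)                                  # (B, d_model, H, W)
            tokens.append(f.flatten(2).transpose(1, 2))  # (B, H*W, d_model)
        x = torch.cat(tokens, dim=1)   # one token sequence spanning all scales
        y, _ = self.attn(x, x, x)      # self-attention mixes the scales
        return y.mean(dim=1)           # (B, d_model) pooled descriptor

feats = [torch.randn(2, 64, 28, 28), torch.randn(2, 128, 14, 14),
         torch.randn(2, 256, 7, 7)]
print(CrossScaleFusion()(feats).shape)  # torch.Size([2, 128])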
Citations: 0
CS-YOLO: A small object detection model based on YOLO for UAV aerial photography
IF 2.7 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-03-01 | Epub Date: 2025-12-24 | DOI: 10.1016/j.image.2025.117460
Rui Fan, Renhao Jiao, Weigui Nan, Haitao Meng, Abin Jiang, Xiaojia Yang, Zhiqiang Zhao, Jin Dang, Zhixue Wang, Yanshan Tian, Baiying Dong, Xiaowei He, Xiaoli Luo
With the rapid development of the UAV industry, object detection based on UAV aerial images is finding ever wider application. However, targets in UAV aerial images are small, dense, and disturbed by complex environments, which makes object detection challenging. To address dense small targets and strong background interference in UAV aerial images, we propose a YOLO-based detection model, Content-Conscious and Scale-Sensitive YOLO (CS-YOLO). Unlike existing YOLO-based approaches, our contribution lies in the joint design of a Bottleneck Attention Module cross-stage partial block (BAM-CSP), a Multi-Scale Pooling Attention Fusion Module (MPAFM), and a Feature Difference Fusion Module (FDFM). The BAM-CSP module significantly enhances small-target feature responses by integrating a channel attention mechanism at the bottleneck layer of the cross-stage partial network; the MPAFM module adopts a multi-scale pooling attention fusion architecture that suppresses complex background interference through parallel pooling and strengthens background perception for small targets; the FDFM module captures information changes during sampling through a feature-difference fusion mechanism. A Gradient Adaptive-Efficient IoU (GA-EIoU) loss function is introduced to optimize bounding-box regression by incorporating an EIoU gradient-constraint weighting mechanism. In comparative experiments on the VisDrone2019 dataset, CS-YOLO achieves 22.6% mAP@50:95, 2.7% higher than YOLO11n; on the HazyDet dataset it reaches 53.8% mAP@50:95, an increase of 2.8%. CS-YOLO also comprehensively surpasses existing advanced methods in recall rate and robustness. Ablation experiments verify each module's contribution to detection performance. The model effectively addresses dense small targets and strong environmental interference in UAV aerial images, providing a high-precision, real-time, and reliable detection scheme for complex tasks such as UAV inspection. The source code will be available at https://github.com/unscfr/CS-YOLO.
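GA-EIoU builds on the EIoU loss, which augments 1 - IoU with an enclosing-box-normalized center-distance term plus decoupled width and height penalties. A minimal sketch of plain EIoU follows; it is the standard formulation from the literature, and the paper's gradient-adaptive weighting is deliberately not reproduced:

import torch

def eiou_loss(pred, target, eps=1e-7):
    # pred, target: (N, 4) boxes as (x1, y1, x2, y2).
    x1 = torch.max(pred[:, 0], target[:, 0]); y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2]); y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # Smallest enclosing box and its squared diagonal.
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Center distance plus decoupled width/height penalties.
    dx = (pred[:, 0] + pred[:, 2] - target[:, 0] - target[:, 2]) / 2
    dy = (pred[:, 1] + pred[:, 3] - target[:, 1] - target[:, 3]) / 2
    wp, hp = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    wt, ht = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    return (1 - iou + (dx ** 2 + dy ** 2) / c2
            + (wp - wt) ** 2 / (cw ** 2 + eps) + (hp - ht) ** 2 / (ch ** 2 + eps))

p = torch.tensor([[0., 0., 4., 4.]]); t = torch.tensor([[1., 1., 5., 5.]])
print(eiou_loss(p, t))  # approximately tensor([0.6487])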
Citations: 0
Single object tracking based on Spatio-Temporal information
IF 2.7 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-03-01 | Epub Date: 2025-12-24 | DOI: 10.1016/j.image.2025.117463
Lixin Wei, Yun Luo, Rongzhe Zhu, Xin Li
To address tracking failures caused by the absence of temporal dynamic information and by background clutter arising from similar backgrounds, similar objects, target occlusion, and illumination changes, this paper proposes a single object tracking algorithm based on spatio-temporal information (SST). The algorithm integrates a Temporal Adaptive Module (TAM) into the backbone network to generate a temporal kernel from feature maps. This endows the network with the capability to model temporal dynamics, effectively exploiting inter-frame temporal relationships to handle complex dynamics such as changes in target motion states and environmental conditions. To mitigate background clutter, the algorithm further employs a Mixed Local Channel Attention (MLCA) mechanism that captures channel and spatial information, focusing the network on the target and reducing the impact of interfering information. The proposed algorithm was evaluated on the OTB100, LaSOT, and NFS datasets. It achieved an AUC score of 70.7% on OTB100, a 1.3% improvement over the baseline tracker, and AUC scores of 65.1% and 65.9% on LaSOT and NFS, respectively, each 0.2% above the baseline tracker. The tracking speed exceeds 80 fps, and the performance of the SST algorithm has been verified on self-made videos. The code is available at https://github.com/xuexiaodemenggubao/sst.
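A TAM-style temporal kernel can be illustrated with a toy module that predicts a per-channel 1-D kernel from the pooled clip and slides it along the time axis. This is a hedged sketch of the general idea only; the kernel size, generator widths, and pooling are assumptions rather than the paper's exact design:

import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalAdaptive(nn.Module):
    def __init__(self, t_frames=8, t_kernel=3):
        super().__init__()
        self.k = t_kernel
        # Tiny generator: per-channel temporal profile -> per-channel kernel.
        self.gen = nn.Sequential(
            nn.Linear(t_frames, t_frames), nn.ReLU(),
            nn.Linear(t_frames, t_kernel), nn.Softmax(dim=-1))

    def forward(self, x):                      # x: (B, C, T, H, W)
        b, c, t, h, w = x.shape
        kernel = self.gen(x.mean(dim=(3, 4)))  # (B, C, k), one kernel per channel
        pad = self.k // 2
        xp = F.pad(x, (0, 0, 0, 0, pad, pad))  # pad only the time axis
        win = xp.unfold(2, self.k, 1)          # (B, C, T, H, W, k) time windows
        return (win * kernel.view(b, c, 1, 1, 1, self.k)).sum(-1)

x = torch.randn(2, 16, 8, 14, 14)
print(TemporalAdaptive(t_frames=8)(x).shape)  # torch.Size([2, 16, 8, 14, 14])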
Citations: 0
A new baseline for edge detection: Make encoder–decoder great again
IF 2.7 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-03-01 | Epub Date: 2026-01-14 | DOI: 10.1016/j.image.2026.117485
Yachuan Li, Xavier Soria Poma, Yongke Xi, Guanlin Li, Chaozhi Yang, Qian Xiao, Yun Bai, Zongmin Li
The performance of deep learning based edge detectors has surpassed that of humans, but huge computational costs and complex training strategies hinder their further development and application. In this paper, we alleviate these complexities with a vanilla encoder–decoder based detector. First, we design a bilateral encoder that decouples the extraction of spatial and semantic features. Because the spatial branch no longer guides the semantic branch, feature richness can be reduced, enabling a more compact model design. We then propose a cascaded feature fusion decoder in which spatial features are progressively refined by semantic features; the refined spatial features are the sole basis for generating the edge map. The coarse original spatial features and the semantic features never contact the final result directly, so noise in the spatial features and localization errors in the semantic features are suppressed in the generated edge map. The proposed New Baseline for Edge Detection (NBED) achieves consistently superior performance across multiple edge detection benchmarks, even compared with methods that incur huge computational costs and complex training strategies. The ODS of NBED on BSDS500 is 0.838, state-of-the-art performance. Our study highlights that high-quality features are key to modern edge detection, and that encoder–decoder based detectors can achieve excellent performance without complex training or heavy computation. Furthermore, we take retinal vessel segmentation as an example to explore the application of NBED to downstream tasks. The code is available at https://github.com/Li-yachuan/NBED.
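The decoder's central idea, semantic features steering spatial features while only the refined spatial path reaches the edge head, can be sketched as below. Channel widths, sigmoid gating, and the number of stages are illustrative assumptions, not NBED's actual blocks:

import torch
import torch.nn as nn
import torch.nn.functional as F

class CascadedRefine(nn.Module):
    def __init__(self, spatial_c=32, semantic_c=(64, 128)):
        super().__init__()
        self.gates = nn.ModuleList(nn.Conv2d(c, spatial_c, 1) for c in semantic_c)
        self.head = nn.Conv2d(spatial_c, 1, 1)  # edge map from refined spatial only

    def forward(self, spatial, semantics):       # semantics ordered shallow to deep
        for gate, sem in zip(self.gates, semantics):
            sem = F.interpolate(sem, size=spatial.shape[-2:],
                                mode="bilinear", align_corners=False)
            spatial = spatial + spatial * torch.sigmoid(gate(sem))  # guided refine
        return torch.sigmoid(self.head(spatial))

sp = torch.randn(1, 32, 128, 128)
sems = [torch.randn(1, 64, 32, 32), torch.randn(1, 128, 16, 16)]
print(CascadedRefine()(sp, sems).shape)  # torch.Size([1, 1, 128, 128])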
Citations: 0
Optimization model for sign language recognition using hybrid convolution networks
IF 2.7 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-02-01 | Epub Date: 2025-11-25 | DOI: 10.1016/j.image.2025.117444
S. Venkatesh, Pravin R. Kshirsagar, R. Thiagarajan, Tan Kuan Tak, B. Sivaneasan
The sign language gesture recognition model seeks to provide effective communication by converting sign language motions into spoken or written language, allowing signers to connect with non-signers. Deep features are extracted using Vision Transformer-YOLOv5 (ViT-YOLOv5), which extracts Regions of Interest (ROI) from the images to generate the first feature set, F1. Concurrently, the Scale-Invariant Feature Transform (SIFT) extracts a second feature set, F2, from the same images. The two feature sets are fed into a Hybrid Convolution-based Adaptive EfficientB7 Network (HCA-EfB7N), in which F1 is processed with 1D convolution and F2 with 2D convolution to obtain the recognition result. By utilizing both 1D and 2D convolutions, the proposed model accurately identifies the class of hand gestures, leading to more accurate recognition. The parameters of the HCA-EfB7N are optimized using the Fitness-based Archimedes Optimization Algorithm (FAOA). This hybrid approach recognizes complex hand gestures, particularly in sign language translation systems. The approach's effectiveness is validated by comparing its performance against several baseline systems, confirming its superiority and robustness in recognizing sign language gestures.
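The F2 branch uses standard SIFT; a minimal OpenCV sketch shows one plausible way to turn an image's keypoint descriptors into a fixed-length vector. The mean pooling and the random stand-in image are assumptions, not the paper's exact pipeline:

import cv2
import numpy as np

img = np.random.randint(0, 255, (224, 224), dtype=np.uint8)  # stand-in frame
sift = cv2.SIFT_create()
keypoints, desc = sift.detectAndCompute(img, None)  # desc: (N, 128) or None
# Pool the per-keypoint descriptors into one fixed-length F2 vector.
f2 = desc.mean(axis=0) if desc is not None else np.zeros(128, dtype=np.float32)
print(f2.shape)  # (128,)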
Citations: 0
Tri-modal fusion for dynamic hand gesture recognition: Integrating RGB, depth, and skeleton data
IF 2.7 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-02-01 | Epub Date: 2025-11-25 | DOI: 10.1016/j.image.2025.117440
Reena Tripathi, Bindu Verma
Computer vision research continues to show strong interest in dynamic hand gesture recognition because of its wide range of applications in automation, human–computer interaction, and other fields. Dynamic hand gesture recognition faces several challenges, such as occlusion and background clutter, which make gesture tracking and classification difficult. To address them, our proposed work fuses multiple modalities, each with its own advantages. The first modality uses RGB data, which provides spatial information for interpreting the gesturing hand's shape, texture, and color. The second modality employs depth data, which records activity motion. The third modality incorporates skeleton data, which resolves the challenges of complex backgrounds and occlusion. Features are extracted in parallel from all modalities using a pre-trained ResCLIP model, and in sequence-to-sequence learning an LSTM unit processes the generated feature vectors. At the feature level, the outputs of all three LSTM networks are concatenated before the fully connected (FC) layer, and a SoftMax function classifies the gestures. The proposed model was evaluated on two benchmark datasets, the First-Person Hand Action dataset (FPHA) and the Sheffield Kinect Gesture dataset (SKIG), demonstrating its effectiveness. It outperformed state-of-the-art techniques on the FPHA and SKIG datasets, exhibiting competitive performance.
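The fusion step described above, three per-modality LSTM streams concatenated at the feature level ahead of the FC classifier, maps directly to a short module. Feature dimensions, hidden size, and class count below are illustrative assumptions:

import torch
import torch.nn as nn

class TriModalFusion(nn.Module):
    def __init__(self, feat_dims=(512, 512, 256), hidden=128, classes=10):
        super().__init__()
        self.lstms = nn.ModuleList(
            nn.LSTM(d, hidden, batch_first=True) for d in feat_dims)
        self.fc = nn.Linear(hidden * 3, classes)

    def forward(self, rgb, depth, skel):   # each stream: (B, T, feat_dim)
        outs = []
        for lstm, seq in zip(self.lstms, (rgb, depth, skel)):
            _, (h, _) = lstm(seq)          # final hidden state per stream
            outs.append(h[-1])             # (B, hidden)
        return self.fc(torch.cat(outs, dim=1))  # logits; SoftMax at inference

b, t = 4, 16
logits = TriModalFusion()(torch.randn(b, t, 512), torch.randn(b, t, 512),
                          torch.randn(b, t, 256))
print(logits.shape)  # torch.Size([4, 10])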
Citations: 0
Analysis of image aesthetics assessment as a positive-unlabelled problem
IF 2.7 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-02-01 | Epub Date: 2025-12-01 | DOI: 10.1016/j.image.2025.117441
Luis Gonzalez-Naharro, M. Julia Flores, Jesus Martínez-Gómez, Jose M. Puerta
Image aesthetics assessment (IAA) has traditionally been addressed as a supervised learning problem, where the goal is to accurately predict information related to user opinions, such as the mean opinion score, image ratings, or a binary quality label, usually crafted by thresholding the mean score to label images as highly or lowly aesthetic.
Supervised approaches fail to account for the subjectivity of this problem: the idea of aesthetic pleasantness varies across people and cultures, making the labels extremely noisy. However, the existence of worldwide photographic contests, exhibitions, and masters implies that, to a reasonable degree, there is broad consensus about the quality of very high-quality images and photographs. Furthermore, labelling image data for IAA is a difficult process, as a large number of non-trivial aesthetic judgements are required to obtain a large-scale IAA dataset.
Therefore, in this work we analyse the potential of positive-unlabelled (PU) techniques for solving IAA. We propose techniques for building PU datasets from traditional IAA datasets and from available reference datasets of high-quality images, and test several well-known PU algorithms on them. Our results highlight the potential of PU approaches for IAA, as we obtain results close to the state of the art with much smaller sets of labelled data: in experiments with only 5% of AVA labelled, we reach accuracy levels only 0.03 points below NIMA, and we reach competitive balanced-accuracy levels in settings with a very limited amount of labelled data and very simple models.
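The dataset-construction idea, keeping only confidently high-scored images as labelled positives and leaving everything else unlabelled, is easy to sketch. The threshold, sizes, and synthetic scores below are illustrative assumptions, not the paper's exact protocol:

import numpy as np

rng = np.random.default_rng(0)
scores = rng.uniform(1, 10, size=1000)     # stand-in for AVA-style mean scores
positive = np.flatnonzero(scores >= 7.5)   # confidently high-quality images
unlabelled = np.flatnonzero(scores < 7.5)  # mixture of positives and negatives
# Keep only a small labelled fraction, e.g. the 5%-of-AVA regime cited above.
n_lab = min(int(0.05 * len(scores)), len(positive))
labelled = rng.choice(positive, size=n_lab, replace=False)
print(len(labelled), "labelled positives,", len(unlabelled), "unlabelled")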
Citations: 0
A survey on video emotion recognition: Segmentation, classification, and explainable AI techniques
IF 2.7 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-02-01 | Epub Date: 2025-11-25 | DOI: 10.1016/j.image.2025.117442
Sudhakar Hallur, Anil Gavade, Priyanka Gavade
Emotion recognition from videos has become a pivotal domain in computer vision and affective computing, contributing to advancements in human–computer interaction, healthcare, security, and multimedia analysis. This survey systematically reviews 137 research papers spanning segmentation, classification, and explainable artificial intelligence (XAI) techniques for video-based emotion recognition. The study categorizes works into probabilistic, clustering, deep learning, affective computing, fuzzy logic, genetic algorithm, hybrid, multimodal, and XAI-based approaches. Through a structured evaluation of datasets such as FER2013, CK+, RAVDESS, AffectNet, and EMOTIC, the review highlights how convolutional, recurrent, and transformer architectures, combined with multimodal fusion and attention mechanisms, have pushed emotion detection accuracy to its highest levels in certain contexts. It also identifies key challenges, including dataset bias, multimodal synchronization, interpretability, and computational complexity. The paper emphasizes the rising importance of XAI in bridging the gap between model transparency and human cognition, proposing that future research focus on explainable, context-aware, and ethically grounded frameworks for robust emotion understanding. By consolidating diverse research trajectories, this survey offers a unified perspective on current advancements, limitations, and future directions in video emotion analysis.
Citations: 0
Rank-based transformation algorithm for image contrast adjustment
IF 2.7 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-02-01 | Epub Date: 2025-11-19 | DOI: 10.1016/j.image.2025.117432
Cheng-Hui Chen, Torbjörn E.M. Nordling
Performing proper image contrast adjustment without information loss is an art, and many adjustment methods are in use. Default settings are often inappropriate for the image in question, leaving contrast adjustment dependent on trial and error. We propose a simple method, rank-based transformation (RBT), for image contrast adjustment that requires no prior knowledge, which makes RBT an ideal first tool to apply to underexposed images. The RBT algorithm normalizes and equalizes all intensity differences of the image over the full intensity range of the image data type, thus assigning equal weight to all gradients. Even the state-of-the-art AI tool Cellpose visually benefits from RBT preprocessing. Our comparison of histogram normalization methods demonstrates the ability of RBT to bring out image features.
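The core operation can be sketched as a rank transform: each pixel is replaced by its intensity rank, and the ranks are stretched over the full 8-bit range, which equalizes all intensity differences. This is a hedged reconstruction from the abstract; tie handling and normalization details of the published RBT may differ:

import numpy as np
from scipy.stats import rankdata

def rank_transform(img):
    # Tied intensities share a rank, so flat regions stay flat.
    ranks = rankdata(img, method="average").reshape(img.shape)
    ranks -= ranks.min()
    return (255 * ranks / max(ranks.max(), 1)).astype(np.uint8)

img = np.random.randint(10, 40, (64, 64), dtype=np.uint8)  # underexposed example
out = rank_transform(img)
print(img.min(), img.max(), "->", out.min(), out.max())  # spans 0..255 after RBT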
Citations: 0
MSTSGM: A multi-scale temporal–spatial guided model for image deblurring
IF 2.7 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2026-02-01 | Epub Date: 2025-11-25 | DOI: 10.1016/j.image.2025.117443
Boyu Pei, Kejun Long, Zhibo Gao, Jian Gu, Shaofei Wang, Xinhu Lu
Image deblurring is a critical task in computer vision, essential for recovering sharp images from blurry ones caused by motion blur or camera shake. Recent advances in deep learning have introduced convolutional neural networks (CNNs) as a powerful alternative, enabling the learning of intricate mappings between blurry and sharp images. However, existing deep learning approaches still struggle to capture low-frequency information effectively and to maintain robustness across diverse blur conditions, while high-frequency details are often inadequately restored owing to their susceptibility to motion blur. This paper presents the Multi-Scale Temporal–Spatial Guided Model (MSTSGM), which integrates multi-scale feature decoupling (MSFD), temporal convolution networks (TCN), and edge attention guided reconstruction (EAGR) to enhance deblurring performance. The MSFD captures a wide range of details by decomposing images into multi-scale representations, the TCN refines these features by modeling temporal dependencies in blur formation, and the EAGR focuses on key edge features, effectively improving image clarity. Evaluated on benchmark datasets including GoPro, HIDE, and RealBlur, MSTSGM demonstrates competitive performance, achieving higher PSNR and SSIM than state-of-the-art methods. Ablation studies validate the contribution of each component, highlighting the synergistic effects of multi-scale processing, temporal feature integration, and edge attention. Furthermore, MSTSGM's application as a preprocessing step for object detection illustrates its practical utility in enhancing the accuracy of downstream computer vision applications. MSTSGM provides a robust solution for advancing image deblurring and related tasks. Source code is available for research purposes at https://github.com/priplex/MSTSGM.
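The EAGR concept of weighting reconstruction toward edge regions can be illustrated with a fixed Sobel-based gate. The actual module presumably learns its attention, so the hand-crafted edge map below is an assumption for illustration only:

import torch
import torch.nn.functional as F

def edge_attention(feat, gray):              # feat: (B, C, H, W), gray: (B, 1, H, W)
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)                  # Sobel y is the transpose of Sobel x
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    edges = torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)
    attn = edges / (edges.amax(dim=(2, 3), keepdim=True) + 1e-8)
    return feat * (1 + attn)                 # boost features near edges

feat, gray = torch.randn(1, 16, 64, 64), torch.rand(1, 1, 64, 64)
print(edge_attention(feat, gray).shape)  # torch.Size([1, 16, 64, 64])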
Citations: 0