Multi-Frame Adaptive Image Enhancement Algorithm for vehicle-mounted dynamic scenes
Pub Date: 2025-12-26 | DOI: 10.1016/j.image.2025.117458
Jing Li, Tao Chen, Xiangyu Han, Xilin Luan, Jintao Li
To address the issue of image blurring caused by high-speed vehicle motion and complex road conditions in autonomous driving scenarios, this paper proposes a lightweight Multi-frame Adaptive Image Enhancement Network (MAIE-Net). The network innovatively introduces a hybrid motion compensation mechanism that integrates optical flow alignment and deformable convolution, effectively solving the non-rigid motion alignment problem in complex dynamic scenes. Additionally, a temporal feature enhancement module is constructed, leveraging 3D convolution and attention mechanisms to achieve adaptive fusion of multi-frame information. In terms of architecture design, an edge-guided U-Net structure is employed for multi-scale feature extraction and reconstruction. The framework incorporates edge feature extraction and attention mechanisms within the encoder–decoder to balance feature representation and computational efficiency. The overall lightweight design enables the model to adapt to in-vehicle computing platforms. Experimental results demonstrate that the proposed method significantly improves image quality while maintaining efficient real-time processing capabilities, effectively enhancing the environmental perception performance of in-vehicle vision systems across various driving scenarios, thereby providing a reliable visual enhancement solution for autonomous driving.
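The paper's code is not reproduced here; as a rough illustration of how multi-frame fusion with 3D convolution and channel attention can be wired together, the following minimal PyTorch sketch may help. The module name, channel count, frame count, and reduction ratio are assumptions for illustration, not MAIE-Net's actual design.

```python
import torch
import torch.nn as nn

class TemporalFusionSketch(nn.Module):
    """Illustrative multi-frame fusion: a 3D convolution over the time axis followed
    by a squeeze-and-excitation style channel attention. Not the MAIE-Net module."""
    def __init__(self, channels=32, reduction=4):
        super().__init__()
        # 3D convolution mixes information across neighbouring frames.
        self.temporal_conv = nn.Conv3d(channels, channels, kernel_size=(3, 3, 3), padding=1)
        # Collapse space and time, then gate channels with learned attention weights.
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.attn = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, feats):                     # feats: (B, C, T, H, W) per-frame features
        x = torch.relu(self.temporal_conv(feats))
        w = self.attn(self.pool(x).flatten(1))    # (B, C) channel weights
        x = x * w.view(w.size(0), -1, 1, 1, 1)
        return x.mean(dim=2)                      # fuse frames into one (B, C, H, W) map


frames = torch.randn(2, 32, 5, 64, 64)            # toy batch of 5-frame feature stacks
fused = TemporalFusionSketch()(frames)
print(fused.shape)                                # torch.Size([2, 32, 64, 64])
```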
{"title":"Multi-Frame Adaptive Image Enhancement Algorithm for vehicle-mounted dynamic scenes","authors":"Jing Li , Tao Chen , Xiangyu Han , Xilin Luan , Jintao Li","doi":"10.1016/j.image.2025.117458","DOIUrl":"10.1016/j.image.2025.117458","url":null,"abstract":"<div><div>To address the issue of image blurring caused by high-speed vehicle motion and complex road conditions in autonomous driving scenarios, this paper proposes a lightweight Multi-frame Adaptive Image Enhancement Network (MAIE-Net). The network innovatively introduces a hybrid motion compensation mechanism that integrates optical flow alignment and deformable convolution, effectively solving the non-rigid motion alignment problem in complex dynamic scenes. Additionally, a temporal feature enhancement module is constructed, leveraging 3D convolution and attention mechanisms to achieve adaptive fusion of multi-frame information. In terms of architecture design, an edge-guided U-Net structure is employed for multi-scale feature extraction and reconstruction. The framework incorporates edge feature extraction and attention mechanisms within the encoder–decoder to balance feature representation and computational efficiency. The overall lightweight design enables the model to adapt to in-vehicle computing platforms. Experimental results demonstrate that the proposed method significantly improves image quality while maintaining efficient real-time processing capabilities, effectively enhancing the environmental perception performance of in-vehicle vision systems across various driving scenarios, thereby providing a reliable visual enhancement solution for autonomous driving.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"142 ","pages":"Article 117458"},"PeriodicalIF":2.7,"publicationDate":"2025-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145977991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Low-light image enhancement via boundary constraints and non-local similarity
Pub Date: 2025-12-25 | DOI: 10.1016/j.image.2025.117459
Wan Li, Hengji Xie, Bin Yao, Xiaolin Zhang, Rongrong Fei
Low light often leads to poor image visibility, which can easily degrade the performance of computer vision algorithms. Traditional enhancement methods focus excessively on illumination map restoration while neglecting the non-local similarity in natural images. In this paper, we propose an effective low-light image enhancement method based on boundary constraints and non-local similarity. First, a fast and effective boundary-constraint method is proposed to estimate illumination maps. Then, a combined optimization model with low-rank and context constraints is presented to improve the enhancement results: the low-rank constraint captures the non-local similarity of the reflectance image, while the context constraint improves the accuracy of the illumination map. Finally, alternating iterative optimization is employed to solve the non-independent constraints between the illumination and reflectance maps. Experimental results demonstrate that the proposed algorithm enhances images efficiently in terms of both objective and subjective quality.
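For readers unfamiliar with boundary constraints in the Retinex setting, the key fact is that with reflectance bounded in [0, 1], the illumination at each pixel can be no smaller than the maximum colour channel. The NumPy sketch below builds an initial illumination estimate from that bound; the Gaussian smoothing and gamma relighting are illustrative stand-ins, not the paper's low-rank/context optimization.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def initial_illumination(img, sigma=3.0, eps=1e-3):
    """img: float RGB image in [0, 1], shape (H, W, 3).
    Since I = R * L with R in [0, 1], L(x) >= max_c I_c(x): the per-pixel
    maximum over channels is a valid lower bound on the illumination."""
    boundary = img.max(axis=2)                        # max-RGB lower bound
    smooth = gaussian_filter(boundary, sigma=sigma)   # crude smoothing (stand-in for the paper's optimization)
    return np.maximum(smooth, boundary).clip(eps, 1.0)  # re-impose the boundary constraint

def enhance(img, gamma=0.6):
    """Simple Retinex-style relighting using the estimated illumination."""
    L = initial_illumination(img)
    R = img / L[..., None]                            # reflectance estimate, bounded by construction
    return np.clip(R * (L ** gamma)[..., None], 0.0, 1.0)
```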
{"title":"Low-light image enhancement via boundary constraints and non-local similarity","authors":"Wan Li , Hengji Xie , Bin Yao , Xiaolin Zhang , Rongrong Fei","doi":"10.1016/j.image.2025.117459","DOIUrl":"10.1016/j.image.2025.117459","url":null,"abstract":"<div><div>Low light often leads to poor image visibility, which can easily affect the performance of computer vision algorithms. The traditional enhancement methods focus excessively on illumination map restoration while neglecting the non-local similarity in natural images. In this paper, we propose an effective low-light image enhancement method based on boundary constraints and non-local similarity. First, a fast and effective boundary constraints method is proposed to estimate illumination maps. Then, a combined optimization model with low-rank and context constraints was presented to improve the enhancing results. Among them, low-rank constraints are used to capture the non-local similarity of the reflectance image, and context constraints are used to improve the accuracy of the illumination map. Finally, alternating iterative optimization is employed for solving non-independent constraints between the illumination and reflectance maps. Experimental results demonstrate that the proposed algorithm enhances images efficiently in terms of both objective quality and subjective quality.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"142 ","pages":"Article 117459"},"PeriodicalIF":2.7,"publicationDate":"2025-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145885406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NFlowAD: A normalizing flow model for anomaly detection in human motion animations
Pub Date: 2025-12-24 | DOI: 10.1016/j.image.2025.117469
Mahamat Issa Choueb, Praveen Kumar Sekharamantry, Giulia Martinelli, Francesco De Natale, Nicola Conci
Anomaly detection has been extensively investigated in numerous application areas. Hand-crafted rules have gradually given way to supervised classification techniques, which frequently rely on a small number of anomaly labels and related architectures. When it comes to human motion, abnormalities emerge at a fine-grained temporal or joint level rather than over a whole video sequence.
This study introduces NFlowAD, a self-supervised system that analyzes body joints to detect irregularities in human motion. It blends normalizing flows with masked motion modeling to describe normal motion data without the need for anomaly labels. At inference, both reconstruction errors and flow-based likelihoods are used to detect anomalies. Validation on several state-of-the-art datasets demonstrates NFlowAD's effectiveness in recognizing, localizing, and analyzing anomalous motion sequences, while maintaining robust detection and interpretability.
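The abstract does not give the flow architecture; as a generic illustration of how an affine-coupling normalizing flow can score pose vectors by likelihood, and how that score might be combined with a reconstruction error, a minimal sketch follows. The layer sizes, pose dimension, and weighting alpha are assumptions, not NFlowAD's design.

```python
import math
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One RealNVP-style coupling layer for flattened pose vectors of dimension dim.
    In a real stack, successive layers swap which half is transformed."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=1)
        s = torch.tanh(s)                          # keep scales bounded for stability
        y2 = x2 * torch.exp(s) + t
        log_det = s.sum(dim=1)                     # log|det J| of the transform
        return torch.cat([x1, y2], dim=1), log_det

def anomaly_score(x, flow_layers, recon, alpha=0.5):
    """Higher score = more anomalous. Combines the negative log-likelihood under the
    flow with a reconstruction error; `recon` is assumed to come from a separate
    masked-motion reconstruction branch."""
    z, log_det = x, x.new_zeros(x.size(0))
    for layer in flow_layers:
        z, ld = layer(z)
        log_det = log_det + ld
    log_prob = -0.5 * (z ** 2).sum(dim=1) - 0.5 * z.size(1) * math.log(2 * math.pi) + log_det
    recon_err = ((x - recon) ** 2).mean(dim=1)
    return alpha * (-log_prob) + (1 - alpha) * recon_err

# Toy usage: 17 joints x 2 coordinates = 34-dim pose vectors.
dim = 34
flow = [AffineCoupling(dim) for _ in range(4)]
x = torch.randn(8, dim)
recon = x + 0.1 * torch.randn_like(x)              # stand-in for the reconstruction branch
print(anomaly_score(x, flow, recon).shape)         # torch.Size([8])
```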
{"title":"NFlowAD: A normalizing flow model for anomaly detection in human motion animations","authors":"Mahamat Issa Choueb , Praveen Kumar Sekharamantry , Giulia Martinelli , Francesco De Natale , Nicola Conci","doi":"10.1016/j.image.2025.117469","DOIUrl":"10.1016/j.image.2025.117469","url":null,"abstract":"<div><div>Anomaly detection has been extensively investigated in numerous application areas. Hand-crafted rules have gradually given way to supervised classification techniques, which frequently rely on a small number of anomaly labels and related architectures. When it comes to human motion, abnormalities emerge at a fine-grained temporal or joint level rather than over a whole video sequence.</div><div>This study introduces NFlowAD, a self-supervised system that analyzes body joints to detect irregularities in human motion. It blends normalizing flows with masked motion modeling to describe normal motion data without the need for anomaly labels. Inference uses both reconstruction mistakes and flow-based likelihoods to detect anomalies. The validation pipeline on various state-of-the-art datasets demonstrates NFlowAD’s efficiency in recognizing, locating, and analyzing anomalous motion sequences, while maintaining robust detection and interpretability.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"142 ","pages":"Article 117469"},"PeriodicalIF":2.7,"publicationDate":"2025-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145842321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SACIFuse: Adaptive enhancement of salient features and cross-modal attention interaction for infrared and visible image fusion
Pub Date: 2025-12-24 | DOI: 10.1016/j.image.2025.117467
Hao Zhai, Anyu Li, Yan Wei, Huashan Tan, Yiyang Ru
The goal of infrared and visible image fusion is to create images that highlight infrared thermal targets while preserving texture information under challenging lighting conditions. However, in extreme environments such as heavy fog or overexposure, visible images often contain redundant information that degrades fusion results. To better emphasize salient targets in infrared images and reduce interference from redundant information, this paper proposes an adaptive salient enhancement fusion method for infrared and visible images, called SACIFuse. First, we design a Salient Feature Prediction Enhancement Module (SFEM), which extracts image gradients with edge operators and generates a mask quantifying the probability of redundant information. This mask adaptively weights the source image, suppressing redundant visible-light information while enhancing infrared targets. In addition, we introduce a Salient Feature Interaction Attention Module (SFIM), which combines residual attention with spatial and channel attention to guide the interaction between the enhanced salient features and the source image features, ensuring that the fusion results highlight infrared targets while preserving visible-light texture. Finally, the proposed loss function constructs a binary mask of the fused image to impose constraints on salient targets, effectively preventing redundant information from degrading key regions. Extensive testing on public datasets shows that SACIFuse outperforms existing state-of-the-art methods in both qualitative and quantitative evaluations. Moreover, generalization experiments on other datasets demonstrate that the proposed model generalizes well.
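As a rough sketch of the general idea of weighting source images by an edge/gradient-derived mask before fusion, the snippet below uses a Sobel operator and a soft relative-gradient mask. The operator choice, normalization, and weighting rule are illustrative assumptions, not the SFEM's actual formulation.

```python
import torch
import torch.nn.functional as F

def sobel_gradient(img):
    """img: (B, 1, H, W) single-channel tensor. Returns per-pixel gradient magnitude."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(img, kx.to(img), padding=1)
    gy = F.conv2d(img, ky.to(img), padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def saliency_weighted_fusion(ir, vis):
    """Weight visible pixels down where their gradient response is weak relative to
    the infrared image, so redundant (e.g. fog-flattened) regions contribute less."""
    g_ir, g_vis = sobel_gradient(ir), sobel_gradient(vis)
    w_vis = g_vis / (g_ir + g_vis + 1e-8)        # soft mask in [0, 1]
    return w_vis * vis + (1 - w_vis) * ir
```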
{"title":"SACIFuse: Adaptive enhancement of salient features and cross-modal attention interaction for infrared and visible image fusion","authors":"Hao Zhai, Anyu Li, Yan Wei, Huashan Tan, Yiyang Ru","doi":"10.1016/j.image.2025.117467","DOIUrl":"10.1016/j.image.2025.117467","url":null,"abstract":"<div><div>The goal of infrared and visible light image fusion is to create images that highlight infrared thermal targets while preserving texture information under challenging lighting conditions. However, in extreme environments like heavy fog or overexposure, visible light images often contain redundant information, negatively affecting fusion results. To better emphasize salient targets in infrared images and reduce interference from redundant information, this paper proposes an adaptive salient enhancement fusion method for infrared and visible light images, called SACIFuse. First, we designed a Salient Feature Prediction Enhancement Module (SFEM), which extracts image gradients through edge operators and generates a mask quantifying the probability of redundant information. This mask is used to adaptively weight the source image, thereby suppressing redundant visible light information while enhancing infrared targets. Additionally, we introduced a Salient Feature Interaction Attention Module (SFIM), capable of employing residual attention combined with spatial and channel attention mechanisms to guide the interaction between the enhanced salient features and the source image features, ensuring that the fusion results highlight infrared targets while preserving visible light texture. Finally, our proposed loss function constructs a binary mask of the fused image to impose constraints on salient targets, effectively preventing adverse effects of redundant information on key regions. Extensive testing on public datasets shows that SACIfuse outperforms existing state-of-the-art methods in both qualitative and quantitative evaluations. Moreover, generalization experiments conducted on other datasets demonstrate that the proposed model exhibits strong generalization capabilities.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"142 ","pages":"Article 117467"},"PeriodicalIF":2.7,"publicationDate":"2025-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145927549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhanced ISAR imaging of UAVs: Noise reduction via weighted atomic norm minimization and 2D-ADMM
Pub Date: 2025-12-24 | DOI: 10.1016/j.image.2025.117468
Mohammad Roueinfar, Mohammad Hossein Kahaei
The effect of noise on Inverse Synthetic Aperture Radar (ISAR) with sparse apertures makes high-resolution image reconstruction challenging at low Signal-to-Noise Ratios (SNRs). It is well known that image resolution is governed by the bandwidth of the transmitted signal and the Coherent Processing Interval (CPI) in the range and azimuth dimensions, respectively. To reduce the noise effect and thus increase the two-dimensional resolution of Unmanned Aerial Vehicle (UAV) images, we propose the Fast Reweighted Atomic Norm Denoising (FRAND) algorithm, which incorporates weighted atomic norm minimization. To solve the resulting problem, a Two-Dimensional Alternating Direction Method of Multipliers (2D-ADMM) algorithm is developed to speed up the implementation. Assuming sparse apertures for ISAR images of UAVs, we compare the proposed method with the MUltiple SIgnal Classification (MUSIC), Cadzow, and SL0 methods at different SNRs. Simulation results show the superiority of FRAND at low SNRs in terms of the Mean-Square Error (MSE), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity Index Measure (SSIM) criteria.
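For context, the scaled-form ADMM iterations that alternating schemes of this kind specialize are written below for the generic problem of minimizing f(x) + g(z) subject to Ax + Bz = c. This is the standard textbook form; the paper's 2D variant and its weighted-atomic-norm subproblems are not reproduced here.

```latex
\begin{aligned}
x^{k+1} &= \arg\min_{x}\; f(x) + \tfrac{\rho}{2}\,\bigl\lVert Ax + Bz^{k} - c + u^{k}\bigr\rVert_2^2,\\
z^{k+1} &= \arg\min_{z}\; g(z) + \tfrac{\rho}{2}\,\bigl\lVert Ax^{k+1} + Bz - c + u^{k}\bigr\rVert_2^2,\\
u^{k+1} &= u^{k} + Ax^{k+1} + Bz^{k+1} - c.
\end{aligned}
```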
{"title":"Enhanced ISAR imaging of UAVs: Noise reduction via weighted atomic norm minimization and 2D-ADMM","authors":"Mohammad Roueinfar, Mohammad Hossein Kahaei","doi":"10.1016/j.image.2025.117468","DOIUrl":"10.1016/j.image.2025.117468","url":null,"abstract":"<div><div>The effect of noise on the Inverse Synthetic Aperture Radar (ISAR) with sparse apertures is challenging for image reconstruction with high resolution at low Signal-to-Noise Ratios (SNRs). It is well-known that the image resolution is affected by the bandwidth of the transmitted signal and the Coherent Processing Interval (CPI) in two dimensions, range and azimuth, respectively. To reduce the noise effect and thus increase the two-dimensional resolution of Unmanned Aerial Vehicles (UAVs) images, we propose the Fast Reweighted Atomic Norm Denoising (FRAND) algorithm by incorporating the weighted atomic norm minimization. To solve the problem, the Two-Dimensional Alternating Direction Method of Multipliers (2D-ADMM) algorithm is developed to speed up the implementation procedure. Assuming sparse apertures for ISAR images of UAVs, we compare the proposed method with the MUltiple SIgnal Classification (MUSIC), Cadzow, and <span><math><msub><mrow><mi>SL</mi></mrow><mrow><mn>0</mn></mrow></msub></math></span> methods in different SNRs. Simulation results show the superiority of FRAND at low SNRs based on the Mean-Square Error (MSE), Peak Signal-to-Noise ratio (PSNR), and Structural Similarity Index Measure (SSIM) criteria.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"142 ","pages":"Article 117468"},"PeriodicalIF":2.7,"publicationDate":"2025-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146023545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Video object segmentation based on feature compression and attention correction
Pub Date: 2025-12-24 | DOI: 10.1016/j.image.2025.117456
Zhiqiang Hou, Jiale Dong, Chenxu Wang, Sugang Ma, Wangsheng Yu, Yuncheng Wang
Video object segmentation algorithms based on memory networks store information about the target object in an external memory bank. As segmentation progresses, the size of the memory bank keeps growing, which leads to redundant feature information and reduces the efficiency of the algorithm. In addition, the key-value pairs stored in the memory bank undergo channel dimension reduction with standard convolution, which limits the representation ability of the target object features. To address these issues, this paper proposes a video object segmentation algorithm based on feature compression and attention correction, constructing a reliable and effective memory bank that ensures efficient storage and updating of target object information, thereby reducing computational complexity and storage consumption. A dual attention mechanism over the spatial and channel dimensions is proposed to correct feature information and enhance feature representation. Extensive experiments show that the proposed algorithm is reliably competitive with other mainstream algorithms from recent years.
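The abstract does not detail the correction module; a compact CBAM-style channel-plus-spatial attention block is sketched below as one plausible shape for such a dual attention mechanism. The layer sizes and reduction ratio are assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn

class DualAttentionSketch(nn.Module):
    """Channel attention (global pooling + shared MLP) followed by spatial attention
    (pooled maps + 7x7 conv). Illustrative only."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                          # x: (B, C, H, W)
        b, c, _, _ = x.shape
        avg = x.mean(dim=(2, 3))                   # (B, C) first-order descriptor
        mx = x.amax(dim=(2, 3))                    # (B, C) max descriptor
        ca = torch.sigmoid(self.channel_mlp(avg) + self.channel_mlp(mx)).view(b, c, 1, 1)
        x = x * ca                                 # channel-corrected features
        sa_in = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        sa = torch.sigmoid(self.spatial_conv(sa_in))
        return x * sa                              # spatially corrected features


print(DualAttentionSketch(64)(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```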
{"title":"Video object segmentation based on feature compression and attention correction","authors":"Zhiqiang Hou, Jiale Dong, Chenxu Wang, Sugang Ma, Wangsheng Yu, Yuncheng Wang","doi":"10.1016/j.image.2025.117456","DOIUrl":"10.1016/j.image.2025.117456","url":null,"abstract":"<div><div>The video object segmentation algorithm based on memory networks stores the information of the target object through the maintained external memory inventory. As the segmentation progresses, the size of the memory inventory will continue to increase, leading to redundancy of feature information and affecting the execution efficiency of the algorithm. In addition, the key value pairs stored in the memory library are subjected to channel dimension reduction using standard convolution, resulting in insufficient representation ability of target object features. In response to the above issues, this chapter proposes a video object segmentation algorithm based on feature compression and attention correction, constructing a reliable and effective memory library to ensure efficient storage and updating of target object information, thereby reducing computational complexity and storage consumption. A dual attention mechanism based on spatial and channel dimensions was proposed to correct feature information and enhance the representation ability of features. A large number of experiments have shown that the proposed algorithm demonstrates reliable competitiveness compared to other mainstream algorithms in recent years.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"142 ","pages":"Article 117456"},"PeriodicalIF":2.7,"publicationDate":"2025-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145824218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Single object tracking based on Spatio-Temporal information
Pub Date: 2025-12-24 | DOI: 10.1016/j.image.2025.117463
Lixin Wei, Yun Luo, Rongzhe Zhu, Xin Li
To address the difficulties caused by the absence of temporal dynamic information and by background clutter from similar backgrounds, similar objects, target occlusion, and illumination changes during target tracking, this paper proposes a single object tracking algorithm based on spatio-temporal information (SST). The algorithm integrates a Temporal Adaptive Module (TAM) into the backbone network to generate a temporal kernel from the feature maps. This endows the network with the ability to model temporal dynamics, effectively exploiting the temporal relationships between frames to handle complex variations such as changes in target motion states and environmental conditions. Additionally, to mitigate background clutter interference, the algorithm employs a Mixed Local Channel Attention (MLCA) mechanism, which captures channel and spatial information to focus the network on the target and reduce the impact of distractors. The proposed algorithm was evaluated on the OTB100, LaSOT, and NFS datasets. It achieved an AUC score of 70.7% on OTB100, a 1.3% improvement over the baseline tracker. On LaSOT and NFS it obtained AUC scores of 65.1% and 65.9%, respectively, each a 0.2% improvement over the baseline tracker. The tracking speed exceeds 80 fps, and the performance of the SST algorithm has also been verified on self-recorded videos. The code is available at https://github.com/xuexiaodemenggubao/sst.
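As a rough sketch of one way per-frame temporal weights can be generated from pooled features and applied back to a frame sequence, a simplified gating module is shown below. This simplification is an assumption for illustration; it is not the TAM or MLCA implementation used in SST.

```python
import torch
import torch.nn as nn

class TemporalGateSketch(nn.Module):
    """Pools each frame's features spatially, runs a depthwise 1D convolution along
    the time axis, and reweights frames with the resulting temporal attention."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.temporal_conv = nn.Conv1d(channels, channels, kernel_size,
                                       padding=kernel_size // 2, groups=channels)

    def forward(self, x):                       # x: (B, C, T, H, W)
        desc = x.mean(dim=(3, 4))               # (B, C, T) spatially pooled descriptor
        gate = torch.sigmoid(self.temporal_conv(desc))   # per-frame, per-channel weights
        return x * gate.unsqueeze(-1).unsqueeze(-1)      # broadcast back over H, W


print(TemporalGateSketch(32)(torch.randn(2, 32, 8, 28, 28)).shape)  # torch.Size([2, 32, 8, 28, 28])
```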
{"title":"Single object tracking based on Spatio-Temporal information","authors":"Lixin Wei , Yun Luo , Rongzhe Zhu , Xin Li","doi":"10.1016/j.image.2025.117463","DOIUrl":"10.1016/j.image.2025.117463","url":null,"abstract":"<div><div>To address the challenge of tracking difficulties due to the absence of temporal dynamic information and background clutter interference caused by similar backgrounds, similar objects, target occlusion, and illumination changes during target tracking, this paper proposes a single object tracking algorithm based on spatio-temporal information (SST). The algorithm integrates a Temporal Adaptive Module (TAM) into the backbone network to generate a temporal kernel based on feature maps. This endows the network with the capability to model temporal dynamics, effectively utilizing the temporal relationships between frames to handle complex temporal dynamics such as changes in target motion states and environmental conditions. Additionally, to mitigate background clutter interference, the algorithm employs a Mixed Local Channel Attention (MLCA) mechanism, which captures channel and spatial information to focus the network on the target and reduce the impact of interfering information. The proposed algorithm was evaluated on OTB100, LaSOT, and NFS datasets. It achieved an AUC score of 70.7% on OTB, which represents a 1.3% improvement over the baseline tracker. On LaSOT and NFS datasets, it obtained AUC scores of 65.1% and 65.9%, respectively, showing improvements of 0.2% compared to the baseline tracker. The tracking speed exceeds 80fps, and the performance of the SST algorithm has been verified on self-made videos. The code is available at <span><span>https://github.com/xuexiaodemenggubao/sst</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"142 ","pages":"Article 117463"},"PeriodicalIF":2.7,"publicationDate":"2025-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145824220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CS-YOLO: A small object detection model based on YOLO for UAV aerial photography
Pub Date: 2025-12-24 | DOI: 10.1016/j.image.2025.117460
Rui Fan, Renhao Jiao, Weigui Nan, Haitao Meng, Abin Jiang, Xiaojia Yang, Zhiqiang Zhao, Jin Dang, Zhixue Wang, Yanshan Tian, Baiying Dong, Xiaowei He, Xiaoli Luo
With the rapid development of the UAV industry, object detection based on UAV aerial images is being applied ever more widely. However, targets in UAV aerial images are small, dense, and disturbed by complex environments, which makes object detection challenging. To address the problems of dense small targets and strong background interference in UAV aerial images, we propose a YOLO-based UAV aerial image detection model, Content-Conscious and Scale-Sensitive YOLO (CS-YOLO). Unlike existing YOLO-based approaches, our contribution lies in the joint design of a Bottleneck Attention Module cross-stage partial block (BAM-CSP), a Multi-Scale Pooling Attention Fusion Module (MPAFM), and a Feature Difference Fusion Module (FDFM). The BAM-CSP module significantly enhances the small-target feature response by integrating a channel attention mechanism at the bottleneck layer of the cross-stage partial network; the MPAFM module adopts a multi-scale pooling attention fusion architecture that suppresses complex background interference through parallel pooling and strengthens background perception around small targets; and the FDFM module captures the information changes that occur during sampling through a feature difference fusion mechanism. The Gradient Adaptive-Efficient IoU (GA-EIoU) loss function is introduced to optimize bounding box regression by incorporating an EIoU gradient-constraint weighting mechanism. In comparative experiments on the VisDrone2019 dataset, CS-YOLO achieves 22.6% mAP@50:95, 2.7% higher than YOLO11n; on the HazyDet dataset, it achieves 53.8% mAP@50:95, an increase of 2.8%. CS-YOLO also surpasses existing advanced methods in recall and robustness. Ablation experiments further verify the contribution of each module to detection performance. The model effectively addresses dense small targets and strong environmental interference in UAV aerial images, providing a high-precision, real-time, and reliable detection scheme for complex tasks such as UAV inspection. The source code will be available at https://github.com/unscfr/CS-YOLO.
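The gradient-adaptive weighting itself is not specified in the abstract; for reference, a plain EIoU loss (IoU term plus center-distance, width, and height penalties, each normalized by the enclosing box) can be written as below. The GA weighting is omitted, and the (x1, y1, x2, y2) box format is an assumption.

```python
import torch

def eiou_loss(pred, target, eps=1e-7):
    """pred, target: (N, 4) boxes in (x1, y1, x2, y2) format. Returns per-box EIoU loss."""
    # Intersection and union
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Smallest enclosing box
    enc_wh = (torch.max(pred[:, 2:], target[:, 2:]) - torch.min(pred[:, :2], target[:, :2])).clamp(min=0)
    cw, ch = enc_wh[:, 0], enc_wh[:, 1]

    # Center-distance, width, and height penalties
    cp = (pred[:, :2] + pred[:, 2:]) / 2
    ct = (target[:, :2] + target[:, 2:]) / 2
    rho2 = ((cp - ct) ** 2).sum(dim=1)
    dw2 = ((pred[:, 2] - pred[:, 0]) - (target[:, 2] - target[:, 0])) ** 2
    dh2 = ((pred[:, 3] - pred[:, 1]) - (target[:, 3] - target[:, 1])) ** 2

    return (1 - iou
            + rho2 / (cw ** 2 + ch ** 2 + eps)
            + dw2 / (cw ** 2 + eps)
            + dh2 / (ch ** 2 + eps))
```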
{"title":"CS-YOLO:A small object detection model based on YOLO for UAV aerial photography","authors":"Rui Fan , Renhao Jiao , Weigui Nan , Haitao Meng , Abin Jiang , Xiaojia Yang , Zhiqiang Zhao , Jin Dang , Zhixue Wang , Yanshan Tian , Baiying Dong , Xiaowei He , Xiaoli Luo","doi":"10.1016/j.image.2025.117460","DOIUrl":"10.1016/j.image.2025.117460","url":null,"abstract":"<div><div>With the rapid development of the UAV industry, the application of object detection technology based on UAV aerial images is becoming more and more extensive. However, the target of UAV aerial image is small, dense and disturbed by complex environment, which makes object detection face great challenges. In order to solve the problems of dense small targets and strong background interference in UAV aerial images, we propose a YOLO-based UAV aerial image detection model-Content-Conscious and Scale-Sensitive (CS-YOLO). Unlike existing YOLO-based approaches, our contribution lies in the joint design of Bottleneck Attention Module-cross-stage partial (BAM-CSP), Multi-Scale Pooling Attention Fusion Module (MPAFM) and Feature Difference Fusion Module (FDFM). The BAM-CSP module significantly enhances the small target feature response by integrating the channel attention mechanism at the bottleneck layer of the cross-stage partial network; the MPAFM module adopts a multi-scale pooling attention fusion architecture, which suppresses complex background interference through parallel pooling and enhances the background perception ability of small targets. The FDFM module captures the information changes during the sampling process through the feature difference fusion mechanism. The Gradient Adaptive-Efficient IoU (GA-EIoU) loss function is introduced to optimize bounding box regression performance by incorporating the EIoU gradient constraint weighting mechanism. Comparative experiments on the VisDrone2019 dataset, CS-YOLO achieves 22.6% mAP@50:95, which is 2.7% higher than YOLO11n; on the HazyDet dataset, CS-YOLO achieved 53.8% mAP@50:95, an increase of 2.8%. CS-YOLO also comprehensively surpasses the existing advanced methods in terms of recall rate and robustness. Meanwhile, we conducted ablation experiments to verify the gain effect of each module on the detection performance. The model effectively solves the technical problems such as dense small targets and strong environmental interference in UAV aerial images, and provides a high-precision, real-time and reliable detection scheme for complex tasks such as UAV inspection. The source code will be available at <span><span>https://github.com/unscfr/CS-YOLO</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"142 ","pages":"Article 117460"},"PeriodicalIF":2.7,"publicationDate":"2025-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145885408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-stream interaction network with cross-modal contrast distillation for co-salient object detection
Pub Date: 2025-12-23 | DOI: 10.1016/j.image.2025.117454
Wujie Zhou, Bingying Wang, Xiena Dong, Caie Xu, Fangfang Qiang
Co-salient object detection is a challenging task. Despite advances in existing detectors, two problems remain unsolved. First, although depth maps complement spatial information, existing methods do not fuse multimodal information effectively, and multiscale features are not aggregated appropriately to predict co-salient maps. Second, existing deep-learning methods usually require large numbers of parameters; model sizes must therefore be reduced while preserving accuracy so that they can run on lightweight end devices. We propose a multi-stream interaction cooperative encoder that constructs early fusion branches to improve modal interaction, and a two-stage transformer decoder that promotes multiscale feature fusion. Finally, a multi-stream interaction network with cross-modal contrast knowledge distillation is proposed to connect the student and teacher models, improving the performance of the student model while keeping computing requirements low and achieving collaborative co-salient detection. Our solution is based on a teacher–student architecture that uses contrastive learning to transfer knowledge between deep networks while enhancing semantic consistency and suppressing noise. We employ cross-modal contrast distillation and attention modules in the encoding and decoding phases, respectively, to enhance channel and spatial response consistency. In addition, a collaborative contrast-learning module is used to convey structural knowledge, helping the student obtain more accurate group semantic information. Experiments on benchmark datasets show the superior performance of the proposed network for co-salient object detection.
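The abstract describes contrastive knowledge transfer only at a high level; a common way to realize it is an InfoNCE loss between normalized student and teacher embeddings, where the matched sample is the positive and the rest of the batch acts as negatives. The sketch below follows that generic recipe; the temperature and feature shapes are assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def contrastive_distillation_loss(student_feat, teacher_feat, temperature=0.1):
    """student_feat, teacher_feat: (B, D) embeddings of the same B images.
    Each student embedding is pulled toward its own teacher embedding (positive)
    and pushed away from the other teacher embeddings in the batch (negatives)."""
    s = F.normalize(student_feat, dim=1)
    t = F.normalize(teacher_feat, dim=1)
    logits = s @ t.t() / temperature             # (B, B) cosine similarities
    labels = torch.arange(s.size(0), device=s.device)
    return F.cross_entropy(logits, labels)       # InfoNCE with diagonal positives

# Toy usage: the teacher runs without gradients.
student_feat = torch.randn(8, 256, requires_grad=True)
with torch.no_grad():
    teacher_feat = torch.randn(8, 256)
loss = contrastive_distillation_loss(student_feat, teacher_feat)
loss.backward()
```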
{"title":"Multi-stream interaction network with cross-modal contrast distillation for co-salient object detection","authors":"Wujie Zhou , Bingying Wang , Xiena Dong , Caie Xu , Fangfang Qiang","doi":"10.1016/j.image.2025.117454","DOIUrl":"10.1016/j.image.2025.117454","url":null,"abstract":"<div><div>Co-salient object detection is a challenging task. Despite advances in existing detectors, two problems remain unsolved. First, although depth maps complement spatial information, existing methods do not effectively fuse multimodal information, and multiscale features are not aggregated appropriately to predict co-salient maps. Second, existing deep-learning methods usually require large numbers of parameters; thus, model sizes must be reduced while ensuring accuracy to enable them to run on streamlined end devices. We propose a multi-stream interaction cooperative encoder by constructing early fusion branches to improve modal interactions and a two-stage transformer decoder to promote multiscale feature fusion. Finally, a multi-stream interaction network with cross-modal contrast knowledge distillation is proposed to connect student and teacher models to improve the performance of the student model while sustaining low computing requirements and achieving collaborative co-salient detection. Our solution is based on a teacher–student architecture that uses contrastive learning to transfer knowledge between deep networks while enhancing semantic consistency and suppressing noise. We employ cross-modal contrast distillation and attention modules in the encoding and decoding phases, respectively, to enhance the response channel and spatial consistency. In addition, a collaborative contrast-learning module is employed to better convey structural knowledge to help students obtain more accurate group semantic information. Experiments on benchmark datasets show the superior performance of the proposed multi-stream interaction network with cross-modal contrast knowledge distillation in collaborative saliency target detection.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"142 ","pages":"Article 117454"},"PeriodicalIF":2.7,"publicationDate":"2025-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145842305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HOICNet: Low-Dose CT image denoising network based on higher-order feature attention mechanism and irregular convolution
Pub Date: 2025-12-23 | DOI: 10.1016/j.image.2025.117457
Aimin Huang, Lina Jia, Beibei Jia, Zhiguo Gui, Jianan Liang
Convolutional Neural Networks (CNNs) with attention mechanisms show great potential for improving low-dose computed tomography (LDCT) image quality. However, most of these methods use first-order statistics for channel or spatial processing, ignoring the higher-order statistics of channel or spatial features. In addition, conventional convolution has a limited receptive field and performs poorly at edges in LDCT images. In this study, we develop a CNN model incorporating a higher-order feature attention mechanism that both enlarges the receptive field and clearly recovers edges and details. We propose an LDCT image denoising network, named HOICNet, based on a higher-order feature attention mechanism and irregular convolution. Specifically, we first propose a new higher-order feature attention mechanism that uses higher-order feature statistics to enhance features across channels and spatial regions. Second, we propose an irregular convolutional feature extraction module (ICFE) that contains self-calibrating convolution (SC) and side window convolution (SWC): SC enlarges the receptive field, and SWC improves edge information in the denoised images. Finally, we introduce a contrast regularization mechanism (CRM) with positive and negative samples that pulls the denoised image toward the positive samples and pushes it away from the negative samples, alleviating over-smoothing of the denoised images. Experimental results show that the peak signal-to-noise ratio (PSNR), structural similarity (SSIM), root mean square error (RMSE), and visual information fidelity (VIF) values achieve significant improvements on both the AAPM dataset and the piglet dataset.
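The abstract does not spell out the higher-order attention; one simple instantiation of "use a higher-order statistic instead of the mean" is to gate channels by their spatial variance (a second-order statistic), as sketched below. This is an illustrative assumption, not HOICNet's actual module.

```python
import torch
import torch.nn as nn

class SecondOrderChannelAttention(nn.Module):
    """Channel gate driven by the per-channel spatial variance rather than the
    usual first-order average pooling. Illustrative sketch only."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                         # x: (B, C, H, W)
        var = x.var(dim=(2, 3), unbiased=False)   # (B, C) second-order statistic
        gate = self.mlp(var).view(x.size(0), -1, 1, 1)
        return x * gate


print(SecondOrderChannelAttention(64)(torch.randn(2, 64, 48, 48)).shape)  # torch.Size([2, 64, 48, 48])
```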
{"title":"HOICNet: Low-Dose CT image denoising network based on higher-order feature attention mechanism and irregular convolution","authors":"Aimin Huang , Lina Jia , Beibei Jia , Zhiguo Gui , Jianan Liang","doi":"10.1016/j.image.2025.117457","DOIUrl":"10.1016/j.image.2025.117457","url":null,"abstract":"<div><div>Convolution Neural Networks (CNNs) with attention mechanisms show great potential for improving low-dose computed tomography (LDCT) image quality. However, most of these methods use first-order statistics for channel or space processing, ignoring the higher-order statistics of the channel or space features. In addition, the conventional convolution has limited receptive field and a poor performance on the edge of LDCT images. In this study, we aim to develop a CNN model incorporating higher-order feature attention mechanism that both enlarges the receptive field and clearly recovers edges and details. We propose an LDCT image denoising network named as HOICNet based on a higher-order feature attention mechanism and irregular convolution. Specifically, we first propose a new higher-order feature attention mechanism that utilizes higher-order feature statistics to enhance features in different channels and spatial regions. Second, we propose a new irregular convolutional feature extraction module (ICFE) that contains self-calibrating convolution (SC) and side window convolution (SWC). SC is used to enlarge receptive fields, and SWC is used to improve the edge information in denoised images. Finally, we introduce the contrast regularization mechanism (CRM) with positive and negative samples to bring the denoised image closer and closer to the positive samples while moving away from the negative samples to alleviate the problem of over-smoothing of the denoised images. Our experimental results show that the peak signal-to-noise ratio (PSNR), the structural similarity (SSIM), the root mean square error (RMSE) and the visual information fidelity (VIF) values achieved significant improvements in both the AAPM dataset and the piglet dataset.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"142 ","pages":"Article 117457"},"PeriodicalIF":2.7,"publicationDate":"2025-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145885409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}