
Signal Processing-Image Communication: Latest Publications

DUALF-D: Disentangled dual-hyperprior approach for light field image compression
IF 2.7 CAS Tier 3 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-11-13 DOI: 10.1016/j.image.2025.117436
Soheib Takhtardeshir, Roger Olsson, Christine Guillemot, Mårten Sjöström
Light field (LF) imaging captures spatial and angular information, offering a 4D scene representation that enables enhanced visual understanding. However, high dimensionality and redundancy across the spatial and angular domains present major challenges for compression, particularly where storage, transmission bandwidth, or processing latency are constrained. We present a novel Variational Autoencoder (VAE)-based framework that explicitly disentangles spatial and angular features using two parallel latent branches. Each branch is coupled with an independent hyperprior model, allowing more precise distribution estimation for entropy coding and finer rate–distortion control. This dual-hyperprior structure enables the network to adaptively compress spatial and angular information according to their distinct statistical characteristics, improving coding efficiency. To further enhance latent feature specialization and promote disentanglement, we introduce a mutual information-based regularization term that minimizes redundancy between the two branches while preserving feature diversity. Unlike prior methods that rely on covariance-based penalties prone to collapse, our information-theoretic regularizer provides more stable and interpretable latent separation. Experimental results on publicly available LF datasets demonstrate that our method achieves strong compression performance, yielding an average BD-PSNR gain of 2.91 dB over HEVC and high compression ratios (e.g., 200:1). Additionally, our design enables fast inference, with an end-to-end runtime more than 19x faster than the JPEG Pleno standard, making it well-suited for real-time and bandwidth-sensitive applications. By jointly leveraging disentangled representation learning, dual-hyperprior modeling, and information-theoretic regularization, our approach offers a scalable, effective solution for practical light field image compression.
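The abstract's key departure from covariance-style penalties is an information-theoretic regularizer between the spatial and angular latents, but the exact estimator is not specified above. The sketch below uses a CLUB-style variational upper bound on mutual information as one plausible stand-in: a small Gaussian network models q(z_ang | z_spa), and the penalty is the gap between its log-likelihood on joint pairs and on shuffled pairs. All module names, dimensions, and the CLUB choice itself are assumptions, not the authors' implementation.

```python
# Hypothetical CLUB-style mutual-information penalty between the spatial and
# angular latent branches (the paper's exact estimator may differ).
import torch
import torch.nn as nn

class CLUBPenalty(nn.Module):
    """Variational upper-bound estimate of I(z_spa; z_ang), used as a regularizer.
    In practice q(z_ang | z_spa) is also fit to joint samples in an inner step."""
    def __init__(self, dim_spa, dim_ang, hidden=128):
        super().__init__()
        self.mu = nn.Sequential(nn.Linear(dim_spa, hidden), nn.ReLU(),
                                nn.Linear(hidden, dim_ang))
        self.logvar = nn.Sequential(nn.Linear(dim_spa, hidden), nn.ReLU(),
                                    nn.Linear(hidden, dim_ang))

    def log_q(self, z_spa, z_ang):
        mu, logvar = self.mu(z_spa), self.logvar(z_spa)
        return (-0.5 * (z_ang - mu) ** 2 / logvar.exp() - 0.5 * logvar).sum(dim=-1)

    def forward(self, z_spa, z_ang):
        joint = self.log_q(z_spa, z_ang)                       # true (joint) pairs
        perm = torch.randperm(z_ang.size(0), device=z_ang.device)
        marginal = self.log_q(z_spa, z_ang[perm])              # shuffled pairs
        return (joint - marginal).mean()                       # MI upper-bound estimate

# Usage inside a rate-distortion objective (weights and latents are placeholders):
z_spa, z_ang = torch.randn(8, 192), torch.randn(8, 192)       # pooled branch latents
loss_mi = CLUBPenalty(192, 192)(z_spa, z_ang)
# total_loss = rate_spa + rate_ang + lam * distortion + beta * loss_mi
```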
Citations: 0
TransformAR: A light-weight transformer-based metric for Augmented Reality quality assessment
IF 2.7 CAS Tier 3 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-11-12 DOI: 10.1016/j.image.2025.117437
Aymen Sekhri, Mohamed-Chaker Larabi, Seyed Ali Amirshahi
As Augmented Reality (AR) technology continues to gain traction across sectors, ensuring a superior user experience has become an essential challenge for both academic researchers and industry professionals. However, automatically predicting the quality of AR images remains difficult due to several inherent challenges, particularly the visual confusion arising from the overlap of virtual and real-world elements. This paper introduces TransformAR, a novel and efficient transformer-based framework designed to objectively assess the quality of AR images. The proposed model uses pre-trained vision transformers to capture content features from AR images, computes distance vectors to measure the impact of distortions, and employs cross-attention-based decoders to model the perceptual quality of the AR images. Additionally, the training framework uses regularization techniques and a label-smoothing-like method to reduce the risk of overfitting. Through comprehensive experiments, we demonstrate that TransformAR outperforms existing state-of-the-art approaches, offering a more reliable and scalable solution for AR image quality assessment.
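As one way to picture the cross-attention decoding step described above, the minimal sketch below lets a learnable quality query attend over patch tokens from a frozen vision transformer and regresses a scalar score. The backbone choice, token dimensions, and the single-query head are assumptions, not the authors' exact architecture.

```python
# Hypothetical cross-attention quality head over pre-trained ViT patch tokens.
import torch
import torch.nn as nn

class CrossAttentionQualityHead(nn.Module):
    def __init__(self, token_dim=768, n_heads=8):
        super().__init__()
        self.query = nn.Parameter(torch.zeros(1, 1, token_dim))   # learnable quality query
        self.attn = nn.MultiheadAttention(token_dim, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.LayerNorm(token_dim),
                                 nn.Linear(token_dim, 256), nn.GELU(),
                                 nn.Linear(256, 1))

    def forward(self, tokens):                         # tokens: (B, N, token_dim)
        q = self.query.expand(tokens.size(0), -1, -1)
        pooled, _ = self.attn(q, tokens, tokens)       # query attends to patch features
        return self.mlp(pooled.squeeze(1)).squeeze(-1) # (B,) predicted quality score

tokens = torch.randn(4, 196, 768)   # e.g., 14x14 patch tokens from a frozen ViT-B/16
score = CrossAttentionQualityHead()(tokens)
```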
Citations: 0
Deep noise-tolerant hashing for remote sensing image retrieval
IF 2.7 CAS Tier 3 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-11-10 DOI: 10.1016/j.image.2025.117431
Chunyu Yan, Lei Wang, Qibing Qin, Jiangyan Dai, Wenfeng Zhang
With the explosive growth of remote sensing data volume, quickly retrieving target images from large-scale remote sensing archives has emerged as a critical challenge. Hash learning is an ideal choice for this task thanks to its low storage cost and high efficiency. In recent years, combining hash learning with deep neural networks such as CNNs and Transformers has produced numerous frameworks with excellent performance. However, in remote sensing image hashing, previous studies do not simultaneously account for the effect of noise in feature extraction and loss optimization, so their retrieval performance degrades considerably under noise interference. To resolve this problem, a Deep Noise-tolerant Hashing (DNtH) framework is proposed to learn sample complexity and noise level and adaptively down-weight noisy information. Specifically, to extract fine-grained features from inputs containing irrelevant samples, a noise-aware Transformer is proposed that introduces patch-wise attention and depth-wise convolution. To reduce the interference of noisy labels on remote sensing image retrieval, an adaptive active-passive loss framework is proposed that dynamically adjusts the weights of the active and passive losses: the weight parameters are learned by a dynamic weighting network, combined with an asymmetric strategy for effective compact representation learning. The ratio of entropy to standard deviation and the probability difference are fed into this network, which is trained jointly with the feature extraction network. Extensive experiments on three publicly available datasets show that the DNtH framework adapts to noisy environments while achieving optimal performance in remote sensing image retrieval. The source code for our DNtH framework is available at https://github.com/QinLab-WFU/DNtH.git.
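The adaptive active-passive loss can be pictured as a dynamic mixture of a noise-robust "active" term and a "passive" term, with mixing weights predicted from per-sample statistics (the entropy/standard-deviation ratio and probability difference mentioned above). The sketch below uses normalized cross-entropy and reverse cross-entropy as the two terms and a tiny weighting network; these specific choices are assumptions and the actual DNtH loss may differ.

```python
# Hypothetical dynamic active-passive loss: per-sample weights predicted from
# simple prediction statistics (entropy/std ratio, top-2 probability gap).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicAPLoss(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.num_classes = num_classes
        self.weight_net = nn.Sequential(nn.Linear(2, 16), nn.ReLU(),
                                        nn.Linear(16, 2), nn.Softmax(dim=-1))

    def forward(self, logits, targets):
        p = F.softmax(logits, dim=-1).clamp(1e-7, 1.0)
        onehot = F.one_hot(targets, self.num_classes).float().clamp(1e-4, 1.0)
        ce = -(onehot * p.log()).sum(-1)
        nce = ce / (-(p.log().sum(-1)))          # active term: normalized cross-entropy
        rce = -(p * onehot.log()).sum(-1)        # passive term: reverse cross-entropy
        entropy = -(p * p.log()).sum(-1)
        ratio = entropy / (p.std(dim=-1) + 1e-7) # entropy-to-std ratio
        top2 = p.topk(2, dim=-1).values
        gap = top2[:, 0] - top2[:, 1]            # probability difference
        w = self.weight_net(torch.stack([ratio, gap], dim=-1))
        return (w[:, 0] * nce + w[:, 1] * rce).mean()

loss = DynamicAPLoss(num_classes=21)(torch.randn(8, 21), torch.randint(0, 21, (8,)))
```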
Citations: 0
MLANet: Multilevel aggregation network for binocular eye-fixation prediction
IF 2.7 CAS Tier 3 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-11-09 DOI: 10.1016/j.image.2025.117434
Wujie Zhou, Jiabao Ma, Yulai Zhang, Lu Yu, Weijia Gao, Ting Luo
Saliency prediction is an underexplored but fundamental task in computer vision, especially for binocular images. For binocular images, saliency prediction aims to find the most visually distinctive parts, imitating the operation of the human visual system. Existing saliency prediction models do not utilize the extracted multilevel features adequately. In this paper, we introduce a novel saliency prediction framework called the multilevel aggregation network (MLANet) to explicitly model the multilevel features of binocular images. We split the multilevel features into shallow and deep features using the ResNet-34 backbone, treating the first three and last three stage features as shallow and deep features, respectively, for saliency prediction reconstruction. First, we introduce the aggregation module to integrate adjacent shallow and deep features. Thereafter, we utilize shallow aggregation and multiscale-aware modules to locate objects of different sizes with features from adjacent levels. For better reconstruction, we integrate deep- and shallow-level features with the help of feature guidance and deploy dual-attention modules to select discriminative enhanced characteristics. Experimental results on two benchmark datasets for binocular eye-fixation prediction, NCTU (CC of 0.8575, KLDiv of 0.2648, AUC of 0.8856, and NSS of 2.0138) and S3D (CC of 0.7977, KLDiv of 0.2024, AUC of 0.7954, and NSS of 1.2963), show that the proposed MLANet outperforms the compared methods.
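The shallow/deep split over backbone stages can be reproduced with torchvision's feature-extraction utility. The sketch below grabs intermediate ResNet-34 outputs and groups the earlier ones as shallow features and the later ones as deep features; the exact stage boundaries and any handling of the left/right views are assumptions here.

```python
# Hypothetical extraction of shallow vs. deep ResNet-34 stage features.
import torch
from torchvision.models import resnet34
from torchvision.models.feature_extraction import create_feature_extractor

nodes = {"relu": "s1", "layer1": "s2", "layer2": "s3",   # early (shallow) stages
         "layer3": "s4", "layer4": "s5"}                  # late (deep) stages
backbone = create_feature_extractor(resnet34(weights=None), return_nodes=nodes)

x = torch.randn(1, 3, 224, 224)            # one view of a binocular pair
feats = backbone(x)
shallow = [feats["s1"], feats["s2"], feats["s3"]]
deep = [feats["s4"], feats["s5"]]
for name, f in feats.items():
    print(name, tuple(f.shape))            # inspect spatial resolutions per stage
```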
Citations: 0
Ad-MNet with FConv: FPGA-enabled advanced MobileNet model with fast convolution accelerator for image resolution and quality enhancement
IF 2.7 CAS Tier 3 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-11-07 DOI: 10.1016/j.image.2025.117433
L. Malathi
In recent decades, image processing researchers have placed great emphasis on single image super-resolution (SISR), which attempts to reconstruct a high-resolution (HR) image from a low-resolution (LR) image. In particular, deep learning-based super-resolution (SR) methods have attracted a lot of interest and significantly enhanced reconstruction performance on synthetic data. However, their huge number of convolutions and parameters limits their applicability to real-world SR on resource-constrained devices, since training such SR models requires significant computational cost and memory. To tackle these issues, an improved deep learning model is developed using a fast convolution (FConv) accelerator and a hybrid multiplier architecture. Before the quality enhancement procedure, an adaptive Fast Normalized Least Mean Square (FNLMS) filtering method is first applied for denoising. Then, the Advanced MobileNet (Ad-MNet) model with the FConv accelerator is proposed to improve image quality and resolution, recovering information lost during image acquisition. In addition, a Hybrid Parallel Adder-based Multiplier (HPAM) is designed to perform the multiplication operations in FConv and speed up the convolution operation. The proposed accelerator is implemented using MATLAB and Xilinx Verilog tools, and its performance is analyzed across different metrics. The results show that the accuracy of the proposed model is 98% for the image resolution process and 98.9% for the image enhancement process.
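The preliminary denoising stage is an adaptive normalized-LMS style filter. A minimal sketch, assuming a row-wise adaptive predictor that estimates each pixel from its preceding neighbors in the same row; the paper's "fast" variant and its FPGA mapping are not reproduced, and the filter order and step size are illustrative.

```python
# Hypothetical row-wise NLMS prefilter: each pixel is predicted from the previous
# `order` pixels in its row; the adaptive prediction serves as the denoised value.
import numpy as np

def nlms_denoise_rows(img, order=8, mu=0.5, eps=1e-6):
    out = img.astype(np.float64).copy()
    for r, row in enumerate(img.astype(np.float64)):
        w = np.zeros(order)                      # adaptive filter weights per row
        for n in range(order, len(row)):
            x = row[n - order:n][::-1]           # most recent samples first
            y = w @ x                            # adaptive prediction
            e = row[n] - y                       # prediction error
            w += mu * e * x / (eps + x @ x)      # normalized LMS weight update
            out[r, n] = y
    return out

noisy = np.clip(np.random.rand(64, 64) + 0.1 * np.random.randn(64, 64), 0, 1)
denoised = nlms_denoise_rows(noisy)
```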
Citations: 0
UIT-OpenViIC: An open-domain benchmark for evaluating image captioning in Vietnamese
IF 2.7 CAS Tier 3 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-10-31 DOI: 10.1016/j.image.2025.117430
Doanh C. Bui, Nghia Hieu Nguyen, Khang Nguyen
Image captioning is one of the vision-language tasks that continues to attract interest from the research community worldwide in the 2020s. The MS-COCO Caption benchmark is commonly used to evaluate the performance of advanced captioning models, even though it was introduced in 2015. However, recent captioning models trained on the MS-COCO Caption dataset perform well only in English language patterns; they are less effective at describing contexts specific to Vietnam or at generating fluent Vietnamese captions. To support low-resource research communities such as Vietnam's, we introduce a novel image captioning dataset in Vietnamese, the Open-domain Vietnamese Image Captioning dataset (UIT-OpenViIC). The dataset includes complex scenes captured in Vietnam and manually annotated by Vietnamese speakers under strict rules and supervision. In this paper, we present the dataset creation process in detail. Preliminary analysis shows that our dataset is challenging for recent state-of-the-art (SOTA) Transformer-based baselines that performed well on the MS-COCO dataset. The modest results indicate that UIT-OpenViIC has room to grow and can serve as one of the standard Vietnamese benchmarks for the research community to evaluate captioning models. Furthermore, we present a CAMO approach that effectively enhances image representation ability through a multi-level encoder output fusion mechanism, which improves the quality of generated captions compared to previous captioning models. In our experiments, we show that our dataset is more diverse and challenging than the MS-COCO Caption dataset, as indicated by the significantly lower CIDEr scores on our testing set, ranging from 59.52 to 62.47. For the CAMO approach, experiments on UIT-OpenViIC show that, when added to a captioning baseline model, it improves performance by 0.8970 to 4.9167 CIDEr.
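The multi-level encoder output fusion idea behind CAMO can be sketched as a learned convex combination of the per-layer outputs of a Transformer encoder, computed before decoding. The layer count, feature dimension, and softmax-weighted fusion below are illustrative assumptions rather than the published architecture.

```python
# Hypothetical multi-level encoder output fusion: run encoder layers one by one,
# keep every intermediate output, and fuse them with learned softmax weights.
import torch
import torch.nn as nn

class FusedEncoder(nn.Module):
    def __init__(self, d_model=512, n_layers=3, n_heads=8):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers))
        self.level_weights = nn.Parameter(torch.zeros(n_layers))

    def forward(self, x):                      # x: (B, N, d_model) region features
        outputs = []
        for layer in self.layers:
            x = layer(x)
            outputs.append(x)                  # keep every level's output
        w = torch.softmax(self.level_weights, dim=0)
        fused = sum(wi * oi for wi, oi in zip(w, outputs))
        return fused                           # memory passed to the caption decoder

memory = FusedEncoder()(torch.randn(2, 49, 512))
```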
Citations: 0
HDRUnet3D: High dynamic range image reconstruction network with residual and illumination maps
IF 2.7 CAS Tier 3 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-10-30 DOI: 10.1016/j.image.2025.117428
Fengyuan Wu, Jiuzhe Wei, Qinghong Sheng, Xiang Liu, Bo Wang, Jun Li
In reconstructing high dynamic range (HDR) images from multi-exposed low dynamic range (LDR) images, removing the ghosting artifacts induced by object motion and the misalignment of dynamic scenes is a pivotal challenge. Previous methods attempt to mitigate ghosting artifacts using optical flow registration or motion pixel rejection. However, these approaches are prone to errors due to inaccuracies in motion estimation and the lack of effective information guidance during pixel exclusion. This article proposes a novel attention-guided end-to-end deep neural network named HDRUnet3D, which treats multi-exposed LDR images as a video stream and utilizes 3D convolution to extract both temporal and spatial features, enabling the network to adaptively capture the temporal dynamics of moving objects. Moreover, two information-guided feature enhancement modules are proposed: the Residual Map-Guided Attention Module and the Illumination-Guided Local Enhanced Module. The former utilizes gamma-corrected residual images to guide the learning of temporal motion semantics, while the latter adaptively enhances local features based on illumination maps. Additionally, Global Asymmetric Semantic Fusion is proposed to integrate multi-scale features, enriching high-level feature representation. HDRUnet3D achieves state-of-the-art performance on different datasets, demonstrating the effectiveness and robustness of the proposed method.
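Two ingredients from the abstract are easy to picture in isolation: gamma-corrected residual maps against a reference exposure, and 3D convolution over the LDR stack treated as a short "video". The gamma value, the choice of the middle exposure as reference, and the channel layout below are assumptions.

```python
# Hypothetical prep for a 3D-conv HDR network: gamma-corrected residual maps
# w.r.t. the middle exposure, then Conv3d over the (exposure, H, W) volume.
import torch
import torch.nn as nn

def gamma_residuals(ldr_stack, gamma=2.2, ref_index=1):
    corrected = ldr_stack.clamp(0, 1) ** (1.0 / gamma)          # (B, T, C, H, W)
    return corrected - corrected[:, ref_index:ref_index + 1]    # residual vs. reference

ldr = torch.rand(1, 3, 3, 128, 128)        # 3 exposures, RGB, 128x128
res = gamma_residuals(ldr)

# Treat exposures as the depth axis of a 3D convolution: (B, C, T, H, W).
x = torch.cat([ldr, res], dim=2).permute(0, 2, 1, 3, 4)
features = nn.Conv3d(in_channels=6, out_channels=32, kernel_size=3, padding=1)(x)
print(features.shape)                      # torch.Size([1, 32, 3, 128, 128])
```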
Citations: 0
MGMSDNet: Multi gradient multi scale attention driven denoiser network
IF 2.7 CAS Tier 3 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-10-27 DOI: 10.1016/j.image.2025.117426
Debashis Das, Suman Kumar Maji
Image denoising is essential in applications such as medical imaging, remote sensing, and photography. Despite advances in deep learning, denoising models still face key limitations. Most state-of-the-art methods increase network depth to boost performance, leading to higher computational costs, complex training, and diminishing returns. Moreover, the role of gradient information and negative image features in denoising is often overlooked, limiting the ability to capture fine structures. Our observations reveal that excessively deep networks can reduce denoising performance by introducing redundancy and complicating feature extraction. To address this, we propose MGMSDNet, a gradient-guided Convolutional Neural Network (CNN) with attention mechanisms that balances denoising performance and computational efficiency. MGMSDNet introduces a unique attention framework that utilizes multidirectional gradients and negative image features separately, enhancing structural preservation and noise suppression. To the best of our knowledge, this study is the first in the image denoising literature to explore multidirectional gradients. MGMSDNet surpasses state-of-the-art methods on benchmark datasets, as confirmed by quantitative metrics and visual comparisons. Ablation studies highlight the effectiveness of individual network components. For more details and the implementation, visit our GitHub repository: MGMSDNet.
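The multidirectional-gradient idea can be sketched by convolving the input with fixed directional kernels (here 0°, 45°, 90°, and 135° Sobel-style filters) and appending the negative image as an extra guidance channel. The exact kernels and the way these maps feed the attention modules are assumptions, not the published design.

```python
# Hypothetical multidirectional gradient maps plus a negative-image channel,
# concatenated with the noisy input as guidance for a denoising network.
import torch
import torch.nn.functional as F

def directional_gradients(gray):               # gray: (B, 1, H, W)
    k0 = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])   # horizontal
    k90 = k0.t()                                                        # vertical
    k45 = torch.tensor([[0., 1., 2.], [-1., 0., 1.], [-2., -1., 0.]])   # diagonal
    k135 = torch.tensor([[-2., -1., 0.], [-1., 0., 1.], [0., 1., 2.]])  # anti-diagonal
    kernels = torch.stack([k0, k45, k90, k135]).unsqueeze(1)            # (4, 1, 3, 3)
    return F.conv2d(gray, kernels, padding=1)                           # (B, 4, H, W)

noisy = torch.rand(2, 1, 64, 64)
grads = directional_gradients(noisy)
negative = 1.0 - noisy                                   # negative image feature
guidance = torch.cat([noisy, grads, negative], dim=1)    # (B, 6, H, W) network input
```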
Citations: 0
Contrast and clustering: Learning neighborhood pair representation for source-free domain adaptation
IF 2.7 CAS Tier 3 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-10-24 DOI: 10.1016/j.image.2025.117429
Yuqi Chen, Xiangbin Zhu, Yonggang Li, Yingjian Li, Haojie Fang
Unsupervised domain adaptation aims to address the challenge of classifying data from unlabeled target domains by leveraging source data from different distributions. However, conventional methods often require access to the source data, raising concerns about data privacy. In this paper, we tackle a more practical yet challenging scenario in which the source domain data is unavailable and the target domain data remains unlabeled. To address the domain discrepancy problem, we propose a novel approach from the perspective of contrastive learning. Our key idea is to learn a domain-invariant feature by: (1) constructing abundant pairs for feature learning by utilizing neighboring samples; (2) refining the negative-pair pool to reduce learning confusion; and (3) drawing on noise-contrastive theory to simplify the objective effectively. Through careful ablation studies and extensive experiments on three common benchmarks, VisDA, Office-Home, and Office-31, we demonstrate the superiority of our method over other state-of-the-art works. Our approach not only offers practicality by removing the requirement for source domain data but also achieves remarkable performance in handling domain adaptation challenges. The code is available at https://github.com/yukilulu/CaC.
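The neighborhood-pair construction can be pictured with a feature memory bank over the target domain: for each sample, its k nearest bank entries serve as positives in an InfoNCE-style objective, and the remaining entries act as negatives. The bank size, k, and temperature below are illustrative, and the paper's negative-pool refinement step is not reproduced.

```python
# Hypothetical neighborhood-pair contrastive loss over a target-domain feature bank.
import torch
import torch.nn.functional as F

def neighborhood_contrastive_loss(feats, bank, k=5, tau=0.07):
    feats = F.normalize(feats, dim=1)                 # (B, D) current batch features
    bank = F.normalize(bank, dim=1)                   # (M, D) memory bank features
    sim = feats @ bank.t() / tau                      # (B, M) scaled cosine similarity
    topk = sim.topk(k, dim=1).indices                 # k nearest neighbors as positives
    pos_mask = torch.zeros_like(sim, dtype=torch.bool).scatter_(1, topk, True)
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    return -(log_prob[pos_mask].view(feats.size(0), k)).mean()

loss = neighborhood_contrastive_loss(torch.randn(16, 256), torch.randn(1024, 256))
```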
Citations: 0
Dark light image recognition technology based on improved SSA and object detection
IF 2.7 CAS Tier 3 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-10-22 DOI: 10.1016/j.image.2025.117427
Yuan Xu, Shaobo Cui, Fan Feng, Hao Wang
Images captured in dark-light environments suffer from low contrast, high noise, and loss of detail, which seriously affect the accuracy of target recognition. To address this, a dark-light image recognition technology based on an improved squirrel search algorithm (SSA) and object detection is developed, aiming to improve image quality and recognition accuracy in dark-light environments by optimizing the image enhancement and object detection algorithms. This study proposes an improved squirrel search algorithm that optimizes the image enhancement process through strategies such as bidirectional search, spiral foraging, and greedy selection. It combines a cyclic pixel adjustment module, a multi-branch feature extraction network, and an object detection architecture adapted to low-light environments to build a complete dark-light image recognition model. The experimental findings show that the image enhancement algorithm based on the improved squirrel search algorithm achieves a PSNR of 26.74 dB and a structural similarity of 0.85, significantly better than the comparison algorithms. The average accuracy of the object detection stage under dark-light conditions is 67.9%, better than the comparison algorithms. An ablation experiment verifies the effectiveness of the improvement strategies, and the overall class-average accuracy of the complete model under extreme low-light conditions is 66.4%. The proposed recognition model performs well in dark-light image enhancement and object detection tasks, combining high accuracy and robustness, and provides an effective solution for image recognition under complex lighting conditions.
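The enhancement stage searches for good correction parameters with an improved squirrel search algorithm. A minimal sketch, assuming the candidates are (gamma, gain) pairs scored by the entropy plus contrast of the enhanced image, with random gliding moves toward the current best candidate and greedy replacement; the paper's bidirectional search and spiral-foraging operators are not reproduced, and the fitness function and bounds are assumptions.

```python
# Hypothetical SSA-style search over (gamma, gain) enhancement parameters.
import numpy as np

def fitness(img, params):
    gamma, gain = params
    enhanced = np.clip(gain * img ** gamma, 0, 1)
    hist, _ = np.histogram(enhanced, bins=256, range=(0, 1), density=True)
    hist = hist[hist > 0] / 256.0                      # per-bin probabilities
    entropy = -(hist * np.log2(hist)).sum()            # richness of gray levels
    return entropy + enhanced.std()                    # plus global contrast

def ssa_enhance(img, pop=12, iters=30, seed=0):
    rng = np.random.default_rng(seed)
    low, high = np.array([0.3, 0.8]), np.array([1.5, 2.5])   # (gamma, gain) bounds
    squirrels = rng.uniform(low, high, size=(pop, 2))
    scores = np.array([fitness(img, s) for s in squirrels])
    for _ in range(iters):
        best = squirrels[scores.argmax()]
        for i in range(pop):
            glide = 0.5 * rng.random()                 # random gliding distance
            cand = np.clip(squirrels[i] + glide * (best - squirrels[i]), low, high)
            cand_score = fitness(img, cand)
            if cand_score > scores[i]:                 # greedy selection
                squirrels[i], scores[i] = cand, cand_score
    return squirrels[scores.argmax()]

dark = np.clip(np.random.rand(64, 64) * 0.3, 0, 1)     # synthetic dark image
gamma, gain = ssa_enhance(dark)
```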
Citations: 0