Pub Date: 2024-09-07 | DOI: 10.1016/j.image.2024.117198
Mohammad Mahdi Afrasiabi, Reshad Hosseini, Aliazam Abbasfar
Super-resolution is the process of obtaining a high-resolution (HR) image from one or more low-resolution (LR) images. Single image super-resolution (SISR) deals with one LR image, while multi-frame super-resolution (MFSR) employs several LR images to reach the HR output. The MFSR pipeline consists of alignment, fusion, and reconstruction. We conduct a theoretical analysis using sparse coding (SC) and the iterative shrinkage-thresholding algorithm (ISTA) to fill the gap in mathematical justification for the execution order of the optimal MFSR pipeline. Our analysis recommends executing alignment and fusion before the reconstruction stage (whether reconstruction is performed by deconvolution or by SISR techniques). The suggested order ensures enhanced performance in terms of peak signal-to-noise ratio and structural similarity index. The optimal pipeline also reduces computational complexity compared with intuitive approaches that apply SISR to each input LR image. We also demonstrate the usefulness of SC in the analysis of computer vision tasks such as MFSR, leveraging the sparsity assumption of natural images. Simulation results support the findings of our theoretical analysis, both quantitatively and qualitatively.
{"title":"A novel theoretical analysis on optimal pipeline of multi-frame image super-resolution using sparse coding","authors":"Mohammad Mahdi Afrasiabi, Reshad Hosseini, Aliazam Abbasfar","doi":"10.1016/j.image.2024.117198","DOIUrl":"10.1016/j.image.2024.117198","url":null,"abstract":"<div><p>Super-resolution is the process of obtaining a high-resolution (HR) image from one or more low-resolution (LR) images. Single image super-resolution (SISR) deals with one LR image while multi-frame super-resolution (MFSR) employs several LR ones to reach the HR output. MFSR pipeline consists of alignment, fusion, and reconstruction. We conduct a theoretical analysis using sparse coding (SC) and iterative shrinkage-thresholding algorithm to fill the gap of mathematical justification in the execution order of the optimal MFSR pipeline. Our analysis recommends executing alignment and fusion before the reconstruction stage (whether through deconvolution or SISR techniques). The suggested order ensures enhanced performance in terms of peak signal-to-noise ratio and structural similarity index. The optimal pipeline also reduces computational complexity compared to intuitive approaches that apply SISR to each input LR image. Also, we demonstrate the usefulness of SC in analysis of computer vision tasks such as MFSR, leveraging the sparsity assumption in natural images. Simulation results support the findings of our theoretical analysis, both quantitatively and qualitatively.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"130 ","pages":"Article 117198"},"PeriodicalIF":3.4,"publicationDate":"2024-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142172531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-06 | DOI: 10.1016/j.image.2024.117200
Yuanyuan Li, Zetian Mi, Peng Lin, Xianping Fu
Numerous underwater image enhancement methods have been proposed to correct color and enhance contrast. Although these methods achieve satisfactory enhancement results in some respects, few take into account the effect of the raw image's illumination distribution on the enhancement results, often leading to oversaturation or undersaturation. To solve these problems, an underwater image enhancement network guided by a brightness mask with multi-attention embedding, called BMGMANet, is designed. Specifically, considering that different regions in underwater images have different degrees of degradation, which can be implicitly reflected by a brightness mask characterizing the image illumination distribution, a decoder network guided by a reverse brightness mask is designed to enhance the dark regions while suppressing excessive enhancement of the bright regions. In addition, a triple-attention module is designed to further enhance the contrast of the underwater image and recover more details. Extensive comparative experiments demonstrate that the enhancement results of our network outperform those of existing state-of-the-art methods. Furthermore, additional experiments prove that BMGMANet can effectively enhance non-uniformly illuminated underwater images and improve the performance of salient object detection in underwater images.
{"title":"Underwater image enhancement via brightness mask-guided multi-attention embedding","authors":"Yuanyuan Li, Zetian Mi, Peng Lin, Xianping Fu","doi":"10.1016/j.image.2024.117200","DOIUrl":"10.1016/j.image.2024.117200","url":null,"abstract":"<div><p>Numerous new underwater image enhancement methods have been proposed to correct color and enhance the contrast. Although these methods have achieved satisfactory enhancement results in some respects, few have taken into account the effect of the raw image illumination distribution on the enhancement results, often leading to oversaturation or undersaturation. To solve these problems, an underwater image enhancement network guided by brightness mask with multi-attention embedding, called BMGMANet, is designed. Specifically, considering that different regions in the underwater images have different degradation degrees, which can be implicitly reflected by a brightness mask characterizing the image illumination distribution, a decoder network guided by a reverse brightness mask is designed to enhance the dark regions while suppressing excessive enhancement of the bright regions. In addition, a triple-attention module is designed to further enhance the contrast of the underwater image and recover more details. Extensive comparative experiments demonstrate that the enhancement results of our network outperform those of existing state-of-the-art methods. Furthermore, additional experiments also prove that our BMGMANet can effectively enhance the non-uniform illumination underwater images and improve the performance of saliency object detection in underwater images.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"130 ","pages":"Article 117200"},"PeriodicalIF":3.4,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142167794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-06 | DOI: 10.1016/j.image.2024.117187
Alireza Esmaeilzehi, Morteza Mirzaei, Hossein Zaredar, Dimitrios Hatzinakos, M. Omair Ahmad
In recent years, numerous efficient schemes that employ deep neural networks have been developed for the task of image hashing. However, little attention has been paid to enhancing the performance and robustness of these deep hashing networks when the input images do not possess high spatial resolution and visual quality. This is a critical problem, as access to high-quality, high-resolution images is often not guaranteed in real-life applications. In this paper, we propose a novel method for the task of joint image upsampling and hashing that uses a three-stage design. Specifically, in the first two stages of the proposed scheme, we obtain two deep neural networks, individually trained for image super-resolution and image hashing, respectively. We then fine-tune the two networks thus obtained using the ideas of representation learning and an alternating optimization process, in order to produce a set of optimal parameters for the task of joint image upsampling and hashing. The effectiveness of the various ideas used in designing the proposed method is demonstrated through different experiments. It is shown that the proposed scheme outperforms state-of-the-art image super-resolution and hashing methods, even when they are trained simultaneously in a joint end-to-end manner.
{"title":"DJUHNet: A deep representation learning-based scheme for the task of joint image upsampling and hashing","authors":"Alireza Esmaeilzehi , Morteza Mirzaei , Hossein Zaredar , Dimitrios Hatzinakos , M. Omair Ahmad","doi":"10.1016/j.image.2024.117187","DOIUrl":"10.1016/j.image.2024.117187","url":null,"abstract":"<div><p>In recent years, numerous efficient schemes that employ deep neural networks have been developed for the task of image hashing. However, not much attention is paid to enhancing the performance and robustness of these deep hashing networks, when the input images do not possess high spatial resolution and visual quality. This is a critical problem, as often accessing high-quality high-resolution images is not guaranteed in real-life applications. In this paper, we propose a novel method for the task of joint image upsampling and hashing, that uses a three-stage design. Specifically, in the first two stages of the proposed scheme, we obtain two deep neural networks, each of which is individually trained for the task of image super resolution and image hashing, respectively. We then fine-tune the two deep networks thus obtained by using the ideas of representation learning and alternating optimization process, in order to produce a set of optimal parameters for the task of joint image upsampling and hashing. The effectiveness of the various ideas utilized for designing the proposed method is demonstrated by performing different experimentations. It is shown that the proposed scheme is able to outperform the state-of-the-art image super resolution and hashing methods, even when they are trained simultaneously in a joint end-to-end manner.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"129 ","pages":"Article 117187"},"PeriodicalIF":3.4,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142150002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-30 | DOI: 10.1016/j.image.2024.117190
Falah Jabar, João Ascenso, Maria Paula Queluz
To render a spherical (360° or omnidirectional) image on planar displays, a 2D image - called a viewport - must be obtained by projecting a sphere region onto a plane, according to the user's viewing direction and a predefined field of view (FoV). However, any sphere-to-plane projection introduces geometric distortions, such as object stretching and/or bending of straight lines, whose intensity increases with the considered FoV. In this paper, a fully automatic content-aware projection is proposed, aiming to reduce the geometric distortions when high FoVs are used. This new projection is based on the Pannini projection, whose parameters are first globally optimized according to the image content, followed by a local conformality improvement of relevant viewport objects. A crowdsourcing subjective test showed that the proposed projection is the preferred solution among the considered state-of-the-art sphere-to-plane projections, producing viewports with a more pleasant visual quality.
{"title":"Globally and locally optimized Pannini projection for high FoV rendering of 360° images","authors":"Falah Jabar, João Ascenso, Maria Paula Queluz","doi":"10.1016/j.image.2024.117190","DOIUrl":"10.1016/j.image.2024.117190","url":null,"abstract":"<div><p>To render a spherical (360° or omnidirectional) image on planar displays, a 2D image - called as viewport - must be obtained by projecting a sphere region on a plane, according to the user's viewing direction and a predefined field of view (FoV). However, any sphere to plan projection introduces geometric distortions, such as object stretching and/or bending of straight lines, which intensity increases with the considered FoV. In this paper, a fully automatic content-aware projection is proposed, aiming to reduce the geometric distortions when high FoVs are used. This new projection is based on the Pannini projection, whose parameters are firstly globally optimized according to the image content, followed by a local conformality improvement of relevant viewport objects. A crowdsourcing subjective test showed that the proposed projection is the most preferred solution among the considered state-of-the-art sphere to plan projections, producing viewports with a more pleasant visual quality.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"129 ","pages":"Article 117190"},"PeriodicalIF":3.4,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0923596524000912/pdfft?md5=1ff2da4c676f5e3a19cdbe6c4c5f6989&pid=1-s2.0-S0923596524000912-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142136394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-21 | DOI: 10.1016/j.image.2024.117186
Yadang Chen, Xinyu Xu, Chenchen Wei, Chuhan Lu
Few-shot segmentation was proposed to obtain segmentation results for an image with an unseen class by referring to a few labeled samples. However, due to the limited number of samples, many few-shot segmentation models suffer from poor generalization. Prototypical network-based few-shot segmentation still has issues with spatial inconsistency and prototype bias. Since the target class appears differently in each image, some specific features in the prototypes generated from the support image and its mask do not accurately reflect the generalized features of the target class. To address the support prototype consistency issue, we put forward two modules: Data Augmentation Self-knowledge Distillation (DASKD) and Prototype-wise Regularization (PWR). The DASKD module focuses on enhancing spatial consistency by using data augmentation and self-knowledge distillation. Self-knowledge distillation helps the model acquire generalized features of the target class and learn hidden knowledge from the support images. The PWR module focuses on obtaining a more representative support prototype by applying a prototype-level loss that draws support prototypes closer to the category center. Broad evaluation experiments on PASCAL-5^i and COCO-20^i demonstrate that our model outperforms prior works on few-shot segmentation. Our approach surpasses the state of the art by 7.5% on PASCAL-5^i and 4.2% on COCO-20^i.
{"title":"Prototype-wise self-knowledge distillation for few-shot segmentation","authors":"Yadang Chen , Xinyu Xu , Chenchen Wei , Chuhan Lu","doi":"10.1016/j.image.2024.117186","DOIUrl":"10.1016/j.image.2024.117186","url":null,"abstract":"<div><p>Few-shot segmentation was proposed to obtain segmentation results for a image with an unseen class by referring to a few labeled samples. However, due to the limited number of samples, many few-shot segmentation models suffer from poor generalization. Prototypical network-based few-shot segmentation still has issues with spatial inconsistency and prototype bias. Since the target class has different appearance in each image, some specific features in the prototypes generated from the support image and its mask do not accurately reflect the generalized features of the target class. To address the support prototype consistency issue, we put forward two modules: Data Augmentation Self-knowledge Distillation (DASKD) and Prototype-wise Regularization (PWR). The DASKD module focuses on enhancing spatial consistency by using data augmentation and self-knowledge distillation. Self-knowledge distillation helps the model acquire generalized features of the target class and learn hidden knowledge from the support images. The PWR module focuses on obtaining a more representative support prototype by conducting prototype-level loss to obtain support prototypes closer to the category center. Broad evaluation experiments on PASCAL-<span><math><msup><mrow><mn>5</mn></mrow><mrow><mi>i</mi></mrow></msup></math></span> and COCO-<span><math><mrow><mn>2</mn><msup><mrow><mn>0</mn></mrow><mrow><mi>i</mi></mrow></msup></mrow></math></span> demonstrate that our model outperforms the prior works on few-shot segmentation. Our approach surpasses the state of the art by 7.5% in PASCAL-<span><math><msup><mrow><mn>5</mn></mrow><mrow><mi>i</mi></mrow></msup></math></span> and 4.2% in COCO-<span><math><mrow><mn>2</mn><msup><mrow><mn>0</mn></mrow><mrow><mi>i</mi></mrow></msup></mrow></math></span>.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"129 ","pages":"Article 117186"},"PeriodicalIF":3.4,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142077049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-21 | DOI: 10.1016/j.image.2024.117194
Yan-Lin Chen, Chun-Liang Lin, Yu-Chen Lin, Tzu-Chun Chen
Object recognition in computer vision has been a popular research field in recent years. Although the detection success rate for regular objects has reached impressive levels, small object detection (SOD) is still a challenging issue. In the Microsoft Common Objects in Context (MS COCO) public dataset, the detection rate of small objects is typically half that of regular-sized objects. The main reason is that small objects are often affected by multi-layer convolution and pooling, leaving insufficient detail to distinguish them from the background or similar objects, which results in poor recognition rates or even no detections. This paper presents a network architecture, Transformer-CNN, that combines a self-attention-based transformer and a convolutional neural network (CNN) to improve the recognition rate of SOD. It captures global information through the transformer and uses the translation invariance and translation equivariance of the CNN to maximize the retention of global and local features while improving the reliability and robustness of SOD. Our experiments show that the proposed model improves the small object recognition rate by 2-5% compared with general transformer architectures.
{"title":"Transformer-CNN for small image object detection","authors":"Yan-Lin Chen , Chun-Liang Lin , Yu-Chen Lin , Tzu-Chun Chen","doi":"10.1016/j.image.2024.117194","DOIUrl":"10.1016/j.image.2024.117194","url":null,"abstract":"<div><p>Object recognition in computer vision technology has been a popular research field in recent years. Although the detection success rate of regular objects has achieved impressive results, small object detection (SOD) is still a challenging issue. In the Microsoft Common Objects in Context (MS COCO) public dataset, the detection rate of small objects is typically half that of regular-sized objects. The main reason is that small objects are often affected by multi-layer convolution and pooling, leading to insufficient details to distinguish them from the background or similar objects, resulting in poor recognition rates or even no results. This paper presents a network architecture, Transformer-CNN, that combines a self-attention mechanism-based transformer and a convolutional neural network (CNN) to improve the recognition rate of SOD. It captures global information through a transformer and uses the translation invariance and translation equivalence of CNN to maximize the retention of global and local features while improving the reliability and robustness of SOD. Our experiments show that the proposed model improves the small object recognition rate by 2∼5 % than the general transformer architectures.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"129 ","pages":"Article 117194"},"PeriodicalIF":3.4,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142044684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-17 | DOI: 10.1016/j.image.2024.117195
Zhonghao Chang, Xiao Li, Zihao Zhao
The Generalized Category Discovery (GCD) task involves transferring knowledge from labeled known categories to recognize both known and novel categories within an unlabeled dataset. A significant challenge arises from the lack of prior information about novel categories. To address this, we develop a feature extractor that can learn discriminative features for both known and novel categories. Our approach leverages the observation that similar samples often belong to the same class. We construct a similarity matrix and employ a similarity contrastive loss to increase the similarity between similar samples in the feature space. Additionally, we incorporate cluster labels to further refine the feature extractor, utilizing K-means clustering to assign these labels to unlabeled data and thereby provide valuable supervision. Our feature extractor is optimized through instance-level and class-level contrastive learning constraints. These constraints promote similarity maximization in both the instance space and the label space for instances sharing the same pseudo-labels. These three components complement each other, facilitating the learning of discriminative representations for both known and novel categories. Through comprehensive evaluations on generic image recognition datasets and challenging fine-grained datasets, we demonstrate that our proposed method achieves state-of-the-art performance.
{"title":"Feature extractor optimization for discriminative representations in Generalized Category Discovery","authors":"Zhonghao Chang, Xiao Li, Zihao Zhao","doi":"10.1016/j.image.2024.117195","DOIUrl":"10.1016/j.image.2024.117195","url":null,"abstract":"<div><p>Generalized Category Discovery (GCD) task involves transferring knowledge from labeled known categories to recognize both known and novel categories within an unlabeled dataset. A significant challenge arises from the lack of prior information for novel categories. To address this, we develop a feature extractor that can learn discriminative features for both known and novel categories. Our approach leverages the observation that similar samples often belong to the same class. We construct a similarity matrix and employ similarity contrastive loss to increase the similarity between similar samples in the feature space. Additionally, we incorporate cluster labels to further refine the feature extractor, utilizing K-means clustering to assign these labels to unlabeled data, providing valuable supervision. Our feature extractor is optimized through the utilization of instance-level contrastive learning and class-level contrastive learning constraints. These constraints promote similarity maximization in both the instance space and the label space for instances sharing the same pseudo-labels. These three components complement each other, facilitating the learning of discriminative representations for both known and novel categories. Through comprehensive evaluations of generic image recognition datasets and challenging fine-grained datasets, we demonstrate that our proposed method achieves state-of-the-art performance.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"129 ","pages":"Article 117195"},"PeriodicalIF":3.4,"publicationDate":"2024-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142020716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-16 | DOI: 10.1016/j.image.2024.117189
Tasin Islam, Alina Miron, Xiaohui Liu, Yongmin Li
We introduce a novel image-based virtual try-on model designed to replace a candidate's garment with a desired target item. The proposed model comprises three modules: segmentation, garment warping, and candidate-clothing fusion. Previous methods have shown limitations in cases involving significant differences between the original and target clothing, as well as substantial overlapping of body parts. Our model addresses these limitations by employing two key strategies. First, it utilises a candidate representation based on an RGB skeleton image to enhance spatial relationships among body parts, resulting in robust segmentation and improved occlusion handling. Second, a truncated U-Net is employed in both the segmentation and warping modules, improving segmentation performance and accelerating the try-on process. The warping module leverages an efficient affine transform for ease of training. Comparative evaluations against state-of-the-art models demonstrate the competitive performance of our proposed model across various scenarios, particularly in handling occlusions and large differences between garments. This research presents a promising solution for image-based virtual try-on, advancing the field by overcoming key limitations and achieving superior performance.
{"title":"Image-based virtual try-on: Fidelity and simplification","authors":"Tasin Islam, Alina Miron, Xiaohui Liu, Yongmin Li","doi":"10.1016/j.image.2024.117189","DOIUrl":"10.1016/j.image.2024.117189","url":null,"abstract":"<div><p>We introduce a novel image-based virtual try-on model designed to replace a candidate’s garment with a desired target item. The proposed model comprises three modules: segmentation, garment warping, and candidate-clothing fusion. Previous methods have shown limitations in cases involving significant differences between the original and target clothing, as well as substantial overlapping of body parts. Our model addresses these limitations by employing two key strategies. Firstly, it utilises a candidate representation based on an RGB skeleton image to enhance spatial relationships among body parts, resulting in robust segmentation and improved occlusion handling. Secondly, truncated U-Net is employed in both the segmentation and warping modules, enhancing segmentation performance and accelerating the try-on process. The warping module leverages an efficient affine transform for ease of training. Comparative evaluations against state-of-the-art models demonstrate the competitive performance of our proposed model across various scenarios, particularly excelling in handling occlusion cases and significant differences in clothing cases. This research presents a promising solution for image-based virtual try-on, advancing the field by overcoming key limitations and achieving superior performance.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"129 ","pages":"Article 117189"},"PeriodicalIF":3.4,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0923596524000900/pdfft?md5=d7b74bcca8966cd1d3e0e38fa30c8482&pid=1-s2.0-S0923596524000900-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142049928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-10 | DOI: 10.1016/j.image.2024.117192
Jing Liu, Xin Li, Jiaqi Zhang, Guangtao Zhai, Yuting Su, Yuyi Zhang, Bo Wang
Micro-expressions (MEs) are unconscious, instantaneous, and slight facial movements that reveal people's true emotions. Locating MEs is a prerequisite for classifying them, yet only a few studies focus on this task. Among them, sliding window based methods are the most prevalent. Due to differences in individual physiological and psychological mechanisms, as well as some uncontrollable factors, the durations and transition modes of different MEs fluctuate greatly. Limited to a fixed window scale and mode, traditional sliding window based ME spotting methods fail to capture the motion changes of all MEs exactly, resulting in performance degradation. In this paper, an ensemble learning based duration- and mode-aware (DMA) ME spotting framework is proposed. Specifically, we exploit multiple sliding windows of different scales and modes to generate multiple weak detectors, each of which accommodates MEs with a certain duration and transition mode. Additionally, to obtain a more comprehensive strong detector, we integrate the analysis results of the weak detectors using a voting-based aggregation module. Furthermore, a novel interval generation scheme is designed to merge close peaks and their neighboring frames into a complete ME interval. Experimental results on two long-video databases show the promising performance of our proposed DMA framework compared with state-of-the-art methods. The code is available at https://github.com/TJUMMG/DMA-ME-Spotting.
{"title":"Duration-aware and mode-aware micro-expression spotting for long video sequences","authors":"Jing Liu , Xin Li , Jiaqi Zhang , Guangtao Zhai , Yuting Su , Yuyi Zhang , Bo Wang","doi":"10.1016/j.image.2024.117192","DOIUrl":"10.1016/j.image.2024.117192","url":null,"abstract":"<div><p>Micro-expressions (MEs) are unconscious, instant and slight facial movements, revealing people’s true emotions. Locating MEs is a prerequisite of classifying them, while only a few researches focus on this task. Among them, sliding window based methods are the most prevalent. Due to the differences of individual physiological and psychological mechanisms, and some uncontrollable factors, the durations and transition modes of different MEs fluctuate greatly. Limited to fixed window scale and mode, traditional sliding window based ME spotting methods fail to capture the motion changes of all MEs exactly, resulting in performance degradation. In this paper, an ensemble learning based duration & mode-aware (DMA) ME spotting framework is proposed. Specifically, we exploit multiple sliding windows of different scales and modes to generate multiple weak detectors, each of which accommodates to MEs with certain duration and transition mode. Additionally, to get a more comprehensive strong detector, we integrate the analysis results of multiple weak detectors using a voting based aggregation module. Furthermore, a novel interval generation scheme is designed to merge close peaks and their neighbor frames into a complete ME interval. Experimental results on two long video databases show the promising performance of our proposed DMA framework compared with state-of-the-art methods. The codes are available at <span><span>https://github.com/TJUMMG/DMA-ME-Spotting</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"129 ","pages":"Article 117192"},"PeriodicalIF":3.4,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142049929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-10 | DOI: 10.1016/j.image.2024.117193
Jingfei He, Zezhong Yang, Xunan Zheng, Xiaoyue Zhang, Ao Li
Recently, low-rank tensor completion methods based on tensor train (TT) rank have achieved promising performance. Ket augmentation (KA) is commonly used in TT rank-based methods to improve performance by converting low-dimensional tensors into higher-dimensional ones. However, KA also destroys the original structure and image continuity of the low-dimensional tensors, causing block artifacts. To tackle this issue, a low-rank tensor completion method based on TT rank with tensor augmentation by partially overlapped sub-blocks (TAPOS) and total variation (TV) is proposed in this paper. The proposed TAPOS preserves the image continuity of the original tensor and enhances the low-rankness of the generated higher-dimensional tensors, and a weighted de-augmentation method is used to assign different weights to the elements of the sub-blocks, further reducing block artifacts. To further alleviate block artifacts and improve reconstruction accuracy, TV is introduced into the TAPOS-based model to add a piecewise-smooth prior. A parallel matrix decomposition method is introduced to estimate the TT rank and reduce the computational cost. Numerical experiments show that the proposed method outperforms existing state-of-the-art methods.
{"title":"Low-rank tensor completion based on tensor train rank with partially overlapped sub-blocks and total variation","authors":"Jingfei He, Zezhong Yang, Xunan Zheng, Xiaoyue Zhang, Ao Li","doi":"10.1016/j.image.2024.117193","DOIUrl":"10.1016/j.image.2024.117193","url":null,"abstract":"<div><p>Recently, the low-rank tensor completion method based on tensor train (TT) rank has achieved promising performance. Ket augmentation (KA) is commonly used in TT rank-based methods to improve the performance by converting low-dimensional tensors to higher-dimensional tensors. However, block artifacts are caused since KA also destroys the original structure and image continuity of original low-dimensional tensors. To tackle this issue, a low-rank tensor completion method based on TT rank with tensor augmentation by partially overlapped sub-blocks (TAPOS) and total variation (TV) is proposed in this paper. The proposed TAPOS preserves the image continuity of the original tensor and enhances the low-rankness of the generated higher-dimensional tensors, and a weighted de-augmentation method is used to assign different weights to the elements of sub-blocks and further reduce the block artifacts. To further alleviate the block artifacts and improve reconstruction accuracy, TV is introduced in the TAPOS-based model to add the piecewise smooth prior. The parallel matrix decomposition method is introduced to estimate the TT rank to reduce the computational cost. Numerical experiments show that the proposed method outperforms the existing state-of-the-art methods.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"129 ","pages":"Article 117193"},"PeriodicalIF":3.4,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142049912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}