MA-MNN: Multi-flow attentive memristive neural network for multi-task image restoration
Pub Date: 2025-10-01 | Epub Date: 2025-04-28 | DOI: 10.1016/j.image.2025.117336
Peng He, Lin Zhang, Yu Yang, Yue Zhou, Shukai Duan, Xiaofang Hu
Images taken in rainy, hazy, and low-light environments severely hinder the performance of outdoor computer vision systems. Most data-driven image restoration methods are task-specific and computationally intensive, whereas the capture and processing of degraded images occur largely on end-side devices with limited computing resources. To address these issues, this paper proposes a novel software and hardware co-designed image restoration method, the multi-flow attentive memristive neural network (MA-MNN), which combines a deep learning algorithm with the memristor, a nanoscale device. A multi-flow aggregation block exploits multi-level complementary spatial contextual information. A dense connection design provides smooth feature propagation across units and alleviates the vanishing-gradient problem. A supervised calibration block realizes a dual-attention mechanism that helps the model identify and re-calibrate the transformed features. In addition, a memristor-based hardware implementation scheme is designed to provide a low-energy solution for embedded applications. Extensive experiments on image deraining, image dehazing, and low-light image enhancement show that the proposed method is highly competitive against over 20 state-of-the-art methods.
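The abstract does not specify the internals of the supervised calibration block, but the dual-attention re-calibration it describes can be sketched generically. Below is a minimal illustrative PyTorch block; the channel-then-spatial design, the reduction ratio, and the kernel size are assumptions, not the paper's design.

    import torch
    import torch.nn as nn

    class DualAttention(nn.Module):
        # Generic dual attention: channel re-weighting, then spatial re-weighting.
        # Illustrative only; not the paper's supervised calibration block.
        def __init__(self, channels, reduction=8):
            super().__init__()
            self.channel = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),                      # squeeze spatial dims
                nn.Conv2d(channels, channels // reduction, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1),
                nn.Sigmoid(),                                 # per-channel weights
            )
            self.spatial = nn.Sequential(
                nn.Conv2d(channels, 1, kernel_size=7, padding=3),
                nn.Sigmoid(),                                 # per-pixel weights
            )

        def forward(self, x):
            x = x * self.channel(x)   # re-calibrate feature channels
            x = x * self.spatial(x)   # re-calibrate spatial locations
            return x

    feats = torch.randn(1, 64, 32, 32)
    print(DualAttention(64)(feats).shape)  # torch.Size([1, 64, 32, 32])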
{"title":"MA-MNN: Multi-flow attentive memristive neural network for multi-task image restoration","authors":"Peng He , Lin Zhang , Yu Yang , Yue Zhou , Shukai Duan , Xiaofang Hu","doi":"10.1016/j.image.2025.117336","DOIUrl":"10.1016/j.image.2025.117336","url":null,"abstract":"<div><div>Images taken in rainy, hazy, and low-light environments severely hinder the performance of outdoor computer vision systems. Most data-driven image restoration methods are task-specific and computationally intensive, whereas the capture and processing of degraded images occur largely in end-side devices with limited computing resources. Motivated by addressing the above issues, a novel software and hardware co-designed image restoration method named multi-flow attentive memristive neural network (MA-MNN) is proposed in this paper, which combines a deep learning algorithm and the nanoscale device memristor. The multi-level complementary spatial contextual information is exploited by the multi-flow aggregation block. The dense connection design is adopted to provide smooth transportation across units and alleviate the vanishing-gradient. The supervised calibration block is designed to facilitate achieving the dual-attention mechanism that helps the model identify and re-calibrate the transformed features. Besides, a hardware implementation scheme based on memristors is designed to provide low energy consumption solutions for embedded applications. Extensive experiments in image deraining, image dehazing and low-light image enhancement have shown that the proposed method is highly competitive with over 20 state-of-the-art methods.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117336"},"PeriodicalIF":3.4,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143899899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mining the Salient Spatio-Temporal Feature with S2TF-Net for action recognition
Pub Date: 2025-10-01 | Epub Date: 2025-07-15 | DOI: 10.1016/j.image.2025.117381
Xiaoxi Liu, Ju Liu, Lingchen Gu, Yafeng Li, Xiaojun Chang, Feiping Nie
Recently, 3D Convolutional Neural Networks (3D ConvNets) have been widely exploited for action recognition and have achieved satisfactory performance. However, salient action features are often drowned in irrelevant information, which greatly increases the difficulty of video representation. To find a generic, cost-efficient approach that balances parameters and performance, we present a novel network that mines the Salient Spatio-Temporal Feature on a 3D ConvNets backbone for action recognition, termed S2TF-Net. First, we extract the salient features of each 3D residual block by constructing a multi-scale module for Salient Semantic Feature mining (SSF-Module). Then, to preserve salient features through pooling operations, we establish a Two-branch Salient Feature Preserving Module (TSFP-Module). With a proper loss function, these two modules can be combined in an "easy-to-concat" fashion with most 3D ResNet backbones to classify more accurately, albeit with a shallower network. Finally, we conduct experiments on three popular action recognition datasets, where our S2TF-Net is competitive with deeper 3D backbones and current state-of-the-art results. Taking P3D, 3D ResNet, Non-local I3D, and X3D as baselines, the proposed method improves them to varying degrees. In particular, for Non-local I3D ResNet, S2TF-Net improves accuracy by 4.1%, 3.0%, and 4.6% on the Kinetics-400, UCF101, and HMDB51 datasets, reaching 74.8%, 95.1%, and 80.9%, respectively. We hope this study provides useful inspiration and experience for future research on more cost-effective methods. Code is released at: https://github.com/xiaoxiAries/S2TFNet.
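The exact SSF/TSFP designs are not given in the abstract, but the "multi-scale responses concatenated onto a 3D backbone block" pattern it describes can be sketched as follows. The class name MultiScale3D, the kernel sizes, and the 1x1x1 fusion are assumptions for illustration.

    import torch
    import torch.nn as nn

    class MultiScale3D(nn.Module):
        # Hedged sketch: parallel 3D convs at several receptive fields,
        # concatenated and fused back to the block's channel width.
        def __init__(self, channels):
            super().__init__()
            self.branches = nn.ModuleList([
                nn.Conv3d(channels, channels, k, padding=k // 2)
                for k in (1, 3, 5)
            ])
            self.fuse = nn.Conv3d(3 * channels, channels, 1)

        def forward(self, x):
            # "Easy-to-concat": stack multi-scale responses, then fuse.
            return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

    clip = torch.randn(1, 32, 8, 56, 56)  # (batch, C, T, H, W)
    print(MultiScale3D(32)(clip).shape)   # torch.Size([1, 32, 8, 56, 56])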
{"title":"Mining the Salient Spatio-Temporal Feature with S2TF-Net for action recognition","authors":"Xiaoxi Liu , Ju Liu , Lingchen Gu , Yafeng Li , Xiaojun Chang , Feiping Nie","doi":"10.1016/j.image.2025.117381","DOIUrl":"10.1016/j.image.2025.117381","url":null,"abstract":"<div><div>Recently, 3D Convolutional Neural Networks (3D ConvNets) have been widely exploited for action recognition and achieved satisfying performance. However, the superior action features are often drowned in numerous irrelevant information, which immensely enhances the difficulty of video representation. To find a generic cost-efficient approach to balance the parameters and performance, we present a novel network to mine the <strong>S</strong>alient <strong>S</strong>patio-<strong>T</strong>emporal <strong>F</strong>eature based on 3D ConvNets backbone for action recognition, termed as S<sup>2</sup>TF-Net. Firstly, we extract the salient features of each 3D residual block by constructing a multi-scale module for <strong>S</strong>alient <strong>S</strong>emantic <strong>F</strong>eature mining (SSF-Module). Then, with the aim of preserving the salient features in pooling operations, we establish a <strong>T</strong>wo-branch <strong>S</strong>alient <strong>F</strong>eature <strong>P</strong>reserving Module (TSFP-Module). Besides, these above two modules with proper loss function can collaborate in an “easy-to-concat” fashion for most 3D ResNet backbones to classify more accurately albeit in the shallower network. Finally, we conduct experiments over three popular action recognition datasets, where our S<sup>2</sup>TF-Net is competitive compared with the deeper 3D backbones or current state-of-the-art results. Treating the P3D, 3D ResNet, Non-local I3D and X3D as baseline, the proposed method improves them to varying degrees. Particularly, for Non-local I3D ResNet, the proposed S<sup>2</sup>TF-Net enhances 4.1%, 3.0% and 4.6% in Kinetics-400, UCF101 and HMDB51 datasets, achieving the accuracy of 74.8%, 95.1% and 80.9%. We hope this study will provide useful inspiration and experience for future research about more cost-effective methods. Code is released here: <span><span>https://github.com/xiaoxiAries/S2TFNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117381"},"PeriodicalIF":3.4,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144633876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multiple-image encryption algorithm based on S-boxes and DNA sequences
Pub Date: 2025-10-01 | Epub Date: 2025-05-25 | DOI: 10.1016/j.image.2025.117353
Muhammad Umair Safdar, Tariq Shah, Asif Ali
Image encryption is crucial for safeguarding sensitive visual data; however, traditional methods often encounter challenges regarding efficiency and adaptability to the unique characteristics of images. This research is motivated by the potential of ring-based algebraic structures to develop lightweight, secure, and efficient encryption schemes specifically designed for image data. The article presents a novel approach to image encryption using a local ring algebraic structure. The proposed method encrypts multiple images by constructing substitution boxes from subsets that are not subgroups but satisfy the identity and invertibility axioms. The challenge of using such subsets for encryption is addressed by taking the unit elements of the ring, picking a subgroup, and splitting it into two subsets. One subset generates a substitution box used for the substitution process, while the other is mapped to the Galois field to construct a second substitution box used for diffusion. A DNA sequence is applied to the red, green, and blue channels of the image, a key is generated by hashing the image and using a subset of the subgroup of units of the ring, and finally all channels are XORed with the key. The performance of the proposed scheme is evaluated through various analyses and is found to outperform existing approaches, presenting a promising solution for image encryption.
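As a minimal sketch of the final step the abstract describes (hash the image to derive a key, then XOR all channels with it), the snippet below omits the ring/subgroup S-box construction and the DNA coding entirely; the string "subset-derived-secret" is a placeholder for the ring-subset key material, not the paper's actual construction.

    import hashlib
    import numpy as np

    def keystream(data: bytes, n: int) -> np.ndarray:
        # Expand a SHA-256 digest into an n-byte key stream by repetition.
        digest = hashlib.sha256(data).digest()
        reps = -(-n // len(digest))  # ceiling division
        return np.frombuffer(digest * reps, dtype=np.uint8)[:n]

    rgb = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)
    # Key depends on the image hash plus a secret (placeholder for the
    # subset of the subgroup of units described in the abstract).
    key = keystream(rgb.tobytes() + b"subset-derived-secret", rgb.size)
    cipher = np.bitwise_xor(rgb.ravel(), key).reshape(rgb.shape)
    # XOR is an involution, so the same key stream recovers the image.
    assert np.array_equal(np.bitwise_xor(cipher.ravel(), key).reshape(rgb.shape), rgb)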
{"title":"Multiple-image encryption algorithm based on S-boxes and DNA sequences","authors":"Muhammad Umair Safdar , Tariq Shah , Asif Ali","doi":"10.1016/j.image.2025.117353","DOIUrl":"10.1016/j.image.2025.117353","url":null,"abstract":"<div><div>Image encryption is crucial for safeguarding sensitive visual data; however, traditional methods often encounter challenges regarding efficiency and adaptability to the unique characteristics of images. This research is motivated by the potential of ring-based algebraic structures to develop lightweight, secure, and efficient encryption schemes specifically designed for image data. The article presents a novel approach to image encryption in cryptography using a local ring algebraic structure. The proposed method involves encrypting multiple images by constructing substitution boxes from subsets, which are not subgroups but have identity and invertibility axioms. The challenge of using subsets for encryption purposes is addressed by taking unit elements of the ring, picking a subgroup, and splitting it into two subsets. The substitution box is generated by one of the subsets and used for the substitution process, while the other subset is mapped to the Galois field. It constructs the substitution box and is used for diffusion. A DNA sequence is applied to the red, green, and blue channels of the image, and a key is generated by hashing the image and using a subset of the subgroup of units of the ring. Finally, all channels are XORed with the key. The performance of the proposed scheme is evaluated using different analyses, and it is found that the scheme outperforms existing approaches. This approach presents a promising solution for image encryption in cryptography.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117353"},"PeriodicalIF":3.4,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144177675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Camera calibration using property of asymptotes with application to sports scenes
Pub Date: 2025-10-01 | Epub Date: 2025-04-12 | DOI: 10.1016/j.image.2025.117331
Fengli Yang, Xuechun Wang, Yue Zhao
Inspired by Ying's work on calibration, this study proposes a new planar pattern (hereinafter the phi-type model), consisting of a circle and a diameter, as the calibration scene. In sports scenarios, such as a soccer match or a basketball court, most existing methods require information about scene points in three-dimensional space. However, an interesting observation is that in the midfield the centre circle and the halfway line form a phi-type template. A new automatic method using the properties of asymptotes is proposed based on images of the midfield. All intrinsic parameters of the camera can be determined without assumptions such as zero skew or unit aspect ratio. The main advantages of our technique are that it involves neither point nor line matching and requires no metric information about the model plane. The feasibility and validity of the proposed algorithm were verified by testing noise sensitivity and performing image metric rectification.
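For context, the standard projective-geometry fact behind asymptote-based calibration (textbook material, not the paper's specific derivation) is that a circle meets the line at infinity at the circular points, so its asymptotes pass through them; the images of the circular points therefore lie on the image of the absolute conic:

    \mathbf{m}_I^{\top}\,\omega\,\mathbf{m}_I = 0, \qquad
    \mathbf{m}_J^{\top}\,\omega\,\mathbf{m}_J = 0, \qquad
    \omega = K^{-\top}K^{-1},

where K is the intrinsic matrix. Each view contributes two independent real constraints on omega, so the five intrinsics (no zero-skew or unit-aspect-ratio assumption) can be recovered from at least three views.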
{"title":"Camera calibration using property of asymptotes with application to sports scenes","authors":"Fengli Yang, Xuechun Wang, Yue Zhao","doi":"10.1016/j.image.2025.117331","DOIUrl":"10.1016/j.image.2025.117331","url":null,"abstract":"<div><div>Inspired by Ying's work on the calibration technique, this study proposes a new planar pattern (referred to as the phi-type model hereinafter), which includes a circle and diameter, as the calibration scene. In sports scenarios, such as a soccer match or basketball court, most existing methods require information of the scene points in a three-dimensional space. However, an interesting observation in the midfield is that the centre circle and the halfway line form a phi-type template. A new automatic method using the properties of asymptotes is proposed based on the images of the midfield. All intrinsic parameters of the camera can be determined without any assumptions such as zero skew or unitary aspect ratio. The main advantages of our technique are that it neither involves point or line matching nor does it require the metric information of the model plane. The feasibility and validity of the proposed algorithm were verified by testing the noise sensitivity and performing image metric rectification.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117331"},"PeriodicalIF":3.4,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143859340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Driver distraction detection based on adaptive tiny targets and lightweight networks
Pub Date: 2025-10-01 | Epub Date: 2025-05-15 | DOI: 10.1016/j.image.2025.117342
Shuangshuang Gu, Bin Wen, Shiyao Chen, Yuanyuan Li, Guanqiu Qi, Linhong Shuai, Zhiqin Zhu
Driver distraction detection is critical to reducing road traffic accidents and increasing the efficiency of advanced driver assistance systems. Real-time lightweight models are especially important for in-vehicle devices with limited computing resources. However, most existing methods focus on designing lighter network architectures and ignore the performance loss when detecting tiny targets. To jointly optimize tiny-target detection accuracy and network lightness, a driver distraction detection method named ATD²Net, based on adaptive tiny-target detection and lightweight networks, is proposed. The method aims to reduce model complexity while fully capturing target features for accurate detection. ATD²Net consists of three core modules: the Channel Reconstruction Perception Module (CRPM), the Dynamic Spatial Self-locking Module (DSSM), and the Structural Feedback Optimization Module (SFOM). CRPM reconfigures channels and reconstructs them into the batch dimension, using parallel strategies to perceive interactive features between channels and significantly enhancing feature extraction. DSSM adopts dynamic locking and adaptive spatial selection mechanisms to capture multi-scale features while injecting adaptive spatial information; it effectively aggregates instance features and reduces interference from conflicting and background information, improving the detection of tiny targets. SFOM uses dependency trees to model inter-layer relationships, integrates coupled parameters into groups, and applies a sparse strategy to remove unimportant parameters, achieving a lightweight model that balances accuracy and speed. Experimental results show that ATD²Net outperforms the latest driver distraction detection methods, demonstrating excellent performance and good application prospects.
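The abstract's "reconstruct channels into the batch dimension" idea can be illustrated in isolation: groups of channels are folded into the batch axis so a shared operator processes them in parallel, then folded back. The group count and function names below are illustrative assumptions, not CRPM's actual design.

    import torch

    def channels_to_batch(x: torch.Tensor, groups: int) -> torch.Tensor:
        # Fold channel groups into the batch axis: (B, C, H, W) -> (B*g, C/g, H, W).
        b, c, h, w = x.shape
        assert c % groups == 0
        return x.reshape(b * groups, c // groups, h, w)

    def batch_to_channels(x: torch.Tensor, groups: int) -> torch.Tensor:
        # Inverse fold: (B*g, C/g, H, W) -> (B, C, H, W).
        bg, cg, h, w = x.shape
        return x.reshape(bg // groups, cg * groups, h, w)

    x = torch.randn(2, 16, 8, 8)
    y = channels_to_batch(x, groups=4)           # shape (8, 4, 8, 8)
    assert torch.equal(batch_to_channels(y, 4), x)  # lossless round trip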
{"title":"Driver distraction detection based on adaptive tiny targets and lightweight networks","authors":"Shuangshuang Gu , Bin Wen , Shiyao Chen , Yuanyuan Li , Guanqiu Qi , Linhong Shuai , Zhiqin Zhu","doi":"10.1016/j.image.2025.117342","DOIUrl":"10.1016/j.image.2025.117342","url":null,"abstract":"<div><div>Driver distraction detection is critical to reducing road traffic accidents and increasing the efficiency of advanced driver assistance systems. Real-time lightweight models are especially important for in-vehicle devices with limited computing resources. However, most existing methods focus on designing lighter network architectures and ignore the performance loss when detecting tiny targets. In order to realize the collaborative optimization of tiny target detection accuracy and network lightweight, a driver distraction detection method ATD<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span>Net based on adaptive tiny target detection and lightweight networks is proposed. This method aims to reduce model complexity while fully capturing target features for accurate detection. ATD<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span>Net consists of three core modules, Channel Reconstruction Perception Module (CRPM), Dynamic Spatial Self-locking Module (DSSM) and Structural Feedback Optimization Module (SFOM). CRPM reconfigures channels and reconstructs them into batch dimensions, uses parallel strategies to perceive interactive features between channels, and significantly enhances feature extraction capabilities. DSSM adopts dynamic locking and adaptive spatial selection mechanisms to capture multi-scale features while injecting adaptive spatial information. It effectively aggregates instance features and reduces the interference of conflicting information and background information, thereby improving the detection ability of tiny targets. SFOM uses dependency trees to model inter-layer relationships and integrate coupling parameters into groupings. It uses a sparse strategy to remove unimportant parameters, achieving lightweight modeling while balancing accuracy and speed. Experimental results show that ATD<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span>Net is superior to the latest methods in driver distraction detection, showing excellent performance and good application prospects.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117342"},"PeriodicalIF":3.4,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144090347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-fish tracking with underwater image enhancement by deep network in marine ecosystems
Pub Date: 2025-10-01 | DOI: 10.1016/j.image.2025.117321
Prerana Mukherjee, Srimanta Mandal, Koteswar Rao Jerripothula, Vrishabhdhwaj Maharshi, Kashish Katara
Tracking marine life plays a crucial role in understanding migration patterns, movements, and population growth of underwater species. Deep learning-based fish-tracking networks have been actively researched and developed, yielding promising results. In this work, we propose an end-to-end deep learning framework for tracking fish in unconstrained marine environments. The core innovation of our approach is a Siamese-based architecture integrated with an image enhancement module, designed to measure appearance similarity effectively. The enhancement module consists of convolutional layers and a squeeze-and-excitation block, pre-trained on degraded and clean image pairs to address underwater distortions. This enhanced feature representation is leveraged within the Siamese framework to compute an appearance similarity score, which is further refined using prediction scores based on fish movement patterns. To ensure robust tracking, we combine the appearance similarity score, prediction score, and IoU-based similarity score to generate fish trajectories using the Hungarian algorithm. Our framework significantly reduces ID switches by 35.6% on the Fish4Knowledge dataset and 3.8% on the GMOT-40 fish category, all while maintaining high tracking accuracy. The source code of this work is available here: https://github.com/srimanta-mandal/Multi-Fish-Tracking-with-Underwater-Image-Enhancement.
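A minimal sketch of the association step the abstract describes: fuse the appearance, motion-prediction, and IoU similarities into one score matrix and solve the assignment with the Hungarian algorithm (scipy's linear_sum_assignment). The fusion weights here are illustrative, not the paper's.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def associate(app_sim, pred_sim, iou_sim, w=(0.5, 0.3, 0.2)):
        # Inputs: (num_tracks, num_detections) similarity matrices in [0, 1].
        score = w[0] * app_sim + w[1] * pred_sim + w[2] * iou_sim
        # Hungarian algorithm minimises cost, so negate to maximise similarity.
        rows, cols = linear_sum_assignment(-score)
        return [(int(r), int(c)) for r, c in zip(rows, cols)]

    app = np.array([[0.9, 0.1], [0.2, 0.8]])
    pred = np.array([[0.8, 0.3], [0.1, 0.9]])
    iou = np.array([[0.7, 0.0], [0.2, 0.6]])
    print(associate(app, pred, iou))  # [(0, 0), (1, 1)]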
{"title":"Multi-fish tracking with underwater image enhancement by deep network in marine ecosystems","authors":"Prerana Mukherjee , Srimanta Mandal , Koteswar Rao Jerripothula , Vrishabhdhwaj Maharshi , Kashish Katara","doi":"10.1016/j.image.2025.117321","DOIUrl":"10.1016/j.image.2025.117321","url":null,"abstract":"<div><div>Tracking marine life plays a crucial role in understanding migration patterns, movements, and population growth of underwater species. Deep learning-based fish-tracking networks have been actively researched and developed, yielding promising results. In this work, we propose an end-to-end deep learning framework for tracking fish in unconstrained marine environments. The core innovation of our approach is a Siamese-based architecture integrated with an image enhancement module, designed to measure appearance similarity effectively. The enhancement module consists of convolutional layers and a squeeze-and-excitation block, pre-trained on degraded and clean image pairs to address underwater distortions. This enhanced feature representation is leveraged within the Siamese framework to compute an appearance similarity score, which is further refined using prediction scores based on fish movement patterns. To ensure robust tracking, we combine the appearance similarity score, prediction score, and IoU-based similarity score to generate fish trajectories using the Hungarian algorithm. Our framework significantly reduces ID switches by 35.6% on the Fish4Knowledge dataset and 3.8% on the GMOT-40 fish category, all while maintaining high tracking accuracy. The source code of this work is available here: <span><span>https://github.com/srimanta-mandal/Multi-Fish-Tracking-with-Underwater-Image-Enhancement</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117321"},"PeriodicalIF":3.4,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143881235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Object detection-based deep autoencoder hashing image retrieval
Pub Date: 2025-10-01 | Epub Date: 2025-07-18 | DOI: 10.1016/j.image.2025.117384
Uğur Erkan, Ahmet Yilmaz, Abdurrahim Toktas, Qiang Lai, Suo Gao
Image Retrieval (IR), which returns similar images from a large image database, has become an important task as multimedia data grows. Existing studies use hash codes representing features generated from the whole image, including redundant semantics from the background. In this study, a novel Object Detection-based Hashing IR (ODH-IR) scheme using You Only Look Once (YOLO) and an autoencoder is presented to ignore clutter in images. Integrating YOLO with the autoencoder yields the most representative hash code, based on the meaningful objects in an image. The autoencoder compresses the detected object vector to the desired bit length of the hash code. The ODH-IR scheme is validated against the state of the art on three well-known datasets using precise metrics: it achieves the best result in 35 of 36 metric measurements and the best average mean rank of 1.03. Moreover, three illustrative IR examples show that it retrieves the most relevant semantics. The results demonstrate that ODH-IR is an effective scheme thanks to its object detection-based hashing with YOLO and the autoencoder.
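One way to read "the autoencoder compresses the detected object vector to the desired bit length" is a bottleneck that is binarised by sign into the hash code. The sketch below shows that pattern; the layer sizes, the 512-dimensional stand-in for a YOLO object vector, and the tanh bottleneck are assumptions, not the paper's architecture.

    import torch
    import torch.nn as nn

    class HashAutoencoder(nn.Module):
        # Hedged sketch: bottleneck of width `bits`, binarised for retrieval.
        def __init__(self, in_dim=512, bits=64):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                         nn.Linear(256, bits), nn.Tanh())
            self.decoder = nn.Sequential(nn.Linear(bits, 256), nn.ReLU(),
                                         nn.Linear(256, in_dim))

        def forward(self, x):
            z = self.encoder(x)             # continuous code in (-1, 1)
            return self.decoder(z), z       # reconstruction supervises training

        def hash(self, x):
            # Sign-binarise the bottleneck to the desired bit length.
            return (self.encoder(x) > 0).to(torch.uint8)

    obj_vec = torch.randn(1, 512)           # placeholder YOLO object vector
    print(HashAutoencoder().hash(obj_vec).shape)  # torch.Size([1, 64])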
{"title":"Object detection-based deep autoencoder hashing image retrieval","authors":"Uğur Erkan , Ahmet Yilmaz , Abdurrahim Toktas , Qiang Lai , Suo Gao","doi":"10.1016/j.image.2025.117384","DOIUrl":"10.1016/j.image.2025.117384","url":null,"abstract":"<div><div>Image Retrieval (IR), which returns similar images from a large image database, has become an important task as multimedia data grows. Existing studies utilize hash code representing the image features generated from the whole image, including redundant semantics from the background. In this study, a novel Object Detection-based Hashing IR (ODH-IR) scheme using You Only Look Once (YOLO) and an autoencoder is presented to ignore clutter in the images. Integration of YOLO and the autoencoder provides the most representative hash code depending on meaningful objects in the images. The autoencoder is exploited to compress the detected object vector to the desired bit length of the hash code. The ODH-IR scheme is validated by comparison with the state of the art through three well-known datasets in terms of precise metrics. The ODH-IR totally has the best 35 metric results over 36 measurements and the best avg. mean rank of 1.03. Moreover, it is observed from the three illustrative IR examples that it retrieves the most relevant semantics. The results demonstrate that the ODH-IR is an impactful scheme thanks to the effective hashing method through object detection using YOLO and the autoencoder.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117384"},"PeriodicalIF":3.4,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144694958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Higher-order motion calibration and sparsity based outlier correction for video FRUC
Pub Date: 2025-10-01 | Epub Date: 2025-04-17 | DOI: 10.1016/j.image.2025.117327
Jiale He, Qunbing Xia, Gaobo Yang, Xiangling Ding
For frame rate up-conversion (FRUC), one key challenge is dealing with the irregular and large motions that widely exist in video scenes. Most existing FRUC works assume constant brightness and linear motion, easily leading to undesirable artifacts such as motion blur and frame flickering. In this work, we propose an advanced FRUC method that uses a high-order model for motion calibration and a sparse sampling strategy for outlier correction. Unidirectional motion estimation accurately locates objects from the previous frame to the following frame in a coarse-to-fine pyramid structure. Object motion trajectories are then fine-tuned to approximate real motion, and possible outlier regions are located and recorded. Moreover, image sparsity is exploited as prior knowledge for outlier correction, and the outlier index map is used to design the measurement matrix. Based on the theory of sparse sampling, the outlier regions are reconstructed to eliminate side effects such as overlapping, holes, and blurring. Extensive experimental results demonstrate that the proposed approach outperforms state-of-the-art FRUC works in both objective and subjective quality of the interpolated frames.
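As a concrete example of what a high-order motion model adds over the linear assumption (this is the generic second-order form; the abstract does not state the paper's exact model or order): under linear motion, the interpolated position at intermediate time t in (0, 1) is p(t) = p_0 + v t, while a quadratic model also calibrates for acceleration,

    \mathbf{p}(t) = \mathbf{p}_0 + \mathbf{v}\,t + \tfrac{1}{2}\,\mathbf{a}\,t^{2},

with the velocity v and acceleration a estimated from motion vectors over several consecutive frames rather than a single frame pair.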
{"title":"Higher-order motion calibration and sparsity based outlier correction for video FRUC","authors":"Jiale He , Qunbing Xia , Gaobo Yang , Xiangling Ding","doi":"10.1016/j.image.2025.117327","DOIUrl":"10.1016/j.image.2025.117327","url":null,"abstract":"<div><div>For frame rate up-conversion (FRUC), one of the key challenges is to deal with irregular and large motions that are widely existed in video scenes. However, most existing FRUC works make constant brightness and linear motion assumptions, easily leading to undesirable artifacts such as motion blurriness and frame flickering. In this work, we propose an advanced FRUC work by using a high-order model for motion calibration and a sparse sampling strategy for outlier correction. Unidirectional motion estimation is used to accurately locate object from the previous frame to the following frame in a coarse-to-fine pyramid structure. Then, object motion trajectory is fine-tuned to approximate real motion, and possible outlier regions are located and recorded. Moreover, image sparsity is exploited as the prior knowledge for outlier correction, and the outlier index map is used to design the measurement matrix. Based on the theory of sparse sampling, the outlier regions are reconstructed to eliminate the side effects such as overlapping, holes and blurring. Extensive experimental results demonstrate that the proposed approach outperforms the state-of-the-art FRUC works in terms of both objective and subjective qualities of interpolated frames.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117327"},"PeriodicalIF":3.4,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143864458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Facial expression transformation for anime-style image based on decoder control and attention mask
Pub Date: 2025-10-01 | Epub Date: 2025-05-06 | DOI: 10.1016/j.image.2025.117343
Xinhao Rao, Weidong Min, Ziyang Deng, Mengxue Liu
Human facial expression transformation has recently been studied extensively using Generative Adversarial Networks (GANs), which have also been applied successfully to anime-style images. However, current methods for anime images fail to refine expression control effectively, yielding weaker control than expected, and it remains challenging to preserve the original anime face's identity during transformation. To address these issues, we propose an expression transformation method for anime-style images. To enhance the control effect of discrete emoticon tags, a mapping network maps them to high-dimensional control information, which is injected into the network multiple times during transformation. Additionally, to better preserve face identity, an integrated attention mask mechanism lets the network's expression control focus on expression-related features while leaving unrelated features unaffected. Finally, we conduct extensive experiments, with both quantitative and qualitative evaluations, to verify the validity of the proposed method. The results demonstrate its superiority over existing methods based on multi-domain image-to-image translation.
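The abstract's "mapping network" can be sketched as an MLP that lifts a discrete tag to a high-dimensional control vector, repeatedly injected into the decoder. The dimensions, the embedding-plus-MLP design, and the channel-wise modulation shown as one possible injection are all illustrative assumptions, not the paper's architecture.

    import torch
    import torch.nn as nn

    class MappingNetwork(nn.Module):
        # Hedged sketch: discrete emoticon tag -> high-dimensional control code.
        def __init__(self, num_tags=8, dim=256):
            super().__init__()
            self.embed = nn.Embedding(num_tags, dim)
            self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                     nn.Linear(dim, dim))

        def forward(self, tag):
            return self.mlp(self.embed(tag))

    mapper = MappingNetwork()
    control = mapper(torch.tensor([3]))       # tag id 3 -> (1, 256) control vector
    feat = torch.randn(1, 256, 16, 16)        # decoder feature map
    # One possible injection: channel-wise modulation of decoder features.
    modulated = feat * control.unsqueeze(-1).unsqueeze(-1)
    print(modulated.shape)                    # torch.Size([1, 256, 16, 16])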
{"title":"Facial expression transformation for anime-style image based on decoder control and attention mask","authors":"Xinhao Rao , Weidong Min , Ziyang Deng , Mengxue Liu","doi":"10.1016/j.image.2025.117343","DOIUrl":"10.1016/j.image.2025.117343","url":null,"abstract":"<div><div>Human facial expression transformation has been extensively studied using Generative Adversarial Networks (GANs) recently. GANs have also shown successful attempts in transforming anime-style images. However, current methods for anime pictures fail to refine the expression control efficiently, leading to control effects weaker than expected. Moreover, it remains challenging to maintain the original anime face identity information while transforming. To address these issues, we propose an expression transformation method for anime-style images. In order to enhance the control effect of discrete emoticon tags, a mapping network is proposed to map them to high-dimensional control information, which is then injected into the network multiple times during transformation. Additionally, for better maintaining the anime face identity information while transforming, an integrated attention mask mechanism is introduced to enable the network's expression control to focus on the expression-related features, while avoiding affecting the unrelated features. Finally, we conduct a large number of experiments to verify the validity of the proposed method, and both quantitative and qualitative evaluations are carried out. The results demonstrate the superiority of our proposed method compared to existing methods based on multi-domain image-to-image translation.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117343"},"PeriodicalIF":3.4,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144070741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sparse modeling for image inpainting: A multi-scale morphological patch-based k-SVD and group-based PCA
Pub Date: 2025-10-01 | Epub Date: 2025-05-15 | DOI: 10.1016/j.image.2025.117341
Amit Soni Arya, Susanta Mukhopadhyay
Image inpainting, a crucial task in image restoration, aims to reconstruct highly degraded images with missing pixels while preserving structural and textural integrity. Traditional patch-based and group-based sparse representation methods often struggle with visual artifacts and over-smoothing, limiting their effectiveness. To address these challenges, we propose a novel multi-scale morphological patch-based and group-based sparse representation learning approach for image inpainting. Our method integrates morphological patch-based sparse representation (M-PSR) learning using k-singular value decomposition (k-SVD) with group-based sparse representation using principal component analysis (PCA) to construct adaptive dictionaries for improved reconstruction accuracy. Additionally, we employ the alternating direction method of multipliers (ADMM) to optimize the integration of patch- and group-based sparse representations, enhancing restoration quality. Extensive experiments on various degraded images demonstrate that our approach outperforms state-of-the-art methods in peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM). The proposed method effectively reconstructs images corrupted by missing pixels, scratches, and text overlays, achieving superior structural coherence and perceptual quality. This work contributes a robust and efficient solution for image inpainting, advancing sparse modeling and morphological image processing.
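For reference, the standard ADMM splitting for an l1 sparse-coding subproblem of the kind being optimized (generic textbook form; the paper's full objective couples the patch-based and group-based terms) is

    \min_{\boldsymbol{\alpha},\,\mathbf{z}} \;
    \tfrac{1}{2}\|\mathbf{y}-\mathbf{D}\boldsymbol{\alpha}\|_2^2
    + \lambda\|\mathbf{z}\|_1
    \quad \text{s.t.} \quad \boldsymbol{\alpha}=\mathbf{z},

with scaled-dual iterates

    \boldsymbol{\alpha}^{k+1} = (\mathbf{D}^{\top}\mathbf{D}+\rho\mathbf{I})^{-1}
        \big(\mathbf{D}^{\top}\mathbf{y}+\rho(\mathbf{z}^{k}-\mathbf{u}^{k})\big),
    \qquad
    \mathbf{z}^{k+1} = \mathcal{S}_{\lambda/\rho}(\boldsymbol{\alpha}^{k+1}+\mathbf{u}^{k}),
    \qquad
    \mathbf{u}^{k+1} = \mathbf{u}^{k} + \boldsymbol{\alpha}^{k+1} - \mathbf{z}^{k+1},

where D is the learned dictionary and S_tau(x) = sign(x) max(|x| - tau, 0) is element-wise soft-thresholding.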
{"title":"Sparse modeling for image inpainting: A multi-scale morphological patch-based k-SVD and group-based PCA","authors":"Amit Soni Arya, Susanta Mukhopadhyay","doi":"10.1016/j.image.2025.117341","DOIUrl":"10.1016/j.image.2025.117341","url":null,"abstract":"<div><div>Image inpainting, a crucial task in image restoration, aims to reconstruct highly degraded images with missing pixels while preserving structural and textural integrity. Traditional patch-based and group-based sparse representation methods often struggle with visual artifacts and over-smoothing, limiting their effectiveness. To address these challenges, we propose a novel multi-scale morphological patch-based and group-based sparse representation learning approach for image inpainting. Our method enhances image inpainting by integrating morphological patch-based sparse representation (M-PSR) learning using k-singular value decomposition (k-SVD) and group-based sparse representation using principal component analysis (PCA) to construct adaptive dictionaries for improved reconstruction accuracy. Additionally, we employ the alternating direction method of multipliers (ADMM) to optimize the integration of morphological patch and group based sparse representations, enhancing restoration quality. Extensive experiments on various degraded images demonstrate that our approach outperforms state-of-the-art methods in terms of the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM). The proposed method effectively reconstructs images corrupted by missing pixels, scratches, and text inlays, achieving superior structural coherence and perceptual quality. This work contributes a robust and efficient solution for image inpainting, offering significant advances in sparse modeling and morphological image processing.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117341"},"PeriodicalIF":3.4,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144070742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}