Journal of Visual Communication and Image Representation: latest articles, page 10

A lightweight target tracking algorithm based on online correction for meta-learning

IF 2.6 | Zone 4, Computer Science | Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Visual Communication and Image Representation

Pub Date: 2024-08-01 | DOI: 10.1016/j.jvcir.2024.104228

Yongsheng Qi, Guohua Yin, Yongting Li, Liqiang Liu, Zhengting Jiang

Traditional Siamese-network-based object tracking algorithms have high computational complexity, which makes them difficult to run on embedded devices. Moreover, their success rates drop significantly on long-term tracking tasks. To address these issues, we propose a lightweight long-term object tracking algorithm based on meta-learning, called Meta-Master-based Ghost Fast Tracking (MGTtracker). The algorithm integrates the Ghost mechanism into a lightweight backbone network, G-ResNet, which extracts target features accurately while running quickly. We design a tiny adaptive weighted fusion feature pyramid network (TiFPN) to strengthen feature fusion and mitigate interference from similar objects. We introduce a lightweight region regression network, the Ghost Decouple Net (GDNet), for target position prediction. Finally, we propose a meta-learning-based online template correction mechanism, Meta-Master, to overcome error accumulation in long-term tracking and the difficulty of reacquiring a target after it is lost. We evaluate the algorithm on the public datasets OTB100, VOT2020, VOT2018LT, and LaSOT, and deploy it on a Jetson Xavier NX for performance testing. Experimental results demonstrate the effectiveness and superiority of the algorithm. Compared with existing classic object tracking algorithms, our approach runs faster, reaching 25 FPS on the NX, and real-time correction improves its robustness. While its accuracy and EAO are similar to those of comparable algorithms, it is faster and effectively addresses large cumulative errors and frequent target loss during tracking. Code is released at https://github.com/ygh96521/MGTtracker.git.
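The Ghost mechanism mentioned above comes from GhostNet: a primary pointwise convolution produces only a few "intrinsic" feature maps, and cheap per-channel operations then derive extra "ghost" maps from them, so the expensive convolution runs on fewer output channels. The following is a minimal pure-Python sketch of one Ghost module, assuming channels-first nested lists, a ghost ratio of 2 (one ghost map per intrinsic map), and hand-supplied weights; the names `pointwise_conv`, `depthwise_conv3x3`, and `ghost_module` are hypothetical, and this is not the authors' G-ResNet.

```python
from typing import List

Map = List[List[float]]      # one H x W feature map
Tensor = List[Map]           # C feature maps, channels-first


def pointwise_conv(x: Tensor, weights: List[List[float]]) -> Tensor:
    """1x1 convolution: each output map is a weighted sum of all input maps."""
    h, w = len(x[0]), len(x[0][0])
    out = []
    for w_row in weights:                       # one weight row per output channel
        out.append([[sum(w_row[c] * x[c][i][j] for c in range(len(x)))
                     for j in range(w)] for i in range(h)])
    return out


def depthwise_conv3x3(m: Map, k: List[List[float]]) -> Map:
    """Cheap 3x3 filter on a single map (cross-correlation, zero padding)."""
    h, w = len(m), len(m[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += k[di + 1][dj + 1] * m[ii][jj]
            out[i][j] = acc
    return out


def ghost_module(x: Tensor, primary_w: List[List[float]],
                 cheap_kernels: List[List[List[float]]]) -> Tensor:
    """Intrinsic maps from a 1x1 conv plus one cheap ghost map per intrinsic map."""
    intrinsic = pointwise_conv(x, primary_w)
    ghosts = [depthwise_conv3x3(m, k) for m, k in zip(intrinsic, cheap_kernels)]
    return intrinsic + ghosts                   # concatenate along the channel axis
```

With two primary weight rows, the module returns four channels while the costly pointwise step computes only two of them; in a trained network the cheap kernels would be learned rather than fixed.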

Volume 103, Article 104228

Citations: 0

Joint multi-scale transformers and pose equivalence constraints for 3D human pose estimation

IF 2.6 | Zone 4, Computer Science | Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Visual Communication and Image Representation

Pub Date: 2024-08-01 | DOI: 10.1016/j.jvcir.2024.104247

Yongpeng Wu, Dehui Kong, Junna Gao, Jinghua Li, Baocai Yin

Unlike image-based 3D pose estimation, video-based 3D pose estimation can exploit temporal information to improve performance. However, these methods still generalize poorly across variations in human motion speed, body shape, and camera distance. To address these problems, we propose a novel approach for 3D human pose estimation from videos, called joint Spatial–temporal Multi-scale Transformers and Pose Transformation Equivalence Constraints (SMT-PTEC). We design a more general spatial–temporal multi-scale feature extraction strategy and introduce optimization constraints that adapt to data diversity, improving pose estimation accuracy. Specifically, we first introduce a spatial multi-scale transformer that extracts multi-scale pose features and establishes a cross-scale information transfer mechanism, which effectively captures the underlying structure of human motion. Then, we present a temporal multi-scale transformer that models multi-scale dependencies between frames, makes the network more adaptable to human motion speed, and improves estimation accuracy through context-aware fusion of multi-scale predictions. Moreover, we add pose transformation equivalence constraints by transforming the training samples with horizontal flipping, scaling, and body shape changes, which effectively reduces the influence of camera distance and body shape on prediction accuracy. Extensive experiments demonstrate that our approach outperforms previous state-of-the-art methods at lower computational complexity. Code is available at https://github.com/JNGao123/SMT-PTEC.
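One way to read a pose transformation equivalence constraint is: transforming the input and then predicting should give the same 3D pose as predicting and then transforming the output. The sketch below shows this for horizontal flipping only, as a penalty on the mean per-joint distance between `model(flip(x))` and `flip(model(x))`. It is an illustrative assumption, not the SMT-PTEC loss: the five-joint skeleton in `FLIP_PAIRS`, the root-centred coordinates, and all function names are hypothetical, and scaling or body-shape transforms would follow the same pattern.

```python
import math
from typing import Callable, List, Sequence, Tuple

Pose2D = List[Tuple[float, float]]
Pose3D = List[Tuple[float, float, float]]

# Hypothetical toy skeleton: 0 pelvis, 1 l_hip, 2 r_hip, 3 l_wrist, 4 r_wrist
FLIP_PAIRS: Sequence[Tuple[int, int]] = [(1, 2), (3, 4)]


def _swap_lr(pose: list) -> list:
    """Swap left/right joints so a mirrored pose keeps anatomical labels."""
    out = list(pose)
    for a, b in FLIP_PAIRS:
        out[a], out[b] = pose[b], pose[a]
    return out


def flip_2d(pose: Pose2D) -> Pose2D:
    """Mirror a root-centred 2D pose about x = 0."""
    return _swap_lr([(-x, y) for x, y in pose])


def flip_3d(pose: Pose3D) -> Pose3D:
    """Mirror a root-centred 3D pose about the x = 0 plane."""
    return _swap_lr([(-x, y, z) for x, y, z in pose])


def flip_equivalence_loss(model: Callable[[Pose2D], Pose3D], pose2d: Pose2D) -> float:
    """Mean per-joint distance between model(flip(x)) and flip(model(x))."""
    a = model(flip_2d(pose2d))
    b = flip_3d(model(pose2d))
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)
```

A lifting model that is already flip-equivariant yields a loss of zero, whereas one with a left/right bias (for example, a constant x offset) is penalized.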

Volume 103, Article 104247

Citations: 0

Detecting and tracking moving objects in defocus blur scenes

IF 2.6 | Zone 4, Computer Science | Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Visual Communication and Image Representation

Pub Date: 2024-08-01 | DOI: 10.1016/j.jvcir.2024.104259

Fen Hu, Peng Yang, Jie Dou, Lei Dou

Object tracking stands as a cornerstone challenge within computer vision, with blurriness analysis representing a burgeoning field of interest. Among the various forms of blur encountered in natural scenes, defocus blur remains significantly underexplored. To bridge this gap, this article introduces the Defocus Blur Video Object Tracking (DBVOT) dataset, specifically crafted to facilitate research in visual object tracking under defocus blur conditions. We conduct a comprehensive performance analysis of 18 state-of-the-art object tracking methods on this unique dataset. Additionally, we propose a selective deblurring framework based on Deblurring Auxiliary Learning Net (DID-Anet), innovatively designed to tackle the complexities of defocus blur. This framework integrates a novel defocus blurriness metric for the smart deblurring of video frames, thereby enhancing the efficacy of tracking methods in defocus blur scenarios. Our extensive experimental evaluations underscore the significant advancements in tracking accuracy achieved by incorporating our proposed framework with leading tracking technologies.
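The abstract does not specify the defocus blurriness metric, so the sketch below swaps in a well-known stand-in: the variance of the Laplacian, which is low when a frame has few sharp edges. It shows the selective idea, where only frames scoring below a threshold are passed to an expensive deblurring model. The threshold value, the `deblur` callable, and the function names are hypothetical; this is not DID-Anet or the authors' metric.

```python
from typing import Callable, List

Frame = List[List[float]]  # grayscale frame, rows of pixel intensities


def laplacian_variance(img: Frame) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Low values mean few sharp edges, which is typical of defocus blur.
    """
    h, w = len(img), len(img[0])
    resp = [img[i - 1][j] + img[i + 1][j] + img[i][j - 1] + img[i][j + 1] - 4 * img[i][j]
            for i in range(1, h - 1) for j in range(1, w - 1)]
    mean = sum(resp) / len(resp)
    return sum((r - mean) ** 2 for r in resp) / len(resp)


def selective_deblur(frames: List[Frame], deblur: Callable[[Frame], Frame],
                     threshold: float) -> List[Frame]:
    """Run the (expensive) deblurring model only on frames judged blurry."""
    return [deblur(f) if laplacian_variance(f) < threshold else f for f in frames]
```

Skipping sharp frames keeps the deblurring cost proportional to the number of blurry frames, which matters when the tracker must keep up with the video rate.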

Volume 103, Article 104259

Citations: 0

Reversible data hiding for color images based on prediction-error value ordering and adaptive embedding

IF 2.6 | Zone 4, Computer Science | Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Visual Communication and Image Representation

Pub Date: 2024-07-22 | DOI: 10.1016/j.jvcir.2024.104239

Hui Wang, Detong Wang, Zhihui Chu, Zheheng Rao, Ye Yao

Prediction-error value ordering (PEVO) is an efficient reversible data hiding (RDH) technique that is well suited to color images because it exploits inter-channel and intra-channel correlations simultaneously. However, the existing PEVO method has a shortcoming in its mapping selection stage: candidate mappings are selected in advance under conditions that differ from those of the actual embedding, so the result is not optimal. In this paper, we therefore propose a novel RDH method for color images based on PEVO and adaptive embedding, which enables adaptive two-dimensional (2D) modification for PEVO. First, an improved particle swarm optimization (IPSO) algorithm based on PEVO is designed to reduce the high time complexity of parameter determination and to implement adaptive 2D modification for PEVO. Next, to further optimize the mapping used in embedding, an improved adaptive 2D mapping generation strategy is proposed that incorporates the position information of points. In addition, a dynamic payload partition strategy is proposed to improve embedding performance. Experimental results show that, at an embedding capacity of 20,000 bits, the PSNR for the Lena image reaches 62.94 dB and the average PSNR of the proposed method is 1.46 dB higher than that of state-of-the-art methods.
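For readers unfamiliar with the optimizer that IPSO builds on, here is a minimal sketch of standard global-best particle swarm optimization. It is the textbook algorithm, not the paper's IPSO, and the objective is a placeholder: in an RDH setting it would presumably score the embedding distortion of a candidate parameter set, which is an assumption. The inertia `w` and acceleration coefficients `c1`, `c2` are common defaults, not values from the paper.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(f: Callable[[Sequence[float]], float],
                 bounds: Sequence[Tuple[float, float]],
                 n_particles: int = 30, iters: int = 200,
                 w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Standard global-best particle swarm optimisation (not the paper's IPSO)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                     # each particle's best position
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]    # swarm-wide best

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in the box
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

Integer-valued embedding parameters would need rounding inside `f`; the point of a swarm search here is to avoid exhaustively evaluating every parameter combination.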

Volume 103, Article 104239

Citations: 0

EMCFN: Edge-based Multi-scale Cross Fusion Network for video frame interpolation

IF 2.6 | Zone 4, Computer Science | Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Visual Communication and Image Representation

Pub Date: 2024-07-09 | DOI: 10.1016/j.jvcir.2024.104226

Shaowen Wang, Xiaohui Yang, Zhiquan Feng, Jiande Sun, Ju Liu

Video frame interpolation (VFI) synthesizes one or more intermediate frames between two frames of a video sequence to increase its temporal resolution. However, many methods still struggle with complex scenes involving high-speed motion, occlusion, and other factors. To address these challenges, we propose an Edge-based Multi-scale Cross Fusion Network (EMCFN) for VFI. We integrate an edge-based feature enhancement module (FEM) into the U-Net architecture, which yields richer and more complete feature maps and better preserves image structure and details, helping to generate more accurate and realistic interpolated frames. In addition, a multi-scale cross fusion frame synthesis model (MCFM) composed of three GridNet branches generates high-quality interpolated frames. A series of experiments shows that our model performs satisfactorily on different datasets compared with state-of-the-art methods.
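The abstract does not say how the edge information for the FEM is computed. As an illustration only, the sketch below derives an edge map with 3x3 Sobel filters (gradient magnitude), one common choice; the function name and border handling are assumptions, and the paper's edge extractor may differ.

```python
import math
from typing import List

Image = List[List[float]]

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient


def sobel_edges(img: Image) -> Image:
    """Gradient magnitude via 3x3 Sobel filters; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            gx = sum(SOBEL_X[a][b] * img[i + a - 1][j + b - 1] for a in range(3) for b in range(3))
            gy = sum(SOBEL_Y[a][b] * img[i + a - 1][j + b - 1] for a in range(3) for b in range(3))
            out[i][j] = math.hypot(gx, gy)
    return out
```

Such a map could be concatenated with the input frames or intermediate features so that the network attends to object boundaries, where interpolation artifacts tend to be most visible.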

Volume 103, Article 104226

Citations: 0

FISTA acceleration inspired network design for underwater image enhancement

IF 2.6 | Zone 4, Computer Science | Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Visual Communication and Image Representation

Pub Date: 2024-07-08 | DOI: 10.1016/j.jvcir.2024.104224

Bing-Yuan Chen, Jian-Nan Su, Guang-Yong Chen, Min Gan

Underwater image enhancement, especially color restoration and detail reconstruction, remains a significant challenge. Current models focus on improving accuracy and learning efficiency through neural network design and often neglect the benefits of traditional optimization algorithms. We propose FAIN-UIE, a novel approach for color and fine-texture recovery in underwater images. It draws on the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) to approximate image degradation, which speeds up network fitting. FAIN-UIE integrates a residual degradation module (RDM) and a momentum calculation module (MC) to simulate gradient descent and momentum, and addresses feature fusion losses with a Feature Merge Block (FMB). By integrating multi-scale information and inter-stage pathways, our method effectively maps multi-stage image features and improves color and fine-texture restoration. Experimental results validate its robust performance, positioning FAIN-UIE as a competitive solution for practical underwater imaging applications.
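For reference, the classical FISTA iteration that inspires the network alternates a gradient step on the smooth term, a shrinkage-thresholding (proximal) step, and a momentum extrapolation. The sketch below solves the textbook problem min_x 0.5‖Ax − b‖² + λ‖x‖₁ in pure Python; it is the generic algorithm, not FAIN-UIE. By the abstract's description, the RDM appears to play the role of the gradient step and the MC module that of the momentum step, but that mapping is an interpretation. Using ‖A‖_F² as the Lipschitz bound is a conservative assumption chosen for simplicity.

```python
import math
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def soft_threshold(v: Vector, t: float) -> Vector:
    """Proximal operator of t * ||x||_1 (the shrinkage-thresholding step)."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]


def matvec(A: Matrix, x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A: Matrix, y: Vector) -> Vector:
    """Computes A^T y."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def fista(A: Matrix, b: Vector, lam: float, iters: int = 200,
          L: Optional[float] = None) -> Vector:
    """FISTA for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    if L is None:
        L = sum(a * a for row in A for a in row)   # ||A||_F^2 >= ||A||_2^2: safe bound
    x = [0.0] * len(A[0])
    y, t = x[:], 1.0
    for _ in range(iters):
        residual = [ai - bi for ai, bi in zip(matvec(A, y), b)]
        grad = rmatvec(A, residual)                 # gradient of the smooth term at y
        x_new = soft_threshold([yi - gi / L for yi, gi in zip(y, grad)], lam / L)
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # Momentum: extrapolate along the last step
        y = [xn + ((t - 1.0) / t_new) * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x
```

With A equal to the identity, the minimizer is simply `soft_threshold(b, lam)`, which gives a quick sanity check.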

Citations: 0

Copy-move forgery detection using Regional Density Center clustering

IF 2.6 | Zone 4, Computer Science | Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Visual Communication and Image Representation

Pub Date: 2024-07-05 | DOI: 10.1016/j.jvcir.2024.104221

Cong Lin, Yufeng Wu, Ke Huang, Hai Yang, Yuqiao Deng, Yamin Wen

Copy-move forgery detection is a common image tampering detection technique. In this paper, we propose a novel copy-move forgery detection scheme based on Regional Density Center (RDC) clustering and a Refined Length Homogeneity Filtering (RLHF) policy. First, to obtain enough keypoints in smooth or small image regions, the scheme normalizes the scale of the input image and adjusts the contrast threshold. Next, to speed up feature matching, keypoints are matched with an algorithm based on gray-value grouping, and the RLHF policy is applied to filter out mismatched pairs. To ensure a good estimate of the affine transformation, the proposed RDC clustering algorithm groups the matched pairs. Finally, correlation coefficients are computed to precisely locate the tampered regions. The proposed scheme can effectively identify duplicated regions in digital images, and experiments on public datasets demonstrate its effectiveness and robustness compared with many state-of-the-art schemes.
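The speed-up from gray-value grouping comes from comparing each keypoint only with keypoints of similar intensity instead of with every keypoint in the image. The sketch below illustrates that idea under stated assumptions: fixed-width gray-value bins, a Lowe-style nearest/second-nearest ratio test, and a minimum spatial distance to reject trivial self-matches. The bin width, thresholds, and the `Keypoint` and `match_by_gray_groups` names are hypothetical; the paper's exact grouping rule is not given in the abstract.

```python
import math
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Keypoint(NamedTuple):
    x: float
    y: float
    gray: int                 # grey value at the keypoint location (0-255)
    desc: Tuple[float, ...]   # feature descriptor, e.g. a SIFT vector


def match_by_gray_groups(kps: List[Keypoint], bin_width: int = 16,
                         ratio: float = 0.6,
                         min_dist: float = 10.0) -> List[Tuple[int, int]]:
    """Self-match keypoints of one image, searching only inside grey-value bins."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for idx, kp in enumerate(kps):
        groups[kp.gray // bin_width].append(idx)

    matches: Set[Tuple[int, int]] = set()
    for members in groups.values():
        for i in members:
            # Descriptor distances to every other keypoint in the same bin, nearest first
            cands = sorted((math.dist(kps[i].desc, kps[j].desc), j)
                           for j in members if j != i)
            if len(cands) < 2:
                continue                        # the ratio test needs two neighbours
            (d1, j), (d2, _) = cands[0], cands[1]
            # Copied regions are spatially separated, so reject near-duplicates
            far_apart = math.dist((kps[i].x, kps[i].y), (kps[j].x, kps[j].y)) >= min_dist
            if d1 < ratio * d2 and far_apart:   # Lowe-style 2NN ratio test
                matches.add((min(i, j), max(i, j)))
    return sorted(matches)
```

A known trade-off of hard bins is that a genuine pair whose gray values straddle a bin boundary is never compared; overlapping bins would be one way to mitigate this.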

Volume 103, Article 104221

Citations: 0

Design and optimization of an aerobics movement recognition system based on high-dimensional biotechnological data using neural networks

IF 2.6 | Zone 4, Computer Science | Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Visual Communication and Image Representation

Pub Date: 2024-07-05 | DOI: 10.1016/j.jvcir.2024.104227

Ma Yihan

This study presents the design and optimization of an aerobics movement recognition system that uses high-dimensional biotechnological data. Deep learning techniques are employed to classify and recognize movements accurately. Biosensing technology and wearable devices collect real-time, multidimensional physiological signal data from key anatomical regions of athletes. The system is built with convolutional neural networks (CNNs) and Long Short-Term Memory (LSTM) networks. Model performance is optimized through parameter selection and strategies such as Xavier initialization, the cross-entropy loss function, and the Adam optimizer. The results indicate that Model C achieves an accuracy of 0.987, significantly outperforming standalone CNNs (accuracy 0.975) and recurrent neural network models (accuracy 0.965). It is also efficient in practical applications, with execution times reduced to 10 s for data processing, 25 s for feature extraction, and 20 s for classification. The system thus combines strong recognition performance with efficiency, supporting precise movement classification.
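Two of the training ingredients named above have short, standard definitions, sketched here in pure Python: Xavier (Glorot) uniform initialization draws weights from U(−a, a) with a = √(6 / (fan_in + fan_out)), and softmax cross-entropy is computed with the log-sum-exp trick for numerical stability. This illustrates the general techniques only; the CNN-LSTM architecture of "Model C" is not described in enough detail to reproduce, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def xavier_uniform(fan_in: int, fan_out: int, rng: random.Random) -> List[List[float]]:
    """Glorot/Xavier uniform init: U(-a, a) with a = sqrt(6 / (fan_in + fan_out))."""
    a = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-a, a) for _ in range(fan_in)] for _ in range(fan_out)]


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Cross-entropy of softmax(logits) for an integer class label.

    Subtracting the max logit (log-sum-exp trick) avoids overflow in exp().
    """
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]
```

In PyTorch, the same pieces are available as `torch.nn.init.xavier_uniform_`, `torch.nn.CrossEntropyLoss`, and `torch.optim.Adam`.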

Volume 103, Article 104227. Open access PDF: https://www.sciencedirect.com/science/article/pii/S1047320324001834/pdfft?md5=54202c0dbea9f24ad0fe8563f43abedb&pid=1-s2.0-S1047320324001834-main.pdf

Citations: 0

IML-SSOD: Interconnected and multi-layer threshold learning for semi-supervised detection

IF 2.6 CAS Zone 4 (Computer Science) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Visual Communication and Image Representation

Pub Date : 2024-07-04 DOI: 10.1016/j.jvcir.2024.104220

Bin Ge, Yuyang Li, Huanhuan Liu, Chenxing Xia, Shuaishuai Geng

While semi-supervised anchor-based detectors of the R-CNN series have achieved remarkable success, semi-supervised anchor-free detectors lack the ability to generate high-quality, flexible pseudo labels, which leads to serious inconsistencies in semi-supervised object detection (SSOD). To let the network learn from more reliable and consistent label data and thereby address information bias, we propose interconnected and multi-layer threshold learning for semi-supervised object detection (IML-SSOD). The Joint Guided Estimation (JGE) module uses a Core Zone refinement module to improve the position accuracy score where semantic information is weak, and combines the classification and centerness scores as evaluation criteria to predict stable labels. A multi-layer threshold filtering method selects more potential pseudo-label samples for the student network, ensuring that sufficient information is available during training. Extensive experiments on the MS COCO and PASCAL VOC datasets demonstrate the effectiveness of IML-SSOD. Compared with existing methods, our method achieves 81.9% AP$_{50}$ and 57.89% AP$_{50:95}$ on VOC, which is highly competitive.
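To make the idea of filtering pseudo labels with several thresholds concrete, here is a minimal Python sketch of tiered filtering on a joint score. It is not the IML-SSOD implementation: combining the classification and centerness scores by geometric mean, the tier thresholds (0.7 and 0.4), and their loss weights are all assumptions chosen for illustration, and the JGE and Core Zone refinement modules are not modeled.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Detection:
    box: Box
    cls_score: float   # teacher classification confidence in [0, 1]
    centerness: float  # teacher centerness estimate in [0, 1]


def joint_score(det: Detection) -> float:
    # Assumed combination rule: geometric mean of the two scores.
    return (det.cls_score * det.centerness) ** 0.5


# Hypothetical tiers: (minimum joint score, loss weight), highest threshold first.
DEFAULT_TIERS: Sequence[Tuple[float, float]] = ((0.7, 1.0), (0.4, 0.5))


def filter_pseudo_labels(
    dets: Sequence[Detection],
    tiers: Sequence[Tuple[float, float]] = DEFAULT_TIERS,
) -> List[Tuple[Box, float]]:
    """Keep each box at the first tier whose threshold it meets."""
    labels: List[Tuple[Box, float]] = []
    for det in dets:
        score = joint_score(det)
        for threshold, weight in tiers:
            if score >= threshold:
                labels.append((det.box, weight))
                break  # a box is assigned to at most one tier
    return labels


if __name__ == "__main__":
    teacher_output = [
        Detection((0, 0, 10, 10), 0.9, 0.9),  # joint ~0.9  -> weight 1.0
        Detection((5, 5, 20, 20), 0.5, 0.5),  # joint 0.5   -> weight 0.5
        Detection((1, 1, 2, 2), 0.2, 0.3),    # joint ~0.24 -> dropped
    ]
    print(filter_pseudo_labels(teacher_output))
```

Boxes below the lowest tier are discarded, while the middle tier keeps extra candidate labels at a reduced weight; this is one plausible reading of "selecting more potential label samples for the student network", not a description of the paper's exact rule.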

Citations: 0

CC-SMC: Chain coding-based segmentation map lossless compression

IF 2.6 CAS Zone 4 (Computer Science) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Visual Communication and Image Representation

Pub Date : 2024-07-03 DOI: 10.1016/j.jvcir.2024.104222

Runyu Yang, Dong Liu, Feng Wu, Wen Gao

A segmentation map, either static or dynamic, is a two-dimensional picture, possibly varying over time, that indicates the segmentation label of each pixel. Both the semantic map and the occupancy map in video-based point cloud compression (V-PCC) are segmentation maps in this sense. The semantic map can serve many machine vision tasks, such as tracking, and has been used as a layer of image representation in some image compression methods. The occupancy map forms part of the point cloud coding bitstream. Since segmentation maps are widely used, compressing them efficiently is of practical interest. We propose a lossless segmentation map compression scheme, named CC-SMC, that exploits the fact that segmentation maps usually contain a limited number of colors and sharp edges. Specifically, we design a chain coding-based scheme combined with quadtree-based block partitioning. For intraframe coding, a block is partitioned recursively with a quadtree structure until it contains only one color, is smaller than a threshold, or satisfies the defined chain coding condition. We revise the three-orthogonal chain coding method to incorporate contextual information and design effective intraframe prediction methods. For interframe coding, a block may find a reference block, and the chain difference between the current and reference blocks is coded. We implement the proposed scheme and test it on several different kinds of segmentation maps. Compared with advanced lossless image compression techniques, our scheme achieves more than 10% bit reduction and more than 20% decoding time savings. The code is available at https://github.com/Yang-Runyu/CC-SMC.
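The intraframe stopping rules described above map naturally onto a recursive quadtree. Below is a minimal Python sketch of that partitioning step only, not the CC-SMC codec: it assumes a square label map whose side is a power of two, treats `min_size` as the size threshold, and stands in for the paper's chain coding condition (not specified in the abstract) with a hypothetical `is_chain_codable` callback. Leaves are only classified; no chain codes or bits are produced.

```python
from typing import Callable, List, Optional, Tuple

LabelMap = List[List[int]]
Leaf = Tuple[int, int, int, str]  # (row, col, size, kind)


def quadtree_partition(
    label_map: LabelMap,
    min_size: int = 2,
    is_chain_codable: Optional[Callable[[LabelMap], bool]] = None,
) -> List[Leaf]:
    """Recursively split a square label map into leaf blocks.

    A block stops splitting when it holds one label ("uniform"), when it
    has reached min_size ("raw"), or when the hypothetical
    is_chain_codable predicate accepts it ("chain").
    """
    n = len(label_map)
    if n == 0 or n & (n - 1) or any(len(row) != n for row in label_map):
        raise ValueError("expected a non-empty square map with power-of-two side")

    leaves: List[Leaf] = []

    def split(r: int, c: int, size: int) -> None:
        block = [row[c:c + size] for row in label_map[r:r + size]]
        labels = {v for row in block for v in row}
        if len(labels) == 1:
            leaves.append((r, c, size, "uniform"))
        elif size <= min_size:
            leaves.append((r, c, size, "raw"))
        elif is_chain_codable is not None and is_chain_codable(block):
            leaves.append((r, c, size, "chain"))
        else:
            half = size // 2
            # Visit quadrants in raster order: top-left, top-right, bottom-left, bottom-right.
            for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)):
                split(r + dr, c + dc, half)

    split(0, 0, n)
    return leaves


if __name__ == "__main__":
    seg = [
        [1, 1, 2, 2],
        [1, 1, 2, 2],
        [1, 1, 1, 3],
        [1, 1, 3, 3],
    ]
    for leaf in quadtree_partition(seg):
        print(leaf)
```

With the default `min_size=2`, the 4x4 example splits once: three 2x2 quadrants are single-label, and the bottom-right quadrant (labels 1 and 3) stops as a raw block. Passing a predicate that accepts a block would instead end the recursion there with a "chain" leaf.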

Citations: 0