
Journal of Visual Communication and Image Representation: Latest Publications

Versatile depth estimator based on common relative depth estimation and camera-specific relative-to-metric depth conversion
IF 2.6, CAS Tier 4 (Computer Science), Q2 (Computer Science, Information Systems). Pub Date: 2024-08-01. DOI: 10.1016/j.jvcir.2024.104252

A typical monocular depth estimator is trained for a single camera, so its performance drops severely on images taken with different cameras. To address this issue, we propose a versatile depth estimator (VDE), composed of a common relative depth estimator (CRDE) and multiple relative-to-metric converters (R2MCs). The CRDE extracts relative depth information, and each R2MC converts the relative information to predict metric depths for a specific camera. The proposed VDE can cope with diverse scenes, including both indoor and outdoor scenes, with only a 1.12% parameter increase per camera. Experimental results demonstrate that VDE supports multiple cameras effectively and efficiently and also achieves state-of-the-art performance in the conventional single-camera scenario.
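
To make the shared-backbone split concrete, below is a minimal, hypothetical PyTorch sketch of a common relative-depth network paired with tiny per-camera relative-to-metric heads. All module names, layer sizes, and the softplus output mapping are illustrative assumptions rather than the authors' CRDE/R2MC implementation; the sketch only shows why the per-camera cost can stay small.

```python
# Hedged sketch: shared relative-depth backbone + per-camera metric heads.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyRelativeDepthNet(nn.Module):
    """Stand-in for the shared CRDE: predicts a camera-agnostic relative depth map."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 1, 3, padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))            # unitless relative depth

class RelativeToMetricHead(nn.Module):
    """Stand-in for one R2MC: a tiny per-camera head mapping relative to metric depth."""
    def __init__(self):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(1, 8, 1), nn.ReLU(), nn.Conv2d(8, 1, 1))

    def forward(self, rel):
        return F.softplus(self.head(rel))                # strictly positive metric depth

class VersatileDepthSketch(nn.Module):
    def __init__(self, camera_ids):
        super().__init__()
        self.shared = TinyRelativeDepthNet()             # trained once for all cameras
        self.heads = nn.ModuleDict({cid: RelativeToMetricHead() for cid in camera_ids})

    def forward(self, image, camera_id):
        return self.heads[camera_id](self.shared(image))

model = VersatileDepthSketch(["indoor_cam", "outdoor_cam"])   # placeholder camera ids
depth = model(torch.randn(1, 3, 64, 64), "indoor_cam")
print(depth.shape)                                       # torch.Size([1, 1, 64, 64])
```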

Citations: 0
EM-Gait: Gait recognition using motion excitation and feature embedding self-attention
IF 2.6, CAS Tier 4 (Computer Science), Q2 (Computer Science, Information Systems). Pub Date: 2024-08-01. DOI: 10.1016/j.jvcir.2024.104266

Gait recognition, which enables long-distance and contactless identification, is an important biometric technology. Recent gait recognition methods focus on learning the pattern of human movement or appearance during walking and construct the corresponding spatio-temporal representations. However, each individual follows their own movement pattern, and simple spatial–temporal features struggle to describe the motion changes of body parts, especially when confounding variables such as clothing and carried objects are present, so the distinguishability of features is reduced. To this end, we propose the Embedding and Motion (EM) block and the Fine Feature Extractor (FFE) to capture the motion mode of walking and enhance the differences between local motion patterns. The EM block consists of a Motion Excitation (ME) module that captures temporal motion changes and an Embedding Self-attention (ES) module that enhances the expression of motion patterns. Specifically, without introducing additional parameters, the ME module learns the difference information between frames and intervals to obtain a dynamic representation of walking for frame sequences of uncertain length. By contrast, the ES module divides the feature map hierarchically based on element values, blurring the differences between elements to highlight the motion track. Furthermore, we present the FFE, which independently learns spatio-temporal representations of the human body from different horizontal parts of an individual. Benefiting from the EM block and the proposed motion branch, our method combines motion-change information and significantly improves performance under cross-appearance conditions. On the popular CASIA-B dataset, the proposed EM-Gait outperforms existing single-modal gait recognition methods.
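
As a rough, parameter-free illustration of the motion-excitation idea, the sketch below reweights a gait feature sequence by the sigmoid of its frame-to-frame differences. The tensor layout, the spatial average pooling, and the use of plain frame differences (rather than the paper's frame-and-interval differences) are simplifying assumptions, not the ME module itself.

```python
# Hedged sketch: reweight a feature sequence by its temporal differences.
import torch

def motion_excitation(features):
    """features: (batch, time, channels, H, W) -> same shape, motion-reweighted."""
    diff = features[:, 1:] - features[:, :-1]                       # frame-to-frame differences
    diff = torch.cat([diff, torch.zeros_like(diff[:, :1])], dim=1)  # pad the last frame
    attn = torch.sigmoid(diff.mean(dim=(-2, -1), keepdim=True))     # per-frame channel weights
    return features * attn                                          # no learnable parameters

x = torch.randn(2, 8, 16, 32, 32)    # a short silhouette feature sequence
print(motion_excitation(x).shape)    # torch.Size([2, 8, 16, 32, 32])
```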

Citations: 0
Fusing structure from motion and simulation-augmented pose regression from optical flow for challenging indoor environments
IF 2.6, CAS Tier 4 (Computer Science), Q2 (Computer Science, Information Systems). Pub Date: 2024-08-01. DOI: 10.1016/j.jvcir.2024.104256

The localization of objects is essential in many applications, such as robotics, virtual and augmented reality, and warehouse logistics. Recent advancements in deep learning have enabled localization using monocular cameras. Traditionally, structure from motion (SfM) techniques predict an object’s absolute position from a point cloud, while absolute pose regression (APR) methods use neural networks to understand the environment semantically. However, both approaches face challenges from environmental factors like motion blur, lighting changes, repetitive patterns, and featureless areas. This study addresses these challenges by incorporating additional information and refining absolute pose estimates with relative pose regression (RPR) methods. RPR also struggles with issues like motion blur. To overcome this, we compute the optical flow between consecutive images using the Lucas–Kanade algorithm and use a small recurrent convolutional network to predict relative poses. Combining absolute and relative poses is difficult due to differences between global and local coordinate systems. Current methods use pose graph optimization (PGO) to align these poses. In this work, we propose recurrent fusion networks to better integrate absolute and relative pose predictions, enhancing the accuracy of absolute pose estimates. We evaluate eight different recurrent units and create a simulation environment to pre-train the APR and RPR networks for improved generalization. Additionally, we record a large dataset of various scenarios in a challenging indoor environment resembling a warehouse with transportation robots. Through hyperparameter searches and experiments, we demonstrate that our recurrent fusion method outperforms PGO in effectiveness.
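
The Lucas-Kanade step can be reproduced with standard OpenCV calls; the sketch below computes the sparse flow between two consecutive frames that a relative-pose regressor could consume. File names and tracking parameters are placeholders, and the recurrent RPR and fusion networks are not shown.

```python
# Hedged sketch: sparse Lucas-Kanade optical flow between two frames with OpenCV.
import cv2
import numpy as np

prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)   # placeholder file names
curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Corners worth tracking in the previous frame.
pts_prev = cv2.goodFeaturesToTrack(prev, maxCorners=500, qualityLevel=0.01, minDistance=7)

# Pyramidal Lucas-Kanade tracks those corners into the current frame.
pts_curr, status, _err = cv2.calcOpticalFlowPyrLK(
    prev, curr, pts_prev, None, winSize=(21, 21), maxLevel=3)

ok = status.flatten() == 1
flow = (pts_curr[ok] - pts_prev[ok]).reshape(-1, 2)         # per-point displacements
print("tracked:", len(flow), "mean displacement:", np.round(flow.mean(axis=0), 2))
```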

Citations: 0
Joint multi-scale transformers and pose equivalence constraints for 3D human pose estimation
IF 2.6, CAS Tier 4 (Computer Science), Q2 (Computer Science, Information Systems). Pub Date: 2024-08-01. DOI: 10.1016/j.jvcir.2024.104247

Different from image-based 3D pose estimation, video-based 3D pose estimation gains a performance improvement from temporal information. However, these methods still face the challenge of insufficient generalization ability with respect to human motion speed, body shape, and camera distance. To address the above problems, we propose a novel approach, referred to as joint Spatial–temporal Multi-scale Transformers and Pose Transformation Equivalence Constraints (SMT-PTEC), for 3D human pose estimation from videos. We design a more general spatial–temporal multi-scale feature extraction strategy and introduce optimization constraints that adapt to the diversity of data to improve the accuracy of pose estimation. Specifically, we first introduce a spatial multi-scale transformer to extract multi-scale features of the pose and establish a cross-scale information transfer mechanism, which effectively explores the underlying knowledge of human motion. Then, we present a temporal multi-scale transformer to explore multi-scale dependencies between frames, enhance the adaptability of the network to human motion speed, and improve the estimation accuracy through a context-aware fusion of multi-scale predictions. Moreover, we add pose transformation equivalence constraints by changing the training samples with horizontal flipping, scaling, and body shape transformation to effectively overcome the influence of camera distance and body shape on prediction accuracy. Extensive experimental results demonstrate that our approach achieves superior performance with less computational complexity than previous state-of-the-art methods. Code is available at https://github.com/JNGao123/SMT-PTEC.
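
As one concrete instance of such a constraint, the sketch below penalizes disagreement between the pose predicted from a clip and the mirrored-back pose predicted from its horizontally flipped counterpart. The joint left/right pairing is hypothetical, and the paper's full constraint set also covers scaling and body-shape transforms.

```python
# Hedged sketch: a horizontal-flip equivalence loss on 3D pose predictions.
import torch

LEFT_RIGHT_PAIRS = [(1, 4), (2, 5), (3, 6)]      # hypothetical joint index pairs

def mirror_pose(pose):                           # pose: (batch, joints, 3)
    out = pose.clone()
    out[..., 0] = -out[..., 0]                   # mirror the x axis
    for left, right in LEFT_RIGHT_PAIRS:         # swap left/right joints
        out[:, [left, right]] = out[:, [right, left]]
    return out

def flip_equivalence_loss(pred, pred_from_flipped_input):
    # The prediction made from the flipped clip, mirrored back, should match.
    return torch.mean((pred - mirror_pose(pred_from_flipped_input)) ** 2)

pred = torch.randn(2, 7, 3)
print(float(flip_equivalence_loss(pred, mirror_pose(pred))))   # ~0: mirroring is involutive
```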

Citations: 0
Detecting and tracking moving objects in defocus blur scenes
IF 2.6, CAS Tier 4 (Computer Science), Q2 (Computer Science, Information Systems). Pub Date: 2024-08-01. DOI: 10.1016/j.jvcir.2024.104259

Object tracking stands as a cornerstone challenge within computer vision, with blurriness analysis representing a burgeoning field of interest. Among the various forms of blur encountered in natural scenes, defocus blur remains significantly underexplored. To bridge this gap, this article introduces the Defocus Blur Video Object Tracking (DBVOT) dataset, specifically crafted to facilitate research in visual object tracking under defocus blur conditions. We conduct a comprehensive performance analysis of 18 state-of-the-art object tracking methods on this unique dataset. Additionally, we propose a selective deblurring framework based on Deblurring Auxiliary Learning Net (DID-Anet), innovatively designed to tackle the complexities of defocus blur. This framework integrates a novel defocus blurriness metric for the smart deblurring of video frames, thereby enhancing the efficacy of tracking methods in defocus blur scenarios. Our extensive experimental evaluations underscore the significant advancements in tracking accuracy achieved by incorporating our proposed framework with leading tracking technologies.
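
A crude but runnable stand-in for such selective deblurring is sketched below: the variance of the Laplacian acts as a generic sharpness score in place of the paper's learned defocus-blurriness metric, and only frames below a threshold are processed, here with an unsharp mask standing in for DID-Anet. The threshold and filter settings are arbitrary assumptions.

```python
# Hedged sketch: deblur a frame only when a simple sharpness score says it is defocused.
import cv2
import numpy as np

def sharpness(gray):
    return cv2.Laplacian(gray, cv2.CV_64F).var()       # low variance -> likely defocused

def unsharp_mask(frame_bgr):                           # placeholder for a real deblurring net
    blur = cv2.GaussianBlur(frame_bgr, (0, 0), sigmaX=3)
    return cv2.addWeighted(frame_bgr, 1.5, blur, -0.5, 0)

def maybe_deblur(frame_bgr, threshold=100.0):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    if sharpness(gray) < threshold:                    # only blurry frames pay the cost
        return unsharp_mask(frame_bgr)
    return frame_bgr

frame = np.random.randint(0, 256, (120, 160, 3), dtype=np.uint8)   # dummy frame
print(maybe_deblur(frame).shape)                       # (120, 160, 3)
```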

Citations: 0
A lightweight target tracking algorithm based on online correction for meta-learning
IF 2.6, CAS Tier 4 (Computer Science), Q2 (Computer Science, Information Systems). Pub Date: 2024-08-01. DOI: 10.1016/j.jvcir.2024.104228

Traditional Siamese-network-based object tracking algorithms suffer from high computational complexity, making them difficult to run on embedded devices. Moreover, when faced with long-term tracking tasks, their success rates decline significantly. To address these issues, we propose a lightweight long-term object tracking algorithm based on meta-learning, called Meta-Master-based Ghost Fast Tracking (MGTtracker). This algorithm integrates the Ghost mechanism to create a lightweight backbone network called G-ResNet, which accurately extracts target features while operating quickly. We design a tiny adaptive weighted fusion feature pyramid network (TiFPN) to enhance feature information fusion and mitigate interference from similar objects. We introduce a lightweight region regression network, the Ghost Decouple Net (GDNet), for target position prediction. Finally, we propose a meta-learning-based online template correction mechanism called Meta-Master to overcome error accumulation in long-term tracking tasks and the difficulty of reacquiring targets after loss. We evaluate the algorithm on the public datasets OTB100, VOT2020, VOT2018LT, and LaSOT and deploy it for performance testing on a Jetson Xavier NX. Experimental results demonstrate the effectiveness and superiority of the algorithm. Compared to existing classic object tracking algorithms, our approach achieves a running speed of 25 FPS on the NX, and real-time correction enhances the algorithm's robustness. While comparable in accuracy and EAO metrics, our algorithm outperforms similar algorithms in speed and effectively addresses the issues of significant cumulative error and easy target loss during tracking. Code is released at https://github.com/ygh96521/MGTtracker.git.
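
The Ghost mechanism originates in GhostNet; the generic Ghost convolution below (a small primary convolution plus cheap depthwise convolutions for the remaining maps) is included only to illustrate why such a backbone is light. It is not the paper's G-ResNet, and the channel ratio is an assumed default.

```python
# Hedged sketch: a generic GhostNet-style "ghost" convolution block.
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    def __init__(self, in_ch, out_ch, ratio=2, kernel=1, cheap_kernel=3):
        super().__init__()
        init_ch = out_ch // ratio                      # intrinsic maps from the primary conv
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, init_ch, kernel, padding=kernel // 2, bias=False),
            nn.BatchNorm2d(init_ch), nn.ReLU(inplace=True),
        )
        self.cheap = nn.Sequential(                    # depthwise "ghost" maps
            nn.Conv2d(init_ch, out_ch - init_ch, cheap_kernel,
                      padding=cheap_kernel // 2, groups=init_ch, bias=False),
            nn.BatchNorm2d(out_ch - init_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        primary = self.primary(x)
        return torch.cat([primary, self.cheap(primary)], dim=1)

block = GhostModule(16, 32)
print(block(torch.randn(1, 16, 56, 56)).shape)         # torch.Size([1, 32, 56, 56])
```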

Citations: 0
Corrigendum to “Dual-stream mutually adaptive quality assessment for authentic distortion image” [J. Vis. Commun. Image Represent. 102 (2024) 104216]
IF 2.6, CAS Tier 4 (Computer Science), Q2 (Computer Science, Information Systems). Pub Date: 2024-08-01. DOI: 10.1016/j.jvcir.2024.104236
{"title":"Corrigendum to “Dual-stream mutually adaptive quality assessment for authentic distortion image” [J. Vis. Commun. Image Represent. 102 (2024) 104216]","authors":"","doi":"10.1016/j.jvcir.2024.104236","DOIUrl":"10.1016/j.jvcir.2024.104236","url":null,"abstract":"","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1047320324001925/pdfft?md5=f78ef7571d3ace1bf67cf930af0b3d50&pid=1-s2.0-S1047320324001925-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141689350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Reversible data hiding for color images based on prediction-error value ordering and adaptive embedding
IF 2.6, CAS Tier 4 (Computer Science), Q2 (Computer Science, Information Systems). Pub Date: 2024-07-22. DOI: 10.1016/j.jvcir.2024.104239

Prediction-error value ordering (PEVO) is an efficient implementation of reversible data hiding (RDH) that is well suited to color images because it exploits inter-channel and intra-channel correlations simultaneously. However, the existing PEVO method falls short in the mapping selection stage: candidate mappings are selected in advance under conditions inconsistent with actual embedding, which is not optimal. Therefore, this paper proposes a novel RDH method for color images based on PEVO and adaptive embedding that implements adaptive two-dimensional (2D) modification for PEVO. First, an improved particle swarm optimization (IPSO) algorithm based on PEVO is designed to alleviate the high time complexity of parameter determination and to implement adaptive 2D modification for PEVO. Next, to further optimize the mapping used in embedding, an improved adaptive 2D mapping generation strategy is proposed that incorporates the position information of points. In addition, a dynamic payload partition strategy is proposed to improve embedding performance. Finally, experimental results show that the PSNR of the image Lena reaches 62.94 dB and that, at an embedding capacity of 20,000 bits, the average PSNR of the proposed method is 1.46 dB higher than that of state-of-the-art methods.
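
For readers unfamiliar with prediction-error based RDH, the sketch below shows the textbook prediction-error expansion primitive on a single pixel: errors in the two peak bins {-1, 0} each carry one bit, and all other errors are shifted to keep the mapping invertible. This is only the underlying idea, not the paper's PEVO or adaptive 2D scheme, and overflow handling at pixel-value boundaries is omitted.

```python
# Hedged sketch: single-pixel prediction-error expansion embedding and extraction.
def embed(pixel, predicted, bit):
    e = pixel - predicted
    if e == 0:
        e2 = e + bit                 # peak bin 0 carries a bit
    elif e == -1:
        e2 = e - bit                 # peak bin -1 carries a bit
    elif e >= 1:
        e2 = e + 1                   # shift right-hand bins to make room
    else:
        e2 = e - 1                   # shift left-hand bins to make room
    return predicted + e2

def extract(marked_pixel, predicted):
    e2 = marked_pixel - predicted
    if e2 in (0, 1):
        return predicted, e2                    # original error 0, bit = e2
    if e2 in (-1, -2):
        return predicted - 1, -(e2 + 1)         # original error -1, bit recovered
    if e2 >= 2:
        return predicted + e2 - 1, None         # shifted pixel, no payload
    return predicted + e2 + 1, None

marked = embed(pixel=128, predicted=128, bit=1)
restored, bit = extract(marked, predicted=128)
print(marked, restored, bit)                    # 129 128 1
```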

Citations: 0
EMCFN: Edge-based Multi-scale Cross Fusion Network for video frame interpolation
IF 2.6, CAS Tier 4 (Computer Science), Q2 (Computer Science, Information Systems). Pub Date: 2024-07-09. DOI: 10.1016/j.jvcir.2024.104226

Video frame interpolation (VFI) is used to synthesize one or more intermediate frames between two frames in a video sequence to improve the temporal resolution of the video. However, many methods still face challenges when dealing with complex scenes involving high-speed motion, occlusions, and other factors. To address these challenges, we propose an Edge-based Multi-scale Cross Fusion Network (EMCFN) for VFI. We integrate a feature enhancement module (FEM) based on edge information into the U-Net architecture, resulting in richer and more complete feature maps, while also enhancing the preservation of image structure and details. This contributes to generating more accurate and realistic interpolated frames. At the same time, we use a multi-scale cross fusion frame synthesis model (MCFM) composed of three GridNet branches to generate high-quality interpolation frames. We have conducted a series of experiments and the results show that our model exhibits satisfactory performance on different datasets compared with the state-of-the-art methods.
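
One simple way to hand an edge prior to an interpolation network is to stack an edge map with the RGB frame as an extra input channel; the sketch below does this with a Sobel gradient magnitude. It is an assumed illustration of the general idea, not the paper's FEM or GridNet-based synthesis model.

```python
# Hedged sketch: append a Sobel edge map to a frame as a fourth channel.
import cv2
import numpy as np

def with_edge_channel(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    edge = cv2.magnitude(gx, gy)
    edge = (255 * edge / (edge.max() + 1e-6)).astype(np.uint8)   # normalize to 0-255
    return np.dstack([frame_bgr, edge])                          # H x W x 4 input

frame = np.random.randint(0, 256, (120, 160, 3), dtype=np.uint8)  # dummy frame
print(with_edge_channel(frame).shape)                            # (120, 160, 4)
```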

Citations: 0
FISTA acceleration inspired network design for underwater image enhancement
IF 2.6, CAS Tier 4 (Computer Science), Q2 (Computer Science, Information Systems). Pub Date: 2024-07-08. DOI: 10.1016/j.jvcir.2024.104224

Underwater image enhancement, especially color restoration and detail reconstruction, remains a significant challenge. Current models focus on improving accuracy and learning efficiency through neural network design, often neglecting the benefits of traditional optimization algorithms. We propose FAIN-UIE, a novel approach for color and fine-texture recovery in underwater imagery. It leverages insights from the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) to approximate image degradation, enhancing network fitting speed. FAIN-UIE integrates the residual degradation module (RDM) and momentum calculation module (MC) for gradient descent and momentum simulation, addressing feature fusion losses with the Feature Merge Block (FMB). By integrating multi-scale information and inter-stage pathways, our method effectively maps multi-stage image features, advancing color and fine-texture restoration. Experimental results validate its robust performance, positioning FAIN-UIE as a competitive solution for practical underwater imaging applications.
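
For reference, plain FISTA on a synthetic LASSO problem is sketched below; it shows the proximal gradient step plus momentum extrapolation that, per the abstract, inspires the RDM and MC modules. The network architecture itself is not reproduced here, and the problem instance is synthetic.

```python
# Hedged sketch: classic FISTA iterations for min 0.5*||Ax - b||^2 + lam*||x||_1.
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fista(A, b, lam, iters=200):
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1]); y = x.copy(); t = 1.0
    for _ in range(iters):
        x_new = soft_threshold(y - A.T @ (A @ y - b) / L, lam / L)  # proximal gradient step
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x_new + ((t - 1) / t_new) * (x_new - x)                 # momentum extrapolation
        x, t = x_new, t_new
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100); x_true[:5] = 3.0
x_hat = fista(A, A @ x_true, lam=0.1)
print(np.round(x_hat[:6], 2))                  # the 5-sparse support is recovered approximately
```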

Citations: 0