
Journal of Visual Communication and Image Representation: Latest Publications

Corrigendum to “Heterogeneity constrained color ellipsoid prior image dehazing algorithm” [J. Vis. Commun. Image Represent. 101 (2024) 104177]
IF 2.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-01 | DOI: 10.1016/j.jvcir.2024.104235
Yuxi Wang , Jing Hu , Rongguo Zhang , Lifang Wang , Rui Zhang , Xiaojun Liu
{"title":"Corrigendum to “Heterogeneity constrained color ellipsoid prior image dehazing algorithm” [J. Vis. Commun. Image Represent. 101 (2024) 104177]","authors":"Yuxi Wang , Jing Hu , Rongguo Zhang , Lifang Wang , Rui Zhang , Xiaojun Liu","doi":"10.1016/j.jvcir.2024.104235","DOIUrl":"10.1016/j.jvcir.2024.104235","url":null,"abstract":"","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"103 ","pages":"Article 104235"},"PeriodicalIF":2.6,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1047320324001913/pdfft?md5=acb08692ca9b1d2f6bd84d46fa591d30&pid=1-s2.0-S1047320324001913-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141694814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Hypergraph clustering based multi-label cross-modal retrieval
IF 2.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-01 | DOI: 10.1016/j.jvcir.2024.104258
Shengtang Guo , Huaxiang Zhang , Li Liu , Dongmei Liu , Xu Lu , Liujian Li

Most existing cross-modal retrieval methods struggle to establish semantic connections between modalities because of the inherent heterogeneity among them. To establish such connections, align relevant semantic features across modalities, and fully capture important information within each modality, this paper exploits the strength of hypergraphs in representing higher-order relationships and proposes an image-text retrieval method based on hypergraph clustering. Specifically, we construct hypergraphs to capture feature relationships within the image and text modalities, as well as between image and text. This allows us to effectively model complex relationships between features of different modalities and explore semantic connectivity both within and across modalities. To compensate for potential loss of semantic features during construction of the hypergraph neural network, we design a weight-adaptive coarse- and fine-grained feature fusion module for semantic supplementation. Comprehensive experimental results on three common datasets demonstrate the effectiveness of the proposed method.
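
The abstract gives no implementation details. As a rough illustration of the kind of hypergraph construction it describes, the following Python sketch builds a k-nearest-neighbor incidence matrix over a set of feature vectors and applies one HGNN-style propagation step. All names are hypothetical, edge weights are assumed to be identity, and this is not the authors' code.

```python
import numpy as np

def knn_hypergraph_incidence(features, k=5):
    """Incidence matrix H (n_nodes x n_edges): each node spawns one
    hyperedge containing itself and its k nearest neighbours."""
    n = features.shape[0]
    # pairwise squared Euclidean distances
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    H = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k + 1]  # node i plus its k neighbours
        H[nbrs, i] = 1.0
    return H

def hypergraph_conv(X, H, Theta):
    """One HGNN-style step (identity edge weights):
    X' = Dv^{-1/2} H De^{-1} H^T Dv^{-1/2} X Theta."""
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(H.sum(axis=1)))  # node degrees
    De_inv = np.diag(1.0 / H.sum(axis=0))                # edge degrees
    return Dv_inv_sqrt @ H @ De_inv @ H.T @ Dv_inv_sqrt @ X @ Theta
```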

Citations: 0
Non-local feature aggregation quaternion network for single image deraining
IF 2.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-01 | DOI: 10.1016/j.jvcir.2024.104250
Gonghe Xiong , Shan Gai , Bofan Nie , Feilong Chen , Chengli Sun

Existing deraining methods are based on convolutional neural networks (CNNs) that learn the mapping between rainy and clean images. However, real-valued CNNs process a color image as three independent channels, which fails to fully leverage color information. Additionally, sliding-window-based neural networks cannot effectively model the non-local characteristics of an image. In this work, we propose a non-local feature aggregation quaternion network (NLAQNet), composed of two concurrent sub-networks: the Quaternion Local Detail Repair Network (QLDRNet) and the Multi-Level Feature Aggregation Network (MLFANet). Within QLDRNet, the Local Detail Repair Block (LDRB) is proposed to repair the parts of the image background that have not been damaged by rain streaks. Within MLFANet, we introduce two specialized blocks, the Non-Local Feature Aggregation Block (NLAB) and the Feature Aggregation Block (Mix), specifically designed to restore rain-streak-damaged image backgrounds. Extensive experiments demonstrate that the proposed network delivers strong performance in both qualitative and quantitative evaluations on existing datasets. The code is available at https://github.com/xionggonghe/NLAQNet.
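
Quaternion networks treat the color channels of a pixel as the imaginary parts of a single quaternion, so a quaternion layer mixes R, G, and B jointly instead of filtering them independently. The sketch below shows only the Hamilton product at the heart of such layers; it is a minimal illustration under that assumption, not the NLAQNet implementation (which is available at the linked repository).

```python
import numpy as np

def hamilton_product(q, p):
    """Hamilton product of two quaternion tensors; last axis = (r, i, j, k).
    With RGB riding on the i, j, k parts, the channels are mixed jointly."""
    r1, x1, y1, z1 = np.moveaxis(q, -1, 0)
    r2, x2, y2, z2 = np.moveaxis(p, -1, 0)
    return np.stack([
        r1 * r2 - x1 * x2 - y1 * y2 - z1 * z2,  # real part
        r1 * x2 + x1 * r2 + y1 * z2 - z1 * y2,  # i part
        r1 * y2 - x1 * z2 + y1 * r2 + z1 * x2,  # j part
        r1 * z2 + x1 * y2 - y1 * x2 + z1 * r2,  # k part
    ], axis=-1)

# toy usage: one RGB pixel as a pure quaternion (0, R, G, B)
pixel = np.array([0.0, 0.8, 0.4, 0.1])
weight = np.array([0.5, 0.1, -0.2, 0.3])
print(hamilton_product(pixel, weight))
```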

Citations: 0
Facial feature point detection under large range of face deformations
IF 2.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-01 | DOI: 10.1016/j.jvcir.2024.104264
Nora Algaraawi , Tim Morris , Timothy F. Cootes

Facial Feature Point Detection (FFPD) plays a significant role in several face analysis tasks such as feature extraction and classification. This paper presents a fully automatic FFPD system based on Random Forest Regression Voting in a Constrained Local Model (RFRV-CLM) framework. A global detector finds the approximate positions of the facial region and eye centers, and a sequence of local RFRV-CLMs then locates a detailed set of points around the facial features. Both the global and local models use random forest regression to vote for optimal positions. The system is evaluated on facial expression localization using five facial expression databases with differing characteristics, covering age, intensity, the six basic expressions, 22 compound expressions, static and dynamic images, and deliberate and spontaneous expressions. Quantitative evaluation of automatic point localization against manually annotated ground-truth points shows that the proposed approach is encouraging and outperforms alternative techniques tested on the same databases.
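
For intuition about the voting step, the following Python sketch accumulates regression votes into a response map, whose argmax gives the landmark estimate. In the actual RFRV-CLM framework the offsets come from trained random forests evaluated on local patches; here they are assumed to be given, and all names are illustrative.

```python
import numpy as np

def accumulate_votes(sample_points, predicted_offsets, map_shape):
    """Accumulate regression votes into a response map.

    Each sampled patch centre votes for the landmark position
    `centre + predicted_offset`; the landmark estimate is the
    argmax of the accumulated map."""
    votes = np.zeros(map_shape)
    for (px, py), (dx, dy) in zip(sample_points, predicted_offsets):
        x, y = int(round(px + dx)), int(round(py + dy))
        if 0 <= x < map_shape[1] and 0 <= y < map_shape[0]:
            votes[y, x] += 1.0
    return votes

# usage: landmark = np.unravel_index(votes.argmax(), votes.shape)
```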

Citations: 0
EM-Gait: Gait recognition using motion excitation and feature embedding self-attention
IF 2.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-01 | DOI: 10.1016/j.jvcir.2024.104266
Zhengyou Wang , Chengyu Du , Yunpeng Zhang , Jing Bai , Shanna Zhuang

Gait recognition, which enables long-distance and contactless identification, is an important biometric technology. Recent gait recognition methods focus on learning the pattern of human movement or appearance during walking and construct the corresponding spatio-temporal representations. However, each individual follows their own movement patterns, and simple spatio-temporal features struggle to describe changes in the motion of body parts, especially when confounding variables such as clothing and carried items are involved, which reduces the distinguishability of the features. To this end, we propose the Embedding and Motion (EM) block and the Fine Feature Extractor (FFE) to capture the motion mode of walking and to accentuate differences in local motion patterns. The EM block consists of a Motion Excitation (ME) module that captures temporal motion changes and an Embedding Self-attention (ES) module that strengthens the expression of motion patterns. Specifically, without introducing additional parameters, the ME module learns difference information between frames and intervals to obtain a dynamic representation of walking for frame sequences of uncertain length. The ES module, by contrast, divides the feature map hierarchically based on element values, blurring element-level differences to highlight the motion track. Furthermore, we present the FFE, which independently learns spatio-temporal representations of the human body for different horizontal parts of an individual. Benefiting from the EM block and our proposed motion branch, our method innovatively combines motion-change information, significantly improving performance under cross-appearance conditions. On the popular CASIA-B dataset, the proposed EM-Gait outperforms existing single-modal gait recognition methods.
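
As a loose sketch of the frame-differencing idea behind motion excitation (not the paper's ME module, whose exact formulation also involves interval differences and is defined in the paper), one can gate features with a sigmoid of inter-frame differences, which adds no learnable parameters:

```python
import numpy as np

def motion_excitation(frames):
    """Re-weight features with inter-frame differences (rough sketch).

    frames: (T, C, H, W) feature maps of a silhouette sequence.
    Differences between consecutive frames approximate motion; a sigmoid
    of the difference gates the original features, parameter-free."""
    diff = frames[1:] - frames[:-1]                       # (T-1, C, H, W)
    diff = np.concatenate([diff, np.zeros_like(frames[:1])], axis=0)
    gate = 1.0 / (1.0 + np.exp(-diff))                    # sigmoid gate
    return frames + frames * gate                         # residual re-weighting
```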

Citations: 0
DDR: A network of image deraining systems for dark environments
IF 2.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-01 | DOI: 10.1016/j.jvcir.2024.104244
Zhongning Ding , Yun Zhu , Shaoshan Niu , Jianyu Wang , Yan Su

In computer vision, the degradation of image quality under adverse weather conditions remains a significant challenge. To tackle image enhancement and deraining in dark settings, we integrate the two technologies into the DDR (Dark Environment Deraining Network) system, a specialized network designed to enhance and clarify low-light images compromised by raindrops. DDR employs a divide-and-conquer strategy and an appropriate choice of networks to discern the patterns of raindrops and background elements within images. It mitigates the noise and blurring induced by raindrops in dark settings, thus enhancing the visual fidelity of images. Tested on real-world imagery and the Rain LOL dataset, the network offers a robust solution for deraining in dark conditions, advancing the performance of computer vision systems under challenging weather. The research on DDR provides technical and theoretical support for improving image quality in dark environments.
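
Read as a design, DDR composes a low-light enhancement stage with a deraining stage. The toy sketch below illustrates only that two-stage composition; both stages are hypothetical placeholders (a gamma lift and an identity), not DDR's actual networks.

```python
import numpy as np

def enhance_low_light(img, gamma=0.45):
    """Placeholder enhancement stage: a simple gamma lift on [0, 1] images."""
    return np.clip(img ** gamma, 0.0, 1.0)

def derain(img):
    """Placeholder deraining stage; DDR would apply a trained network here."""
    return img  # identity stand-in

def ddr_pipeline(img):
    """Divide and conquer: brighten the dark scene first, then remove rain."""
    return derain(enhance_low_light(img))
```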

Citations: 0
High-capacity multi-MSB predictive reversible data hiding in encrypted domain for triangular mesh models
IF 2.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-01 | DOI: 10.1016/j.jvcir.2024.104246
Guoyou Zhang , Xiaoxue Cheng , Fan Yang , Anhong Wang , Xuenan Zhang , Li Liu

Reversible data hiding in encrypted domain (RDH-ED) is widely used in sensitive fields such as privacy protection and copyright authentication. However, the embedding capacity of existing methods is generally low because they make insufficient use of the model topology. To improve the embedding capacity, this paper proposes a high-capacity multi-MSB predictive reversible data hiding method in the encrypted domain (MMPRDH-ED). First, the 3D model is subdivided by a triangular mesh subdivision (TMS) algorithm, and its vertices are divided into a reference set and an embedded set. Then, to make full use of the redundant space of the embedded vertices, multi-MSB prediction (MMP) and a multi-layer embedding strategy (MLES) are used to increase capacity. Finally, stream encryption is used to encrypt the model and data to ensure security. Experimental results show that, compared with existing methods, the embedding capacity of MMPRDH-ED increases by 53%, a clear advantage.
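
To make the multi-MSB idea concrete, the following sketch embeds n payload bits into the n most significant bits of an 8-bit sample and later restores the original from a predictor. It assumes integer-quantized vertex values and that the predictor's MSBs match the original's (real schemes first flag the samples where this holds); it is an illustration, not the MMPRDH-ED algorithm.

```python
def embed_msb(value, payload_bits, n=3):
    """Overwrite the n MSBs of an 8-bit sample with n payload bits."""
    mask = (1 << (8 - n)) - 1              # keeps the (8-n) LSBs
    msb = 0
    for b in payload_bits[:n]:
        msb = (msb << 1) | b
    return (msb << (8 - n)) | (value & mask)

def extract_and_recover(marked, predicted, n=3):
    """Read the n payload bits back and restore the sample's MSBs
    from the predictor (valid only when prediction MSBs match)."""
    mask = (1 << (8 - n)) - 1
    bits = [(marked >> (8 - n + i)) & 1 for i in reversed(range(n))]
    recovered = (predicted & (0xFF ^ mask)) | (marked & mask)
    return bits, recovered

# toy usage: sample 0b10110101 with a predictor sharing its top 3 bits
marked = embed_msb(0b10110101, payload_bits=[0, 1, 1])
bits, restored = extract_and_recover(marked, predicted=0b10100000)
assert bits == [0, 1, 1] and restored == 0b10110101
```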

Citations: 0
Versatile depth estimator based on common relative depth estimation and camera-specific relative-to-metric depth conversion
IF 2.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-01 | DOI: 10.1016/j.jvcir.2024.104252
Jinyoung Jun , Jae-Han Lee , Chang-Su Kim

A typical monocular depth estimator is trained for a single camera, so its performance drops severely on images taken with different cameras. To address this issue, we propose a versatile depth estimator (VDE), composed of a common relative depth estimator (CRDE) and multiple relative-to-metric converters (R2MCs). The CRDE extracts relative depth information, and each R2MC converts the relative information to predict metric depths for a specific camera. The proposed VDE can cope with diverse scenes, including both indoor and outdoor scenes, with only a 1.12% parameter increase per camera. Experimental results demonstrate that VDE supports multiple cameras effectively and efficiently and also achieves state-of-the-art performance in the conventional single-camera scenario.
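
The paper learns a camera-specific relative-to-metric converter (R2MC); as a simple stand-in for the idea, a closed-form least-squares fit of a per-camera scale and shift can convert relative depth to metric depth. The sketch below assumes a few sparse metric measurements are available for fitting and is not the authors' R2MC.

```python
import numpy as np

def relative_to_metric(rel_depth, rel_samples, metric_samples):
    """Fit a per-camera scale s and shift t so that
    s * relative_depth + t approximates metric depth (least squares)."""
    A = np.stack([rel_samples, np.ones_like(rel_samples)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, metric_samples, rcond=None)
    return s * rel_depth + t

# usage: rel_samples / metric_samples are depths at a few calibrated pixels
```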

Citations: 0
Fusing structure from motion and simulation-augmented pose regression from optical flow for challenging indoor environments
IF 2.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-01 | DOI: 10.1016/j.jvcir.2024.104256
Felix Ott , Lucas Heublein , David Rügamer , Bernd Bischl , Christopher Mutschler

The localization of objects is essential in many applications, such as robotics, virtual and augmented reality, and warehouse logistics. Recent advancements in deep learning have enabled localization using monocular cameras. Traditionally, structure from motion (SfM) techniques predict an object’s absolute position from a point cloud, while absolute pose regression (APR) methods use neural networks to understand the environment semantically. However, both approaches face challenges from environmental factors like motion blur, lighting changes, repetitive patterns, and featureless areas. This study addresses these challenges by incorporating additional information and refining absolute pose estimates with relative pose regression (RPR) methods. RPR also struggles with issues like motion blur. To overcome this, we compute the optical flow between consecutive images using the Lucas–Kanade algorithm and use a small recurrent convolutional network to predict relative poses. Combining absolute and relative poses is difficult due to differences between global and local coordinate systems. Current methods use pose graph optimization (PGO) to align these poses. In this work, we propose recurrent fusion networks to better integrate absolute and relative pose predictions, enhancing the accuracy of absolute pose estimates. We evaluate eight different recurrent units and create a simulation environment to pre-train the APR and RPR networks for improved generalization. Additionally, we record a large dataset of various scenarios in a challenging indoor environment resembling a warehouse with transportation robots. Through hyperparameter searches and experiments, we demonstrate that our recurrent fusion method outperforms PGO in effectiveness.
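
The sparse Lucas-Kanade flow the authors describe can be computed with OpenCV's standard API. The snippet below is a minimal stand-in (the frame paths are hypothetical) showing how per-feature displacement vectors between consecutive frames are obtained before feeding them to a relative pose regressor.

```python
import cv2

# Hypothetical consecutive grayscale frames.
prev_gray = cv2.imread("frame_0.png", cv2.IMREAD_GRAYSCALE)
next_gray = cv2.imread("frame_1.png", cv2.IMREAD_GRAYSCALE)

# Corners to track in the first frame.
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                              qualityLevel=0.01, minDistance=7)

# Pyramidal Lucas-Kanade: where did each corner move in the next frame?
nxt, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)

# Displacement vectors of successfully tracked features.
flow = (nxt - pts)[status.ravel() == 1].reshape(-1, 2)
print("mean displacement:", flow.mean(axis=0))
```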

Citations: 0
Corrigendum to “Dual-stream mutually adaptive quality assessment for authentic distortion image” [J. Vis. Commun. Image Represent. 102 (2024) 104216]
IF 2.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-01 | DOI: 10.1016/j.jvcir.2024.104236
Jia Huizhen, Zhou Huaibo, Qin Hongzheng, Wang Tonghan
{"title":"Corrigendum to “Dual-stream mutually adaptive quality assessment for authentic distortion image” [J. Vis. Commun. Image Represent. 102 (2024) 104216]","authors":"Jia Huizhen,&nbsp;Zhou Huaibo,&nbsp;Qin Hongzheng,&nbsp;Wang Tonghan","doi":"10.1016/j.jvcir.2024.104236","DOIUrl":"10.1016/j.jvcir.2024.104236","url":null,"abstract":"","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"103 ","pages":"Article 104236"},"PeriodicalIF":2.6,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1047320324001925/pdfft?md5=f78ef7571d3ace1bf67cf930af0b3d50&pid=1-s2.0-S1047320324001925-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141689350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0