
Journal of Visual Communication and Image Representation: Latest Articles

CAgMLP: An MLP-like architecture with a Cross-Axis gated token mixer for image classification
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-09-25 | DOI: 10.1016/j.jvcir.2025.104590
Jielin Jiang, Quan Zhang, Yan Cui, Shun Wei, Yingnan Zhao
Recent MLP-based models have employed axial projections to orthogonally decompose the entire space into horizontal and vertical directions, effectively balancing long-range dependencies and computational costs. However, such methods operate independently along the two axes, hindering their ability to capture the image’s global spatial structure. In this paper, we propose a novel MLP architecture called Cross-Axis gated MLP (CAgMLP), which consists of two main modules, Cross-Axis Gated Token-Mixing MLP (CGTM) and Convolutional Gated Channel-Mixing MLP (CGCM). CGTM addresses the loss of information from single-dimensional interactions by leveraging a multiplicative gating mechanism that facilitates the cross-fusion of features captured along the two spatial axes, enhancing feature selection and information flow. CGCM improves the dual-branch structure of the multiplicative gating units by projecting the fused low-dimensional input into two high-dimensional feature spaces and introducing non-linear features through element-wise multiplication, further improving the model’s expressive ability. Finally, both modules incorporate local token aggregation to compensate for the lack of local inductive bias in traditional MLP models. Experiments conducted on several datasets demonstrate that CAgMLP achieves superior classification performance compared to other state-of-the-art methods, while exhibiting fewer parameters and lower computational complexity.
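For illustration, the multiplicative cross-axis gating described above might be sketched as follows; the module name, tensor shapes, and the exact gate form are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class CrossAxisGate(nn.Module):
    """Illustrative cross-axis gated token mixer (hypothetical shapes/names).

    Tokens are projected along the height axis and the width axis separately,
    then the two axial views gate each other multiplicatively so that
    information from both directions is mixed at every spatial location.
    """
    def __init__(self, height: int, width: int, channels: int):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.proj_h = nn.Linear(height, height)   # mixes tokens along H
        self.proj_w = nn.Linear(width, width)     # mixes tokens along W
        self.out = nn.Linear(channels, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, H, W, C)
        y = self.norm(x)
        # Mix along the height axis: move H to the last dim, project, move back.
        h = self.proj_h(y.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)
        # Mix along the width axis.
        w = self.proj_w(y.permute(0, 1, 3, 2)).permute(0, 1, 3, 2)
        # Multiplicative gating cross-fuses the two axial views.
        gated = h * torch.sigmoid(w) + w * torch.sigmoid(h)
        return x + self.out(gated)

if __name__ == "__main__":
    block = CrossAxisGate(height=14, width=14, channels=64)
    tokens = torch.randn(2, 14, 14, 64)
    print(block(tokens).shape)  # torch.Size([2, 14, 14, 64])
```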
Cited by: 0
Image forgery localization with sparse reward compensation using curiosity-driven deep reinforcement learning
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-09-25 | DOI: 10.1016/j.jvcir.2025.104587
Yan Cheng, Xiong Li, Xin Zhang, Chaohong Yang
Advanced editing and deepfakes make image tampering harder to detect, threatening image security, credibility, and personal privacy. To address this challenging issue, we propose a novel end-to-end image forgery localization method, based on the curiosity-driven deep reinforcement learning method with intrinsic reward. The proposed method provides reliable localization results for forged regions in images of various types of forgery. This study designs a new Focal-based reward function that is suitable for scenarios with highly imbalanced numbers of forged and real pixels. Furthermore, considering the issue of sparse rewards caused by sparse forgery regions in real-world forgery scenarios, we introduce a surprise-based intrinsic reward generation module, which guides the agent to explore and learn the optimal strategy. Extensive experiments conducted on multiple benchmark datasets show that the proposed method outperforms other methods in pixel-level forgery localization. Additionally, the proposed method demonstrates stable robustness to image degradation caused by different post-processing attacks.
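The Focal-based reward can be pictured with a short sketch in the spirit of focal loss; the exact weighting below (alpha, gamma, and the reward form) is an assumption, not the formulation from the paper.

```python
import numpy as np

def focal_reward(pred_prob: np.ndarray, label: np.ndarray,
                 gamma: float = 2.0, alpha: float = 0.75) -> np.ndarray:
    """Illustrative focal-style per-pixel reward (hypothetical formulation).

    pred_prob: predicted probability of the 'forged' class, shape (H, W).
    label:     ground truth, 1 for forged pixels, 0 for real pixels.
    The reward is large for confident, correct decisions on the rare forged
    class and small for easy decisions on the dominant real class.
    """
    # Probability assigned to the true class of each pixel.
    p_t = np.where(label == 1, pred_prob, 1.0 - pred_prob)
    # Class weight: forged pixels (minority) get alpha, real pixels 1 - alpha.
    w = np.where(label == 1, alpha, 1.0 - alpha)
    # Focal modulation: reward grows with p_t but easy pixels saturate quickly,
    # so the agent is pushed toward the hard, sparse forged regions.
    return w * (1.0 - (1.0 - p_t) ** gamma)

if __name__ == "__main__":
    probs = np.random.rand(4, 4)
    mask = (np.random.rand(4, 4) > 0.9).astype(np.float32)  # sparse forgeries
    print(focal_reward(probs, mask).mean())
```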
Cited by: 0
Structure preserving point cloud completion and classification with coarse-to-fine information
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-09-25 | DOI: 10.1016/j.jvcir.2025.104591
Seema Kumari, Srimanta Mandal, Shanmuganathan Raman
Point clouds are the predominant data structure for representing 3D shapes. However, captured point clouds are often partial due to practical constraints, necessitating point cloud completion. In this paper, we propose a novel deep network architecture that preserves the structure of available points while incorporating coarse-to-fine information to generate dense and consistent point clouds. Our network comprises three sub-networks: Coarse-to-Fine, Structure, and Tail. The Coarse-to-Fine sub-net extracts multi-scale features, while the Structure sub-net utilizes a stacked auto-encoder with weighted skip connections to preserve structural information. The fused features are then processed by the Tail sub-net to produce a dense point cloud. Additionally, we demonstrate the effectiveness of our structure-preserving approach in point cloud classification by proposing a classification architecture based on the Structure sub-net. Experimental results show that our method outperforms existing approaches in both tasks, highlighting the importance of preserving structural information and incorporating coarse-to-fine details.
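A minimal sketch of a stacked auto-encoder with weighted skip connections, in the spirit of the Structure sub-net, is shown below; the layer sizes, the per-depth learnable skip weights, and the per-point processing are illustrative assumptions.

```python
import torch
import torch.nn as nn

class WeightedSkipAE(nn.Module):
    """Illustrative stacked auto-encoder whose skip connections carry
    learnable weights, so the decoder can decide how much encoder
    structure to preserve at each depth (hypothetical layer sizes)."""

    def __init__(self, dims=(3, 64, 128, 256)):
        super().__init__()
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Linear(dims[i], dims[i + 1]), nn.ReLU())
            for i in range(len(dims) - 1))
        self.decoders = nn.ModuleList()
        rev_dims = list(reversed(dims))                # (256, 128, 64, 3)
        for i in range(len(rev_dims) - 1):
            block = [nn.Linear(rev_dims[i], rev_dims[i + 1])]
            if i < len(rev_dims) - 2:                  # keep the last layer linear
                block.append(nn.ReLU())
            self.decoders.append(nn.Sequential(*block))
        # One learnable weight per skip connection.
        self.skip_w = nn.Parameter(torch.ones(len(dims) - 1))

    def forward(self, pts: torch.Tensor) -> torch.Tensor:
        # pts: (B, N, 3) point coordinates processed per point.
        feats, x = [], pts
        for enc in self.encoders:
            x = enc(x)
            feats.append(x)
        for i, dec in enumerate(self.decoders):
            x = dec(x)
            skip_idx = len(self.encoders) - 2 - i
            if skip_idx >= 0:
                # Weighted skip: blend decoder output with encoder features.
                x = x + self.skip_w[skip_idx] * feats[skip_idx]
        return x                                       # (B, N, 3) output points

if __name__ == "__main__":
    net = WeightedSkipAE()
    print(net(torch.randn(2, 1024, 3)).shape)          # torch.Size([2, 1024, 3])
```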
Cited by: 0
F-MDM: Rethinking image denoising with a feature map-based Poisson–Gaussian Mixture Diffusion Model
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-09-25 | DOI: 10.1016/j.jvcir.2025.104593
Bin Wang, Jiajia Hu, Fengyuan Zuo, Junfei Shi, Haiyan Jin
Diffusion models have shown great potential in image-denoising tasks. Usually, a diffusion model uses a noise-free, clean image dataset from a real scene as the starting point for diffusion. When the denoising network trained on such a dataset is applied to images from other scenes, its generalization degrades because the scene priors change. To improve generalization, we seek a clean image dataset that carries rich scene priors while remaining largely scene-independent. VGG-16 is pretrained on a large collection of natural images; after real scene images pass through its early convolution layers, the resulting shallow feature maps retain scene priors yet are freed from the scene dependency caused by fine details. This paper therefore uses the shallow feature maps of VGG-16 as the clean dataset for the diffusion model, and the denoising experiments yield surprisingly strong results. Furthermore, since real image noise consists mainly of Gaussian and Poisson components while the classical diffusion model diffuses with Gaussian noise alone, we introduce a novel Poisson–Gaussian noise mixture for the diffusion process, together with its theoretical derivation, to better match the noise model and improve interpretability. Finally, we propose a Poisson–Gaussian Denoising Mixture Diffusion Model based on Feature maps (F-MDM). Experiments demonstrate that our method exhibits excellent generalization ability compared to other advanced algorithms.
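A rough sketch of a forward noising step that mixes signal-dependent Poisson noise with additive Gaussian noise is given below; the schedule, mixture weight, and photon-peak parameter are placeholders rather than the schedule derived in the paper.

```python
import torch

def poisson_gaussian_forward(x0: torch.Tensor, t: int, num_steps: int = 1000,
                             peak: float = 100.0, sigma_max: float = 1.0,
                             lam: float = 0.5) -> torch.Tensor:
    """Illustrative forward noising step mixing Poisson and Gaussian noise.

    x0:        clean feature map in [0, 1], shape (B, C, H, W).
    t:         diffusion step in [0, num_steps).
    peak:      photon peak controlling the strength of Poisson noise.
    sigma_max: maximum Gaussian standard deviation at the final step.
    lam:       mixture weight between the two noise components (assumed).
    """
    frac = (t + 1) / num_steps                    # simple linear schedule
    # Signal-dependent Poisson component: fewer effective photons -> noisier.
    scaled_peak = peak * (1.0 - 0.99 * frac)
    poisson_part = torch.poisson(x0.clamp(0, 1) * scaled_peak) / scaled_peak
    # Signal-independent Gaussian component.
    gaussian_part = x0 + sigma_max * frac * torch.randn_like(x0)
    # Convex mixture of the two noisy views.
    return lam * poisson_part + (1.0 - lam) * gaussian_part

if __name__ == "__main__":
    clean = torch.rand(1, 64, 32, 32)             # e.g. a shallow VGG-16 feature map
    noisy = poisson_gaussian_forward(clean, t=500)
    print(noisy.shape, float((noisy - clean).abs().mean()))
```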
Cited by: 0
Visual anomaly detection algorithms: Development and Frontier review
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-09-22 | DOI: 10.1016/j.jvcir.2025.104585
Jia Huang, Wei Quan, Xiwen Li
Visual anomaly detection includes image anomaly detection and video anomaly detection, focusing on identifying and locating anomalous patterns or events in images or videos. This technology finds widespread applications across multiple domains, including industrial surface defect inspection, medical image lesion analysis, and security surveillance systems. By identifying patterns that do not conform to normal conditions, it helps to detect anomalies in a timely manner and reduce risks and losses. This paper provides a comprehensive review of existing visual anomaly detection algorithms. It introduces a taxonomy of algorithms from a new perspective: statistical-based algorithms, measurement-based algorithms, generative-based algorithms, and representation-based algorithms. Furthermore, this paper systematically introduces datasets for visual anomaly detection and compares the performance of various algorithms on different datasets under typical evaluation metrics. By analyzing existing algorithms, we identify current challenges and suggest promising future research directions.
Cited by: 0
Inter-image Token Relation Learning for weakly supervised semantic segmentation
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-09-22 | DOI: 10.1016/j.jvcir.2025.104576
Jingfeng Tang, Keyang Cheng, Liutao Wei, Yongzhao Zhan
In recent years, Vision Transformer-based methods have emerged as promising approaches for localizing semantic objects in weakly supervised semantic segmentation tasks. However, existing methods primarily rely on the attention mechanism to establish relations between classes and image patches, often neglecting the intrinsic interrelations among tokens within datasets. To address this gap, we propose the Inter-image Token Relation Learning (ITRL) framework, which advances weakly supervised semantic segmentation by inter-image consistency. Specifically, the Inter-image Class Token Contrast method is introduced to generate comprehensive class representations by contrasting class tokens in a memory bank manner. Additionally, the Inter-image Patch Token Align approach is presented, which enhances the normalized mutual information among patch tokens, thereby strengthening their interdependencies. Extensive experiments validated the proposed framework, showcasing competitive mean Intersection over Union scores on the PASCAL VOC 2012 and MS COCO 2014 datasets.
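The Inter-image Class Token Contrast can be illustrated with an InfoNCE-style loss over a memory bank of class tokens; the bank handling, temperature, and multi-positive averaging below are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def class_token_contrast(token: torch.Tensor, label: int,
                         bank: torch.Tensor, bank_labels: torch.Tensor,
                         temperature: float = 0.1) -> torch.Tensor:
    """Illustrative contrastive loss between a class token and a memory bank.

    token:       class token of the current image, shape (D,).
    label:       its class index.
    bank:        stored class tokens from other images, shape (M, D).
    bank_labels: class index of every stored token, shape (M,).
    Tokens of the same class are pulled together across images,
    tokens of other classes are pushed apart.
    """
    q = F.normalize(token, dim=0)
    k = F.normalize(bank, dim=1)
    logits = (k @ q) / temperature                # (M,) cosine similarities
    pos = bank_labels == label
    if pos.sum() == 0:
        return logits.new_tensor(0.0)             # nothing to contrast against
    # InfoNCE with multiple positives: average over the positive entries.
    log_prob = logits - torch.logsumexp(logits, dim=0)
    return -log_prob[pos].mean()

if __name__ == "__main__":
    bank = torch.randn(128, 256)
    bank_labels = torch.randint(0, 20, (128,))
    loss = class_token_contrast(torch.randn(256), 3, bank, bank_labels)
    print(float(loss))
```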
Cited by: 0
Knowledge NeRF: Few-shot novel view synthesis for dynamic articulated objects
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-09-22 | DOI: 10.1016/j.jvcir.2025.104586
Wenxiao Cai, Xinyue Lei, Xinyu He, Junming Leo Chen, Yuzhi Hao, Yangang Wang
We introduce Knowledge NeRF, a few-shot framework for novel-view synthesis of dynamic articulated objects. Conventional dynamic-NeRF methods learn a deformation field from long monocular videos, yet they degrade sharply when only sparse observations are available. Our key idea is to reuse a high-quality, pose-specific NeRF as a knowledge base and learn a lightweight projection module for each new pose that maps 3-D points in the current state to their canonical counterparts. By freezing the pretrained radiance field and training only this module with five input images, Knowledge NeRF renders novel views whose fidelity matches a NeRF trained with one hundred images. Experimental results demonstrate the effectiveness of our method in reconstructing dynamic 3D scenes with 5 input images in one state. Knowledge NeRF is a new pipeline and a promising solution for novel view synthesis in dynamic articulated objects. The data and implementation will be publicly available at: https://github.com/RussRobin/Knowledge_NeRF.
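The per-pose projection module can be pictured as a small residual MLP that maps query points of the new articulated state back to the canonical state of the frozen, pretrained NeRF; the layer sizes and the radiance-field interface below are assumptions for illustration, not the released code.

```python
import torch
import torch.nn as nn

class PoseProjection(nn.Module):
    """Illustrative per-pose projection: maps 3-D points in the current
    articulated state to their canonical counterparts (hypothetical sizes)."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3))

    def forward(self, pts: torch.Tensor) -> torch.Tensor:
        # Predict a residual offset so the identity mapping is the starting point.
        return pts + self.mlp(pts)

def render_with_frozen_nerf(frozen_nerf: nn.Module, projection: PoseProjection,
                            pts: torch.Tensor) -> torch.Tensor:
    """Query the frozen, pretrained radiance field at the projected canonical
    points. The NeRF's parameters stay fixed; gradients flow only into the
    lightweight projection module through the canonical coordinates."""
    canonical = projection(pts)
    return frozen_nerf(canonical)   # colour + density at the canonical location

if __name__ == "__main__":
    # Stand-in for a pretrained NeRF: any frozen point-wise field works here.
    frozen = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 4)).eval()
    for p in frozen.parameters():
        p.requires_grad_(False)
    proj = PoseProjection()
    out = render_with_frozen_nerf(frozen, proj, torch.rand(4096, 3))
    print(out.shape)  # torch.Size([4096, 4]) -> RGB + density per point
```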
Cited by: 0
Joint airport runway segmentation and line detection via multi-task learning for intelligent visual navigation
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-09-19 | DOI: 10.1016/j.jvcir.2025.104589
Lichun Yang, Jianghao Wu, Hongguang Li, Chunlei Liu, Shize Wei
This paper presents a novel multi-task learning framework for joint airport runway segmentation and line detection, addressing two key challenges in aircraft visual navigation: (1) edge detection for sub-5%-pixel targets and (2) computational inefficiencies in existing methods. Our contributions include: (i) ENecNet, a lightweight yet powerful encoder that boosts small-target detection IoU by 15.5% through optimized channel expansion and architectural refinement; (ii) a dual-decoder design with task-specific branches for area segmentation and edge line detection; and (iii) a dynamically weighted multi-task loss function to ensure balanced training. Extensive evaluations on the RDD5000 dataset show state-of-the-art performance with 0.9709 segmentation IoU and 0.6256 line detection IoU at 38.4 FPS. The framework also demonstrates robust performance (0.9513–0.9664 IoU) across different airports and challenging conditions such as nighttime, smog, and mountainous terrain, proving its suitability for real-time onboard navigation systems.
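One common way to realize a dynamically weighted multi-task loss is uncertainty-based weighting with learnable log-variances; the sketch below uses that formulation as a stand-in, since the paper's exact weighting rule is not reproduced here.

```python
import torch
import torch.nn as nn

class DynamicTaskWeighting(nn.Module):
    """Illustrative dynamic weighting of the segmentation and line-detection
    losses via learnable log-variances; a stand-in formulation, not
    necessarily the rule used in the paper."""
    def __init__(self, num_tasks: int = 2):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, losses) -> torch.Tensor:
        # losses: list of scalar tensors, one per task.
        total = 0.0
        for i, loss in enumerate(losses):
            precision = torch.exp(-self.log_vars[i])
            # Each task is scaled by its learned precision plus a regularizer
            # that keeps the learned weights from collapsing to zero.
            total = total + precision * loss + self.log_vars[i]
        return total

if __name__ == "__main__":
    weighting = DynamicTaskWeighting()
    seg_loss = torch.tensor(0.8, requires_grad=True)
    line_loss = torch.tensor(1.7, requires_grad=True)
    print(float(weighting([seg_loss, line_loss])))
```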
Cited by: 0
A non-extended 3D mesh secret sharing scheme adapted for FPGA processing
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-09-18 | DOI: 10.1016/j.jvcir.2025.104580
Hao Kong, Zi-Ming Wu, Bin Yan, Jeng-Shyang Pan, Hong-Mei Yang
Existing meaningful secret sharing schemes for 3D models suffer from model expansion. To address this problem, we propose a non-extended secret sharing scheme for 3D meshes. Because a 3D model contains a large amount of data to be shared, we design a circuit structure that accelerates the computation during sharing. In the sharing stage, vertex data is encoded and converted from floating-point to integer form, which is better suited to computation on an FPGA. By adjusting the encoding length, multiple secrets can be embedded during vertex encoding, which eliminates the expansion problem of the scheme. Experiments on a set of 3D meshes compare the cover models with the shares; the results show that the shares maintain high fidelity to the cover meshes. Furthermore, the FPGA implementation achieves a throughput of 675 Mbit/s, and simulation results show that the parallel circuit structure is 30 times faster than the serial one. In terms of resource consumption, the designed circuit occupies less than 5% of the on-chip resources.
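The float-to-integer vertex encoding and the sharing stage can be pictured with a Shamir-style polynomial over a prime field; the quantization scale, field modulus, and (k, n) threshold below are illustrative assumptions, not the parameters used in the paper.

```python
import random

PRIME = 2**31 - 1          # field modulus (assumed; a Mersenne prime)
SCALE = 10**6              # quantization step for float -> int encoding

def encode_vertex(coord: float) -> int:
    """Encode a floating-point vertex coordinate as a non-negative integer."""
    return int(round(coord * SCALE)) % PRIME

def decode_vertex(value: int) -> float:
    """Invert the encoding (handles the wrap-around of negative coordinates)."""
    if value > PRIME // 2:
        value -= PRIME
    return value / SCALE

def share_value(secret: int, k: int, n: int):
    """Split one encoded coordinate into n shares; any k of them recover it."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):      # Horner evaluation of the polynomial
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def recover_value(shares) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret

if __name__ == "__main__":
    coord = -0.127345
    shares = share_value(encode_vertex(coord), k=3, n=5)
    print(decode_vertex(recover_value(shares[:3])))   # ~ -0.127345
```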
Cited by: 0
Defending against adversarial attacks via an Adaptive Guided Denoising Diffusion model
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-09-17 | DOI: 10.1016/j.jvcir.2025.104584
Yanlei Wei, Yongping Wang, Xiaolin Zhang, Jingyu Wang, Lixin Liu
The emergence of a large number of adversarial samples has exposed the vulnerabilities of Deep Neural Networks (DNNs). With the rise of diffusion models, their powerful denoising capabilities have made them a popular strategy for adversarial defense. The defense capability of diffusion models is effective against simple adversarial attacks; however, their effectiveness diminishes when facing more sophisticated and complex attacks. To address this issue, this paper proposes a method called Adaptive Guided Denoising Diffusion (AGDD), which can effectively defend against adversarial attacks. Specifically, we first apply a small noise perturbation to the given adversarial samples, performing the forward diffusion process. Then, in the reverse denoising phase, the diffusion model is guided by the adaptive guided formula g_AG to perform denoising. At the same time, the adaptive guided formula g_AG is adjusted according to the adaptive matrix G_t and the residual r_t. Additionally, we introduce a momentum factor m to further optimize the denoising process, reduce the oscillations caused by gradient variations, and enhance the stability and convergence of the optimization process. Through AGDD, the denoised images accurately reconstruct the characteristics of the original observations (i.e., the unperturbed images) and exhibit strong robustness and adaptability across diverse noise conditions. Extensive experiments on the ImageNet dataset using Convolutional Neural Networks (CNN) and Vision Transformer (ViT) architectures demonstrate that the proposed method exhibits superior robustness against adversarial attacks, with classification accuracy reaching 87.4% for CNN and 85.9% for ViT, surpassing other state-of-the-art defense techniques.
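The guided reverse update with a momentum term can be sketched in a few lines; the symbols mirror the abstract (guidance g_AG, residual r_t, momentum factor m), but the concrete DDIM-style update, guidance scale, and schedule are illustrative assumptions only.

```python
import torch

def guided_reverse_step(x_t: torch.Tensor, eps_pred: torch.Tensor,
                        x_adv: torch.Tensor, velocity: torch.Tensor,
                        alpha_bar_t: float, alpha_bar_prev: float,
                        guidance_scale: float = 0.5, m: float = 0.9):
    """Illustrative guided DDIM-style reverse step with momentum.

    x_t:      current noisy sample.
    eps_pred: noise predicted by the pretrained diffusion model at step t.
    x_adv:    the (noised) adversarial observation guiding the denoising.
    velocity: running momentum buffer for the guidance direction.
    Returns the next sample x_{t-1} and the updated momentum buffer.
    """
    # Predicted clean image under the standard diffusion parameterization.
    x0_pred = (x_t - (1 - alpha_bar_t) ** 0.5 * eps_pred) / alpha_bar_t ** 0.5
    # Residual between the prediction and the guiding observation (r_t).
    residual = x0_pred - x_adv
    # Momentum-smoothed guidance direction (g_AG), damping gradient oscillation.
    velocity = m * velocity + (1 - m) * residual
    x0_guided = x0_pred - guidance_scale * velocity
    # Deterministic DDIM-style transition to the previous timestep.
    x_prev = (alpha_bar_prev ** 0.5 * x0_guided
              + (1 - alpha_bar_prev) ** 0.5 * eps_pred)
    return x_prev, velocity

if __name__ == "__main__":
    x_t = torch.randn(1, 3, 32, 32)
    eps = torch.randn_like(x_t)
    x_adv = torch.rand_like(x_t)
    vel = torch.zeros_like(x_t)
    x_prev, vel = guided_reverse_step(x_t, eps, x_adv, vel, 0.5, 0.6)
    print(x_prev.shape)
```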
Cited by: 0