
Latest Articles in IEEE Transactions on Pattern Analysis and Machine Intelligence

All-in-One Transformer for Image Restoration Under Adverse Weather Degradations.
IF 23.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-28 · DOI: 10.1109/tpami.2026.3658598
Jiawei Mao, Yu Yang, Xuesong Yin, Ling Shao, Hao Tang
Severe weather restoration models often face the simultaneous interaction of multiple degradations in real-world scenarios. Existing approaches typically handle single or composite degradations based on scene descriptors derived from text or image embeddings. However, due to the varying proportions of different degradations within an image, these scene descriptors may not accurately differentiate between degradations, leading to suboptimal restoration in practical applications. To address this issue, we propose a novel Transformer-based restoration framework, AllRestorer, for dealing with four physical severe weather impairments: low-light, haze, rain, and snow. In AllRestorer, we enable the model to adaptively consider all weather impairments, thereby avoiding errors from scene descriptor misdirection. Specifically, we introduce the All-in-One Transformer Block (AiOTB), the core innovation of which is the ability to adaptively handle multiple degradations in a single image, beyond the limitation of existing Transformers that can only handle one type of degradation at a time. To accurately address different variations potentially present within the same type of degradation and minimize ambiguity, AiOTB utilizes a Composite Scene Embedding consisting of both image and text embeddings to define the degradation. Moreover, AiOTB includes an adaptive weight for each degradation, allowing for precise control of the restoration intensity. By leveraging AiOTB, AllRestorer avoids misdirection caused by inaccurate scene descriptors, achieving a 5.00 dB increase in PSNR compared to the baseline on the CDD-11 dataset.
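To make the adaptive weighting idea concrete, here is a minimal PyTorch sketch of a block that predicts one restoration weight per degradation type from a composite image-plus-text scene embedding and blends per-degradation branches. It is illustrative only, not the authors' AiOTB implementation; the module name, dimensions, and placeholder convolutional branches are assumptions.

```python
# Illustrative sketch (not the authors' code): adaptive per-degradation weighting
# driven by a composite scene embedding, as described in the AiOTB abstract.
import torch
import torch.nn as nn

class CompositeSceneWeighting(nn.Module):
    """Predicts one blending weight per degradation type (low-light, haze, rain, snow)
    from concatenated image and text embeddings, then blends per-degradation branches."""
    def __init__(self, img_dim=256, txt_dim=256, num_degradations=4, feat_ch=64):
        super().__init__()
        self.weight_head = nn.Sequential(
            nn.Linear(img_dim + txt_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_degradations),
            nn.Sigmoid(),  # independent weights: several degradations may co-occur
        )
        # One lightweight restoration branch per degradation type (placeholder convs).
        self.branches = nn.ModuleList(
            [nn.Conv2d(feat_ch, feat_ch, 3, padding=1) for _ in range(num_degradations)]
        )

    def forward(self, feat, img_emb, txt_emb):
        # feat: (B, C, H, W) degraded features; img_emb/txt_emb: (B, D) scene embeddings.
        w = self.weight_head(torch.cat([img_emb, txt_emb], dim=-1))  # (B, num_degradations)
        out = feat
        for i, branch in enumerate(self.branches):
            # Residual correction from each branch, scaled by its adaptive weight.
            out = out + w[:, i].view(-1, 1, 1, 1) * branch(feat)
        return out

block = CompositeSceneWeighting()
restored = block(torch.randn(2, 64, 32, 32), torch.randn(2, 256), torch.randn(2, 256))
print(restored.shape)  # torch.Size([2, 64, 32, 32])
```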
Citations: 0
Vocabulary-free Image Classification and Semantic Segmentation
IF 23.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-28 · DOI: 10.1109/tpami.2026.3657989
Alessandro Conti, Enrico Fini, Massimiliano Mancini, Paolo Rota, Yiming Wang, Elisa Ricci
{"title":"Vocabulary-free Image Classification and Semantic Segmentation","authors":"Alessandro Conti, Enrico Fini, Massimiliano Mancini, Paolo Rota, Yiming Wang, Elisa Ricci","doi":"10.1109/tpami.2026.3657989","DOIUrl":"https://doi.org/10.1109/tpami.2026.3657989","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"15 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2026-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146070581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
DFormer++: Improving RGBD Representation Learning for Semantic Segmentation
IF 23.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-27 · DOI: 10.1109/tpami.2026.3658114
Bo-Wen Yin, Jiao-Long Cao, Dan Xu, Ming-Ming Cheng, Qibin Hou
{"title":"DFormer++: Improving RGBD Representation Learning for Semantic Segmentation","authors":"Bo-Wen Yin, Jiao-Long Cao, Dan Xu, Ming-Ming Cheng, Qibin Hou","doi":"10.1109/tpami.2026.3658114","DOIUrl":"https://doi.org/10.1109/tpami.2026.3658114","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"60 1","pages":"1-14"},"PeriodicalIF":23.6,"publicationDate":"2026-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146056018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Positive Data Augmentation Based on Manifold Heuristic Optimization for Image Classification.
IF 23.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-23 · DOI: 10.1109/tpami.2026.3657249
Fangqing Liu, Han Huang, Fujian Feng, Xueming Yan, Zhifeng Hao
Data augmentation is crucial for addressing insufficient training data, especially for augmenting positive samples. However, existing methods mostly rely on neural network-based feedback for data augmentation and often overlook the optimization of feature distribution. In this study, we present a practical, distribution-preserving data augmentation pipeline that augments positive samples by optimizing a feature indicator (e.g., two-dimensional entropy), aiming to maintain alignment with the original data distribution. Inspired by the manifold hypothesis, we propose a Manifold Heuristic Optimization Algorithm (MHOA), which augments positive samples by exploring the low-dimensional Euclidean space around object contour pixels instead of the entire decision space. Guided by a "distribution-preservation-first" perspective, our approach explicitly optimizes fidelity to the original data manifold and only retains augmented samples whose feature statistics (e.g., mean, variance) align with the source class. It significantly improves image classification accuracy across neural networks, outperforming state-of-the-art data augmentation methods, especially when the dataset's feature indicator follows a Gaussian distribution. The algorithm's search space, focused on neighborhoods of key feature pixels, is the core driver of its superior performance.
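The sketch below conveys the "perturb near contours, keep only distribution-preserving candidates" idea in plain NumPy. It is not the MHOA algorithm: the gradient-based contour proxy, the histogram-entropy indicator, and the class statistics are all stand-in assumptions for illustration.

```python
# Illustrative sketch (not the authors' algorithm): distribution-preserving augmentation
# in the spirit of MHOA -- perturb pixels near object contours and keep only candidates
# whose feature statistic stays close to the source-class statistics.
import numpy as np

def contour_mask(img, thresh=0.1):
    """Rough contour proxy: pixels with large intensity gradient."""
    gy, gx = np.gradient(img)
    return np.hypot(gx, gy) > thresh

def feature_indicator(img, bins=32):
    """Simple stand-in for the paper's two-dimensional entropy: histogram entropy."""
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0), density=True)
    p = hist / (hist.sum() + 1e-12)
    return -np.sum(p * np.log(p + 1e-12))

def augment_preserving_distribution(img, class_mean, class_std, n_candidates=50,
                                    noise=0.05, tol=1.0, rng=np.random.default_rng(0)):
    mask = contour_mask(img)
    kept = []
    for _ in range(n_candidates):
        cand = img.copy()
        cand[mask] += rng.normal(0.0, noise, size=mask.sum())  # perturb contour pixels only
        cand = np.clip(cand, 0.0, 1.0)
        # Retain the candidate only if its indicator stays within tol std of the class mean.
        if abs(feature_indicator(cand) - class_mean) <= tol * class_std:
            kept.append(cand)
    return kept

img = np.random.default_rng(1).random((64, 64))
samples = augment_preserving_distribution(img, class_mean=3.2, class_std=0.3)
print(len(samples), "augmented samples retained")
```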
Citations: 0
DSNeRF: Dynamic View Synthesis for Ultra-Fast Scenes from Continuous Spike Streams.
IF 23.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-22 · DOI: 10.1109/tpami.2026.3656825
Lin Zhu, Kangmin Jia, Yifan Zhao, Yunshan Qi, Lizhi Wang, Hua Huang
Spike cameras generate binary spikes in response to light intensity changes, enabling high-speed visual perception with unprecedented temporal resolution. However, the unique characteristics of spike streams present significant challenges for reconstructing dense 3D scene representations, particularly in dynamic environments and under non-ideal lighting conditions. In this paper, we introduce DSNeRF, the first method to derive a NeRF-based volumetric scene representation from spike camera data. Our approach leverages NeRF's multi-view consistency to establish robust self-supervision, effectively eliminating erroneous measurements and uncovering coherent structures within exceedingly noisy input amidst diverse real-world illumination scenarios. We propose a novel mapping from pixel rays to the spike domain, integrating the spike generation process directly into NeRF training. Specifically, DSNeRF introduces an integrate-and-fire neuron layer that models non-idealities to capture intrinsic camera noise, including both random and fixed-pattern spike noise, thereby enhancing scene fidelity. Additionally, we propose a motion-guided spiking neuron layer and a long-term rendering photometric loss to better align dynamic spike streams, ensuring accurate scene geometry. Our method optimizes neural radiance fields to render photorealistic novel views from continuous spike streams, demonstrating advantages over other vision sensors in certain scenes. Empirical evaluations on both real and simulated sequences validate the effectiveness of our approach. The dataset and source code will be released at https://github.com/BIT-Vision/DSNeRF.
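For readers unfamiliar with spike cameras, the sketch below simulates the integrate-and-fire conversion of per-ray light intensity into a binary spike stream, with additive random noise and a per-pixel fixed-pattern offset as the abstract describes. It is a simple forward simulator for intuition, not DSNeRF's differentiable layer; threshold, noise levels, and the subtract-on-fire reset are assumptions.

```python
# Illustrative sketch (not the DSNeRF implementation): an integrate-and-fire model that
# converts per-timestep light intensity along pixel rays into binary spike streams.
import torch

def integrate_and_fire(intensity, threshold=1.0, random_noise=0.01, fixed_pattern=None):
    """intensity: (T, N) non-negative light intensity for N pixels over T timesteps.
    Returns a binary spike tensor of the same shape."""
    T, N = intensity.shape
    if fixed_pattern is None:
        fixed_pattern = torch.zeros(N)  # per-pixel bias (fixed-pattern noise)
    accumulator = torch.zeros(N)
    spikes = torch.zeros(T, N)
    for t in range(T):
        accumulator += intensity[t] + fixed_pattern + random_noise * torch.randn(N)
        fired = accumulator >= threshold
        spikes[t] = fired.float()
        # Reset by subtraction so residual charge carries over to the next timestep.
        accumulator = torch.where(fired, accumulator - threshold, accumulator)
    return spikes

spikes = integrate_and_fire(torch.rand(100, 8) * 0.2)
print(spikes.shape, spikes.mean().item())  # spike rate grows with intensity
```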
Citations: 0
Unifying Multi-modal Hair Editing via Proxy Feature Blending.
IF 23.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-21 · DOI: 10.1109/tpami.2026.3656763
Tianyi Wei, Dongdong Chen, Wenbo Zhou, Jing Liao, Can Wang, Weiming Zhang, Gang Hua, Nenghai Yu
Hair editing is a long-standing problem in computer vision that demands both fine-grained local control and intuitive user interactions across diverse modalities. Despite the remarkable progress of GANs and diffusion models, existing methods still lack a unified framework that simultaneously supports arbitrary interaction modes (e.g., text, sketch, mask, and reference image) while ensuring precise editing and faithful preservation of irrelevant attributes. In this work, we introduce a novel paradigm that reformulates hair editing as proxy-based hair transfer. Specifically, we leverage the dense and semantically disentangled latent space of StyleGAN for precise manipulation and exploit its feature space for disentangled attribute preservation, thereby decoupling the objectives of editing and preservation. Our framework unifies different modalities by converting editing conditions into distinct transfer proxies, whose features are seamlessly blended to achieve global or local edits. Beyond 2D, we extend our paradigm to 3D-aware settings by incorporating EG3D and PanoHead, where we propose a multi-view boosted hair feature localization strategy together with 3D-tailored proxy generation methods that exploit the inherent properties of 3D-aware generative models. Extensive experiments demonstrate that our method consistently outperforms prior approaches in editing effects, attribute preservation, visual naturalness, and multi-view consistency, while offering unprecedented support for multimodal and mixed-modal interactions.
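The core "proxy feature blending" operation can be pictured as a masked interpolation in a generator's feature space, as in the minimal sketch below. This is a generic illustration under assumed shapes, not the authors' StyleGAN/EG3D/PanoHead pipeline; the function name, mask, and alpha parameter are hypothetical.

```python
# Illustrative sketch (not the authors' pipeline): blend a proxy's feature map into the
# source feature map inside a hair mask, keeping all other regions untouched.
import torch
import torch.nn.functional as F

def blend_proxy_features(src_feat, proxy_feat, hair_mask, alpha=1.0):
    """src_feat, proxy_feat: (B, C, H, W) generator features at the same layer.
    hair_mask: (B, 1, h, w) soft mask in [0, 1]; alpha controls edit strength."""
    mask = F.interpolate(hair_mask, size=src_feat.shape[-2:], mode="bilinear",
                         align_corners=False)
    # Inside the mask, move toward the proxy's features; outside, keep the source,
    # which is what preserves identity and other irrelevant attributes.
    return src_feat + alpha * mask * (proxy_feat - src_feat)

src = torch.randn(1, 512, 32, 32)
proxy = torch.randn(1, 512, 32, 32)
mask = torch.zeros(1, 1, 256, 256)
mask[:, :, 40:140, 60:200] = 1.0  # hypothetical hair region
edited = blend_proxy_features(src, proxy, mask, alpha=0.8)
print(edited.shape)
```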
Citations: 0
CAKGE: Context-aware Adaptive Learning for Dynamic Knowledge Graph Embeddings.
IF 23.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-21 · DOI: 10.1109/tpami.2026.3655896
Zongsheng Cao, Qianqian Xu, Zhiyong Yang, Xiaochun Cao, Qingming Huang
Knowledge graph embeddings (KGE) are effective for representing factual data for numerous applications. However, real-world facts continually evolve, necessitating ongoing updates to knowledge graphs as new information emerges. Under these circumstances, existing KGE models in transductive, inductive, and continual learning settings are prone to catastrophic forgetting or require costly retraining to integrate new information. To address these challenges, we propose a novel model called the Context-aware Adaptive learning model for Knowledge Graph Embeddings (CAKGE). Our model first identifies semantic-relevant entities and uncovers latent relational paths to facilitate the acquisition of new knowledge. To ensure the paths are semantically aligned with the query, we employ a context-aware fusion module, which leverages multiple specialized expert networks to assess and integrate the relevance of these relational paths. Building on this, we introduce an adaptive message aggregation module that incorporates a knowledge replay strategy, enabling the model to integrate both new and existing knowledge efficiently, without retraining the knowledge graph. Additionally, to mitigate catastrophic forgetting, we reformulate the challenge of aligning new with existing knowledge as a graph-matching task using the Fused Gromov-Wasserstein distance, enabling the alignment of old and new knowledge from both semantic and topological perspectives. Furthermore, we provide theoretical guarantees for the expressiveness and reasoning ability of CAKGE, showing that it is the first unified framework tackling transductive, inductive, and continual settings. Extensive experiments show that CAKGE achieves state-of-the-art performance, demonstrating its effectiveness in dynamic KGE modeling.
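The context-aware fusion step can be sketched as a query-gated mixture of expert scorers over candidate relational paths, as below. This is an illustrative reconstruction from the abstract, not the CAKGE model; the class name, dimensions, and gating scheme are assumptions.

```python
# Illustrative sketch (not the CAKGE model): several expert networks score the relevance
# of candidate relational paths to a query, and a query-conditioned softmax gate mixes
# the expert scores before aggregating path features.
import torch
import torch.nn as nn

class ContextAwarePathFusion(nn.Module):
    def __init__(self, dim=128, num_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))
             for _ in range(num_experts)]
        )
        self.gate = nn.Linear(dim, num_experts)  # query-conditioned gate over experts

    def forward(self, query, paths):
        # query: (B, D); paths: (B, P, D) embeddings of P candidate relational paths.
        B, P, D = paths.shape
        pair = torch.cat([query.unsqueeze(1).expand(-1, P, -1), paths], dim=-1)  # (B, P, 2D)
        expert_scores = torch.stack([e(pair).squeeze(-1) for e in self.experts], dim=-1)  # (B, P, E)
        gate = torch.softmax(self.gate(query), dim=-1).unsqueeze(1)  # (B, 1, E)
        relevance = torch.softmax((expert_scores * gate).sum(-1), dim=-1)  # (B, P)
        return (relevance.unsqueeze(-1) * paths).sum(dim=1)  # fused context, (B, D)

fusion = ContextAwarePathFusion()
out = fusion(torch.randn(2, 128), torch.randn(2, 16, 128))
print(out.shape)  # torch.Size([2, 128])
```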
Citations: 0
First-Order Cross-Domain Meta Learning for Few-Shot Remote Sensing Object Classification.
IF 23.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-21 · DOI: 10.1109/tpami.2026.3656494
Wenda Zhao, Yunxiang Li, Haipeng Wang, Huchuan Lu
Remote sensing images exhibit intrinsic domain complexity arising from multi-source sensor variances, a heterogeneity that fundamentally challenges conventional cross-domain few-shot methods, which assume simple distribution shifts. To address this, we propose first-order Cross-Domain Meta Learning (CDML) for few-shot remote sensing object classification. CDML implements a dual-stage domain adaptation task as the fundamental meta-learning unit, and includes a cross-domain meta-train phase (CDMTrain) and a cross-domain meta-test phase (CDMTest). In CDMTrain, we propose inner-loop multi-domain few-shot task sampling, which enables a teacher model to encapsulate both cross-category discriminative features and authentic inter-domain distributional divergence. This alternating cyclic learning paradigm captures genuine domain shifts, with each update direction progressively guiding the model toward parameters that balance multi-domain performance. In CDMTest, we evaluate domain diversity enhancement by transferring teacher parameters to the student model for cross-domain capability assessment on the reserved pseudo-unseen domain. The task-level design progressively improves domain generalization through iterative domain-adaptive task learning. Meanwhile, to mitigate the conflicts and inadequacies caused by multi-domain scenarios, we propose a learnable affine transformation model. It adaptively learns affine transformation parameters through intermediate-layer features to fine-tune the update direction. Extensive experiments on five remote sensing classification benchmarks demonstrate superior performance of the proposed method compared with state-of-the-art methods. The code will be released at: https://github.com/lyxdlut/CDML.
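As background for the "first-order" aspect, the sketch below shows a generic Reptile-style first-order meta update over sampled domain tasks, with the aggregated update direction rescaled by affine parameters before being applied. This is not the CDML training code; the function, hyperparameters, and the scalar gamma/beta are simplifying assumptions (the paper learns its affine parameters from intermediate-layer features).

```python
# Illustrative sketch (not the CDML code): a first-order, Reptile-style meta update in
# which the update direction accumulated over domain tasks is affine-rescaled.
import torch

def first_order_meta_step(params, domain_task_losses, inner_lr=1e-2, meta_lr=1e-1,
                          gamma=1.0, beta=0.0):
    """params: list of tensors (teacher weights). domain_task_losses: callables, one per
    sampled domain task, mapping the current params to a scalar loss."""
    deltas = [torch.zeros_like(p) for p in params]
    for task_loss in domain_task_losses:
        # Inner adaptation on one domain task (a single first-order step for brevity).
        adapted = [p.clone().requires_grad_(True) for p in params]
        loss = task_loss(adapted)
        grads = torch.autograd.grad(loss, adapted)
        adapted = [p - inner_lr * g for p, g in zip(adapted, grads)]
        # Accumulate the first-order update direction (adapted - initial); no second-order terms.
        for d, a, p in zip(deltas, adapted, params):
            d += (a - p).detach()
    # Affine-transformed meta update, averaged over tasks.
    n = len(domain_task_losses)
    return [p + meta_lr * (gamma * d / n + beta) for p, d in zip(params, deltas)]

w = [torch.randn(4, 4)]
tasks = [lambda ps: ((ps[0] @ torch.randn(4, 1)) ** 2).mean() for _ in range(3)]
new_w = first_order_meta_step(w, tasks)
print(new_w[0].shape)
```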
Citations: 0
Momentor++: Advancing Video Large Language Models With Fine-Grained Long Video Reasoning.
IF 23.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-20 · DOI: 10.1109/tpami.2026.3656169
Juncheng Li, Minghe Gao, Xiangnan He, Siliang Tang, Weishi Zheng, Jun Xiao, Meng Wang, Tat-Seng Chua, Yueting Zhuang
Large Language Models (LLMs) exhibit remarkable proficiency in understanding and managing text-based tasks. Many works try to transfer these capabilities to the video domain; the resulting models are referred to as Video-LLMs. However, current Video-LLMs can only grasp coarse-grained semantics and are unable to efficiently handle tasks involving the comprehension or localization of specific video segments. To address these challenges, we propose Momentor, a Video-LLM designed to perform fine-grained temporal understanding tasks. To facilitate the training of Momentor, we develop an automatic data generation engine to build Moment-10M, a large-scale video instruction dataset with segment-level instruction data. Building upon the foundation of the previously published Momentor and the Moment-10M dataset, we further extend this work by introducing a Spatio-Temporal Token Consolidation (STTC) method, which can merge redundant visual tokens spatio-temporally in a parameter-free manner, thereby significantly improving computational efficiency while preserving fine-grained visual details. We integrate STTC with Momentor to develop Momentor++ and validate its performance on various benchmarks. Momentor demonstrates robust capabilities in fine-grained temporal understanding and localization. Further, Momentor++ excels in efficiently processing and analyzing extended videos with complex events, showcasing marked advancements in handling extensive temporal contexts.
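To illustrate what parameter-free token consolidation can look like, the sketch below greedily averages the most cosine-similar pair of visual tokens until a target count is reached. This is a generic similarity-based merge for intuition, not the authors' STTC; the greedy pairwise strategy and token counts are assumptions.

```python
# Illustrative sketch (not the authors' STTC): parameter-free reduction of redundant visual
# tokens by repeatedly averaging the most cosine-similar token pair.
import torch
import torch.nn.functional as F

def consolidate_tokens(tokens, keep):
    """tokens: (N, D) visual tokens pooled over space-time; keep: target token count."""
    tokens = tokens.clone()
    while tokens.shape[0] > keep:
        sim = F.cosine_similarity(tokens.unsqueeze(1), tokens.unsqueeze(0), dim=-1)
        sim.fill_diagonal_(-2.0)                    # ignore self-similarity
        idx = torch.argmax(sim)
        i, j = divmod(idx.item(), tokens.shape[0])  # most redundant pair
        merged = 0.5 * (tokens[i] + tokens[j])      # average the pair into one token
        mask = torch.ones(tokens.shape[0], dtype=torch.bool)
        mask[[i, j]] = False
        tokens = torch.cat([tokens[mask], merged.unsqueeze(0)], dim=0)
    return tokens

frames_tokens = torch.randn(256, 768)  # e.g., tokens gathered from several video frames
compact = consolidate_tokens(frames_tokens, keep=64)
print(compact.shape)                   # torch.Size([64, 768])
```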
Citations: 0
Generalizable Egocentric Task Verification Via Cross-Modal Hybrid Hypergraph Matching
IF 23.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-19 · DOI: 10.1109/tpami.2026.3655147
Xun Jiang, Xing Xu, Zheng Wang, Jingkuan Song, Fumin Shen, Heng Tao Shen
{"title":"Generalizable Egocentric Task Verification Via Cross-Modal Hybrid Hypergraph Matching","authors":"Xun Jiang, Xing Xu, Zheng Wang, Jingkuan Song, Fumin Shen, Heng Tao Shen","doi":"10.1109/tpami.2026.3655147","DOIUrl":"https://doi.org/10.1109/tpami.2026.3655147","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"359 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2026-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146000902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0