Pub Date: 2026-02-23 | DOI: 10.1109/TPAMI.2026.3665097
Jie Wen, Lian Zhao, Xiaohuan Lu, Chengliang Liu, Li Shen, Chao Huang, Yong Xu
As a prominent research topic, multi-view multi-label classification (MvMlC) aims to assign multiple labels to samples by integrating information from various perspectives. However, in real-world scenarios, MvMlC frequently faces the challenge of learning from data with missing views and labels, typically resulting from sensor malfunctions or the costly and time-consuming process of manual annotation. In addition, learning robust representations that are both consistent across views and specific to individual views remains a challenge. To address these issues, we propose a novel double incomplete multi-view multi-label classification framework based on Disentangling Consistent and Specific Information (DCSI). Specifically, we employ a dual-channel encoder with identical architecture but distinct objectives to extract cross-view consistent information and view-specific unique information from all views, respectively. Meanwhile, a view discriminator is constructed to decouple these two types of information, facilitating the extraction of pure consistent and specific information. Moreover, we meticulously design fusion strategies tailored to each representation type. For consistent representations, we propose a dynamic-confidence-aware fusion mechanism that assesses the reliability of each view's representations with respect to the classification task, enabling the model to prioritize information from trustworthy representations. For specific representations, in light of their complementary rather than redundant nature, we treat the representations from each view equally to ensure fairness. Experimental results on five datasets demonstrate that our method outperforms existing state-of-the-art methods.
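The abstract contrasts confidence-weighted fusion for consistent representations with equal-weight fusion for specific ones. A minimal sketch of how such a scheme could look, under missing views; the function names, the softmax weighting form, and the masking convention are our assumptions, not the paper's exact formulation:

```python
import math

def fuse_consistent(view_reprs, confidences, masks):
    """Confidence-weighted fusion of per-view consistent representations.

    view_reprs:  one vector (list of floats) per view
    confidences: one reliability score per view (higher = more trusted)
    masks:       1 if the view is observed for this sample, else 0
    """
    # Softmax over confidences, restricted to observed views.
    exps = [m * math.exp(c) for c, m in zip(confidences, masks)]
    z = sum(exps) or 1.0
    weights = [e / z for e in exps]
    dim = len(view_reprs[0])
    return [sum(w * v[d] for w, v in zip(weights, view_reprs)) for d in range(dim)]

def fuse_specific(view_reprs, masks):
    """Equal-weight averaging of view-specific representations,
    reflecting their complementary (non-redundant) nature."""
    n = sum(masks) or 1
    dim = len(view_reprs[0])
    return [sum(m * v[d] for m, v in zip(masks, view_reprs)) / n for d in range(dim)]
```

With equal confidences the consistent fusion reduces to a plain average, while a high-confidence view dominates the fused result; missing views are excluded from both fusions via the mask.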
Title: Disentangling Consistent and Specific Information for Double Incomplete Multi-View Multi-Label Classification
Pub Date: 2026-02-23 | DOI: 10.1109/TPAMI.2026.3666860
Qisen Wang, Yifan Zhao, Jia Li
The majority of standard diffusion models employ pixel-wise degradations while neglecting the multi-scale characteristics of images. Recently, generalized diffusion models with Positive Semi-definite Degradations (PSD), such as heat dissipation and blurring, have been proposed to address this, but they suffer from low generation quality due to incomplete optimization analysis, and their hand-crafted, fixed inductive biases cannot adapt to the training process or to different data distributions. In this paper, we present a comprehensive theoretical analysis of the optimization process in the frequency domain for PSD-based generalized diffusion models, which shows that the non-isotropic frequency-domain degradation of the forward process implicitly acts as a non-isotropic weighting of the Variational Lower Bound in the reverse optimization process. Based on this insight, we propose the Frequency Inductive Biases Bootstrapping Optimization (FIBBO) method, which parameterizes the forward process and iteratively learns distinct frequency degradation-generation trajectories. To address the hand-crafted, fixed inductive biases of PSD, FIBBO dynamically modifies the non-isotropic Gaussian kernel of the forward degradation process so that the introduced inductive biases can be adjusted adaptively during training. Experiments on public datasets show that FIBBO significantly improves the generation quality of PSD-based generalized diffusion models. The code will be publicly available.
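To make the frequency-domain view concrete: in PSD-style diffusion, a forward step attenuates each frequency component by a Gaussian-like decay. A toy sketch under our own naming; constant decay rates correspond to plain heat dissipation, while per-frequency (non-isotropic) rates are the kind of adjustable inductive bias FIBBO adapts during training:

```python
import math

def degrade_spectrum(coeffs, alphas, t):
    """One forward degradation step applied in the frequency domain.

    coeffs: spectrum of the signal (list of floats), index k = frequency
    alphas: per-frequency decay rates; a constant list recovers isotropic
            heat dissipation, distinct values give non-isotropic degradation
    t:      diffusion time
    """
    return [c * math.exp(-t * a * k * k)
            for k, (c, a) in enumerate(zip(coeffs, alphas))]
```

Higher frequencies decay faster (the k*k factor), matching the blurring behavior of heat dissipation; learning the alphas rather than fixing them is the adaptivity the abstract argues for.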
Title: Beyond Heat Dissipation: Optimizing Diffusion Models in Frequency Domain
Pub Date: 2026-02-23 | DOI: 10.1109/TPAMI.2026.3667002
Zhicheng Cai, Hao Zhu, Qiu Shen, Xinran Wang, Xun Cao
Representing signals with coordinate networks has recently come to dominate the area of inverse problems and is widely applied in various scientific computing tasks. However, coordinate networks suffer from spectral bias, which limits their capacity to learn high-frequency components. This problem is caused by the pathological distribution of the eigenvalues of the coordinate network's neural tangent kernel (NTK). We find that this pathological distribution can be improved using classical normalization techniques (batch normalization and layer normalization), which are commonly used in convolutional neural networks but rarely in coordinate networks. We prove that normalization greatly reduces the maximum and the variance of the NTK's eigenvalues while only slightly changing their mean; since the maximum eigenvalue is much larger than most of the others, this variance reduction shifts the eigenvalue distribution from a lower range to a higher one, thereby alleviating the spectral bias (see Fig. 1). Furthermore, we propose two new normalization techniques that combine batch and layer normalization in different ways. The efficacy of these normalization techniques is substantiated by the significant improvements and new state-of-the-art results achieved by applying normalization-based coordinate networks to various tasks, including image compression, computed tomography reconstruction, shape representation, magnetic resonance imaging, novel view synthesis, and multi-view stereo reconstruction.
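The mechanics of inserting layer normalization into a coordinate network's hidden layer can be sketched in a few lines; the layer shape and placement before the activation are our assumptions for illustration, not the paper's prescribed architecture:

```python
import math

def layer_norm(h, gamma=1.0, beta=0.0, eps=1e-5):
    """LayerNorm over one hidden vector: zero mean, unit variance, then affine."""
    mu = sum(h) / len(h)
    var = sum((x - mu) ** 2 for x in h) / len(h)
    return [gamma * (x - mu) / math.sqrt(var + eps) + beta for x in h]

def coord_mlp_layer(x, W, b):
    """One hidden layer of a coordinate MLP, normalized before the ReLU."""
    pre = [sum(wi * xi for wi, xi in zip(row, x)) + bi
           for row, bi in zip(W, b)]
    return [max(0.0, v) for v in layer_norm(pre)]
```

Centering and rescaling the pre-activations is what reshapes the kernel statistics; the paper's analysis ties exactly this operation to the NTK eigenvalue distribution.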
Title: Towards the Spectral bias Alleviation by Normalizations in Coordinate Networks
Pub Date: 2026-02-23 | DOI: 10.1109/TPAMI.2026.3667072
Yajiao Xiong, Xiaoyu Zhou, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang
We present DrivingGaussian++, an efficient and effective framework for realistic reconstruction and controllable editing of surrounding dynamic autonomous driving scenes. DrivingGaussian++ models the static background with incremental 3D Gaussians and reconstructs moving objects with a composite dynamic Gaussian graph, ensuring accurate positions and occlusions. By integrating a LiDAR prior, it achieves detailed and consistent scene reconstruction, outperforming existing methods in dynamic scene reconstruction and photorealistic surround-view synthesis. DrivingGaussian++ supports training-free controllable editing for dynamic driving scenes, including texture modification, weather simulation, and object manipulation, leveraging multi-view images and depth priors. By integrating large language models (LLMs) and controllable editing, our method can automatically generate dynamic object motion trajectories and enhance their realism during the optimization process. DrivingGaussian++ demonstrates consistent and realistic editing results and generates dynamic multi-view driving scenarios, while significantly enhancing scene diversity.
Title: DrivingGaussian++: Towards Realistic Reconstruction and Editable Simulation for Surrounding Dynamic Driving Scenes
Pub Date: 2026-02-23 | DOI: 10.1109/TPAMI.2026.3667180
Ruiyang Xia, Dawei Zhou, Lin Yuan, Jie Li, Nannan Wang, Xinbo Gao
The rapid development of generative AI techniques enables the synthesis of highly realistic facial images, posing significant challenges for the accurate detection of face forgeries. In contrast to solely elevating detector awareness, proactively reducing the intrinsic difficulty of forgery detection can streamline detector complexity while improving both generalization and robustness. This insight motivates our defense strategy of making face forgery clues more evident. Specifically, a novel proactive approach dubbed Self-Steganographic Detection (SSD) is proposed to imperceptibly embed facial images into themselves as a form of detection evidence. The recovery process is designed to remain robust under normal manipulations while exhibiting deliberate degradation under malicious manipulations, thereby clearly revealing potential forgeries. Unlike embedded bit-level vectors, pixel-level images are informative enough to ensure the generalization of our approach. Owing to the similarity between the protected and embedded images, SSD performs detection without storing any embedded information in advance. To support practical deployment, our approach incorporates a dual detection scheme that identifies unprotected images and determines the authenticity of protected images. Extensive experiments using 8 face forgery techniques demonstrate the effectiveness of our approach compared to state-of-the-art methods.
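The core idea, embedding an image into itself so that tampering breaks the embedded evidence, can be illustrated with classic least-significant-bit steganography. This is only a toy stand-in: SSD's embedding is learned and robust by design, whereas the LSB scheme and all names below are our own simplification:

```python
def embed_self(pixels):
    """Embed a coarse copy of the image into its own least-significant bits.

    pixels: flat list of 8-bit values. Each pixel's LSB is replaced by the
    pixel's own most-significant bit, so the evidence travels with the
    image and nothing needs to be stored in advance.
    """
    return [(p & 0xFE) | (p >> 7) for p in pixels]

def recover_evidence(pixels):
    """Extract the embedded bits; heavy edits to pixel values corrupt them."""
    return [p & 1 for p in pixels]

def suspicious(pixels):
    """Flag pixels whose embedded MSB no longer matches their actual MSB."""
    return [int((p & 1) != (p >> 7)) for p in pixels]
```

On untouched protected pixels the check passes everywhere; a manipulation that rewrites pixel values without preserving the hidden bits produces mismatches, which is the detection signal the abstract describes at a much more sophisticated level.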
Title: SSD: Making Face Forgery Clues Evident Again With Self-Steganographic Detection
Existing video frame interpolation (VFI) methods blindly predict where each object is at a specific timestep $t$ ("time indexing"), an approach that struggles to capture precise object movements. Given two images of a baseball, there are infinitely many possible trajectories: accelerating or decelerating, straight or curved. This often results in blurry frames as the method averages out these possibilities. Instead of forcing the network to learn this complicated time-to-location mapping implicitly together with predicting the frames, we provide the network with an explicit hint on how far the object has traveled between the start and end frames, a novel approach termed "distance indexing". This method offers a clearer learning goal for models, reducing the uncertainty tied to object speeds. We further observed that, even with this extra guidance, objects can still be blurry, especially when they are equally far from both input frames (i.e., halfway in-between), due to the directional ambiguity in long-range motion. To solve this, we propose an iterative reference-based estimation strategy that breaks down a long-range prediction into several short-range steps. When integrating our plug-and-play strategies into state-of-the-art learning-based models, they exhibit markedly sharper outputs and superior perceptual quality in arbitrary time interpolations, using a uniform distance indexing map in the same format as time indexing without requiring extra computation. Furthermore, we demonstrate that if additional latency is acceptable, a continuous map estimator can be employed to compute a pixel-wise dense distance indexing using multiple nearby frames. Combined with efficient multi-frame refinement, this extension can further disambiguate complex motion, thus enhancing performance both qualitatively and quantitatively.
Additionally, the ability to manually specify distance indexing allows for independent temporal manipulation of each object, providing a novel tool for video editing tasks such as re-timing. The code is available at https://zzh-tech.github.io/InterpAny-Clearer/.
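The distinction between time indexing and distance indexing is easy to see numerically: given a known trajectory, the distance index at each frame is the fraction of total path length traveled so far. A minimal sketch under our own naming (the paper estimates this from images rather than from ground-truth positions):

```python
import math

def distance_index(positions):
    """Map each frame to the fraction of total path length traveled so far.

    positions: object location per frame, as (x, y) tuples. Unlike the time
    index t = i / (n - 1), the distance index is uniform in *space*, so a
    decelerating object yields non-uniform values that encode its speed.
    """
    steps = [math.dist(a, b) for a, b in zip(positions, positions[1:])]
    total = sum(steps) or 1.0
    idx, acc = [0.0], 0.0
    for s in steps:
        acc += s
        idx.append(acc / total)
    return idx
```

For an object that covers 3 units then 1 unit over three frames, time indexing gives [0, 0.5, 1] while the distance index gives [0, 0.75, 1]: the middle frame's hint now reflects the deceleration instead of assuming constant speed.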
Title: Velocity Disambiguation for Video Frame Interpolation
Authors: Zhihang Zhong, Yiming Zhang, Wei Wang, Xiao Sun, Yu Qiao, Gurunandan Krishnan, Sizhuo Ma, Jian Wang
DOI: 10.1109/TPAMI.2026.3667437
Pub Date: 2026-02-23 | DOI: 10.1109/TPAMI.2026.3667397
Shiyu Chen, Cencheng Shen, Youngser Park, Carey E Priebe
Graph neural networks (GNNs) have emerged as a powerful framework for a wide range of node-level graph learning tasks. However, their performance typically depends on random or minimally informed initial feature representations, where poor initialization can lead to slower convergence and increased training instability. In this paper, we address this limitation by leveraging a statistically grounded one-hot graph encoder embedding (GEE) as a high-quality, structure-aware initialization for node features. Integrating GEE into standard GNNs yields the GEE-powered GNN (GG) framework. Across extensive simulations and real-world benchmarks, GG provides consistent and substantial performance gains in both unsupervised and supervised settings. For node classification, we further introduce GG-C, which concatenates the outputs of GG and GEE and outperforms competing methods, achieving roughly 10-50% accuracy improvements across most datasets. These results demonstrate the importance of principled, structure-aware initialization for improving the efficiency, stability, and overall performance of graph neural network architecture, enabling models to better exploit graph topology from the outset.
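A one-hot graph encoder embedding can be computed in a few lines: each node's feature is its class-size-normalized edge count into every label class. A simplified sketch of the idea for unweighted, undirected graphs; the exact normalization in the paper's GEE may differ:

```python
def graph_encoder_embedding(edges, labels, n, k):
    """One-hot graph encoder embedding (GEE) sketch.

    edges:  undirected edge list [(u, v), ...]
    labels: class label per node (0..k-1)
    Returns an n-by-k matrix: node u's row counts its neighbors in each
    class, normalized by class size, giving a structure-aware feature
    usable as GNN input instead of a random initialization.
    """
    class_size = [labels.count(c) for c in range(k)]
    Z = [[0.0] * k for _ in range(n)]
    for u, v in edges:
        Z[u][labels[v]] += 1.0 / class_size[labels[v]]
        Z[v][labels[u]] += 1.0 / class_size[labels[u]]
    return Z
```

Because the embedding is a single sparse pass over the edge list, it adds negligible cost before GNN training while encoding graph topology from the outset.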
Title: Graph Neural Networks Powered by Encoder Embedding for Improved Node Learning
Pub Date: 2026-02-23 | DOI: 10.1109/TPAMI.2026.3660046
Kutalmış Coşkun, Borahan Tümer, Bjarne C Hiller, Martin Becker
Markov chains are simple yet powerful mathematical structures for modeling temporally dependent processes. They generally assume stationary data, i.e., fixed transition probabilities between observations/states. However, live, real-world processes, such as those in activity tracking, biological time series, or industrial monitoring, often switch behavior over time. Such behavior switches can be modeled as transitions between higher-level modes (e.g., running, walking, etc.). Yet not all modes are usually known beforehand; they often exhibit vastly differing transition probabilities and can switch unpredictably. Thus, to track behavior changes of live, real-world processes, this study proposes an online and efficient method to construct Evolving Markov chains (EMCs). EMCs adaptively track transition probabilities, automatically discover modes, and detect mode switches in an online manner. In contrast to previous work, EMCs are of arbitrary order, and the proposed update scheme does not rely on tracking windows, only updates the relevant region of the probability tensor, and enjoys geometric convergence of the expected estimates. Our evaluation on synthetic data and real-world applications in human activity recognition, electric motor condition monitoring, and eye-state recognition from electroencephalography (EEG) measurements illustrates the versatility of the approach and points to the potential of EMCs to efficiently track, model, and understand live, real-world processes.
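The flavor of a window-free, geometrically converging update can be sketched for a first-order chain: on each observed transition, only the row of the previous state moves a small step toward the observed outcome. The step size `lam` and function name are our assumptions; the paper's EMCs generalize this to arbitrary order over a probability tensor:

```python
def emc_update(P, prev, curr, lam=0.05):
    """Online transition-probability update with geometric forgetting.

    Only row `prev` of the transition matrix P is touched: it moves a
    step of size lam toward the indicator of the observed transition
    prev -> curr. Old behavior fades geometrically, with no sliding
    window, and each row remains a valid probability distribution.
    """
    row = P[prev]
    for j in range(len(row)):
        target = 1.0 if j == curr else 0.0
        row[j] += lam * (target - row[j])
    return P
```

Repeatedly observing the same transition drives its probability toward 1 at rate (1 - lam) per step, which is the geometric convergence the abstract refers to; a sudden change in the estimated row is then usable as a mode-switch signal.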
{"title":"Evolving Markov Chains: Online Mode Discovery and Recognition from Data Streams.","authors":"Kutalmls Coskun, Borahan Tumer, Bjarne C Hiller, Martin Becker","doi":"10.1109/TPAMI.2026.3660046","DOIUrl":"https://doi.org/10.1109/TPAMI.2026.3660046","url":null,"abstract":"<p><p>Markov chains are simple yet powerful mathematical structures to model temporally dependent processes. They generally assume stationary data, i.e., fixed transition probabilities between observations/states. However, live, real-world processes, like in the context of activity tracking, biological time series, or industrial monitoring, often switch behavior over time. Such behavior switches can be modeled as transitions between higher-level modes (e.g., running, walking, etc.). Yet all modes are usually not previously known, often exhibit vastly differing transition probabilities, and can switch unpredictably. Thus, to track behavior changes of live, real-world processes, this study proposes an online and efficient method to construct Evolving Markov chains (EMCs). EMCs adaptively track transition probabilities, automatically discover modes, and detect mode switches in an online manner. In contrast to previous work, EMCs are of arbitrary order, the proposed update scheme does not rely on tracking windows, only updates the relevant region of the probability tensor, and enjoys geometric convergence of the expected estimates. 
Our evaluation of synthetic data and real-world applications on human activity recognition, electric motor condition monitoring, and eye-state recognition from electroencephalography (EEG) measurements illustrates the versatility of the approach and points to the potential of EMCs to efficiently track, model, and understand live, real-world processes.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":18.6,"publicationDate":"2026-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147277984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
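The core idea of tracking transition probabilities online, without windows and with geometric convergence, can be illustrated with a minimal first-order sketch. This is not the authors' arbitrary-order, tensor-based scheme; the class name and the step size `alpha` are illustrative assumptions. Each observed transition moves only the current state's row toward the observed outcome, so stationary streams converge geometrically while behavior switches are picked up quickly.

```python
import numpy as np

class EvolvingMarkovChain:
    """Minimal first-order sketch of an online Markov-chain estimator
    with exponential forgetting (step size alpha). Only the row of the
    current state is touched per update, mirroring the idea of updating
    just the relevant region of the probability tensor."""

    def __init__(self, n_states, alpha=0.1):
        self.alpha = alpha
        # Start from the uniform chain.
        self.P = np.full((n_states, n_states), 1.0 / n_states)

    def update(self, s, s_next):
        # Move row s a step toward the one-hot target of the observed
        # transition; each update is a convex combination, so the row
        # remains a valid probability distribution.
        target = np.zeros(self.P.shape[1])
        target[s_next] = 1.0
        self.P[s] += self.alpha * (target - self.P[s])
```

Under a fixed transition distribution, the expected error of each row shrinks by a factor of (1 - alpha) per visit, which is the geometric-convergence behavior the abstract refers to; after a mode switch, the same mechanism re-adapts at the same rate.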
Pub Date : 2026-02-23 DOI: 10.1109/TPAMI.2026.3667409
Qingyuan Zheng, Yue Liu, Yangbo He
Causality plays a pivotal role in various fields of study. Based on the framework of causal graphical models, previous works have proposed identifying whether a variable is a cause or non-cause of another variable in every Markov equivalent graph by learning only the local structure. However, the presence of prior knowledge, often represented as a partially known causal graph, is common in many causal modeling applications. Leveraging this prior knowledge enables further identification of causal relations. In this paper, we first propose a method for learning the local structure by incorporating several types of causal background knowledge, including direct causal, non-ancestral, and ancestral information. Then we introduce sufficient and necessary conditions for identifying causal relations based solely on the local structure in the presence of prior knowledge. The effectiveness and efficiency of our method are demonstrated through experiments on local structure learning, causal relation identification, and its application to fair machine learning.
{"title":"Local Causal Discovery with Background Knowledge.","authors":"Qingyuan Zheng, Yue Liu, Yangbo He","doi":"10.1109/TPAMI.2026.3667409","DOIUrl":"https://doi.org/10.1109/TPAMI.2026.3667409","url":null,"abstract":"<p><p>Causality plays a pivotal role in various fields of study. Based on the framework of causal graphical models, previous works have proposed identifying whether a variable is a cause or non-cause of another variable in every Markov equivalent graph by learning only the local structure. However, the presence of prior knowledge, often represented as a partially known causal graph, is common in many causal modeling applications. Leveraging this prior knowledge enables further identification of causal relations. In this paper, we first propose a method for learning the local structure by incorporating several types of causal background knowledge, including direct causal, non-ancestral, and ancestral information. Then we introduce sufficient and necessary conditions for identifying causal relations based solely on the local structure in the presence of prior knowledge. The effectiveness and efficiency of our method are demonstrated through experiments on local structure learning, causal relation identification, and its application to fair machine learning.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":18.6,"publicationDate":"2026-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147277977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Skip connections are an essential ingredient for making modern deep models deeper and more powerful. Despite their huge success in normal scenarios (state-of-the-art classification performance on natural examples), we investigate and identify an interesting property of skip connections under adversarial scenarios: the use of skip connections allows easier generation of highly transferable adversarial examples. Specifically, in ResNet-like models (with skip connections), we find that biasing backpropagation to favor gradients from skip connections, while suppressing those from residual modules via a decay factor, allows one to craft adversarial examples with high transferability. Based on this insight, we propose the Skip Gradient Method (SGM). Although starting from ResNet-like models in vision domains, we further extend SGM to more advanced architectures, including Vision Transformers (ViTs), models with varying-length paths, and other domains such as natural language processing. We conduct comprehensive transfer-based attacks against diverse model families, including ResNets, Transformers, Inceptions, Neural Architecture Search-based models, and Large Language Models (LLMs). The results demonstrate that employing SGM can greatly improve the transferability of crafted attacks in almost all cases. Furthermore, we demonstrate that SGM remains effective under more challenging settings such as ensemble-based attacks, targeted attacks, and attacks against defense-equipped models. Finally, we provide theoretical explanations and empirical insights into how SGM works. Our findings not only motivate new adversarial research into the architectural characteristics of models but also open up further challenges for secure model architecture design.
{"title":"On the Adversarial Transferability of Generalized \"Skip Connections\".","authors":"Yisen Wang, Yichuan Mo, Dongxian Wu, Mingjie Li, Xingjun Ma, Zhouchen Lin","doi":"10.1109/TPAMI.2026.3666165","DOIUrl":"https://doi.org/10.1109/TPAMI.2026.3666165","url":null,"abstract":"<p><p>Skip connection is an essential ingredient for modern deep models to be deeper and more powerful. Despite their huge success in normal scenarios (state-of-the-art classification performance on natural examples), we investigate and identify an interesting property of skip connections under adversarial scenarios, namely, the use of skip connections allows easier generation of highly transferable adversarial examples. Specifically, in ResNet-like models (with skip connections), we find that biasing backpropagation to favor gradients from skip connections-while suppressing those from residual modules via a decay factor-allows one to craft adversarial examples with high transferability. Based on this insight, we propose the Skip Gradient Method (SGM). Although starting from ResNet-like models in vision domains, we further extend SGM to more advanced architectures, including Vision Transformers (ViTs), models with varying-length paths, and other domains such as natural language processing. We conduct comprehensive transfer-based attacks against diverse model families, including ResNets, Transformers, Inceptions, Neural Architecture Search-based models, and Large Language Models (LLMs). The results demonstrate that employing SGM can greatly improve the transferability of crafted attacks in almost all cases. Furthermore, we demonstrate that SGM can still be effective under more challenging settings such as ensemble-based attacks, targeted attacks, and against defense equipped models. At last, we provide theoretical explanations and empirical insights on how SGM works. 
Our findings not only motivate new adversarial research into the architectural characteristics of models but also open up further challenges for secure model architecture design.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":18.6,"publicationDate":"2026-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146222671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
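The decay-factor idea behind SGM can be seen on a toy residual block computed by hand, rather than in a full attack pipeline. In a block y = x + f(x), the input gradient is 1 + f'(x): the constant 1 comes from the skip path, and SGM multiplies only the residual term by a decay factor gamma < 1. The scalar functions below are an illustrative sketch under that simplification, not the paper's implementation.

```python
import numpy as np

def residual_forward(x, w):
    # Toy residual block: y = x + relu(w * x), scalar weight for clarity.
    return x + np.maximum(w * x, 0.0)

def input_gradient(x, w, gamma=1.0):
    """Gradient of the block above w.r.t. its input x.
    gamma = 1 recovers plain backprop; gamma < 1 decays the gradient
    flowing through the residual module while leaving the skip-path
    contribution (the constant 1) intact, which is the core of SGM."""
    residual_grad = w * float(w * x > 0)   # d relu(w*x) / dx
    return 1.0 + gamma * residual_grad     # skip path always contributes 1
```

Stacking many such blocks, plain backprop multiplies factors (1 + f'_i), whereas SGM multiplies (1 + gamma * f'_i), biasing the crafted perturbation toward the skip paths; empirically, per the abstract, this makes the resulting adversarial examples transfer better across architectures.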