Pub Date: 2026-03-12  DOI: 10.1109/tpami.2026.3673238
Tian Xu, Ziniu Li, Yang Yu, Zhi-Quan Luo
Title: Understanding Adversarial Imitation Learning in Small Sample Regime: A Stage-Coupled Analysis
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Pub Date: 2026-03-12  DOI: 10.1109/tpami.2026.3673525
Banglei Guan, Ji Zhao, Laurent Kneip
In recent years, affine correspondences (ACs) have emerged as a widely adopted alternative to point correspondences (PCs) in geometric problems in computer vision. An AC is composed of a PC across two different views plus an affine transformation between the small patches around this PC. Prior studies have shown that a single affine correspondence generally yields three independent constraints for estimating relative pose. This work addresses relative pose estimation in multi-perspective camera systems, a relevant problem given their prevalence in modern technologies such as autonomous vehicles and augmented reality. More specifically, we introduce the first comprehensive suite of minimal solvers for 6DoF relative pose estimation across multiple cameras using only two ACs, which is notably valuable for robust model fitting scenarios. We analyze all possible configurations of two ACs in two views, and present minimal solvers covering all identified minimal cases. We make use of the hidden variable technique to eliminate the translation parameters, and represent rotation using either Cayley parameters or quaternions. We furthermore introduce novel constraints on the generalized relative pose problem that are beneficial in deriving more compact solvers with fewer solutions. Comprehensive experiments on synthetic and real-world data show that the proposed affine correspondence-based solvers are highly effective and computationally efficient.
Title: A Complete Solution to Generalized Relative Pose Estimation from Affine Correspondences.
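The abstract mentions representing rotation with Cayley parameters to keep the solvers compact. As a minimal illustration of that parameterization (our own sketch, not the authors' solver), the standard Cayley transform maps three unconstrained parameters to a valid rotation matrix:

```python
import numpy as np

def cayley_to_rotation(c):
    """Map Cayley parameters c = (c1, c2, c3) to a 3x3 rotation matrix
    via R = (I - [c]_x)^(-1) (I + [c]_x), where [c]_x is the
    skew-symmetric cross-product matrix of c."""
    c1, c2, c3 = c
    skew = np.array([[0.0, -c3,  c2],
                     [ c3, 0.0, -c1],
                     [-c2,  c1, 0.0]])
    I = np.eye(3)
    return np.linalg.solve(I - skew, I + skew)

R = cayley_to_rotation([0.1, -0.2, 0.05])
print(np.allclose(R @ R.T, np.eye(3)))      # orthogonal
print(np.isclose(np.linalg.det(R), 1.0))    # proper rotation
```

Because the three parameters are unconstrained, polynomial constraints on the pose can be written directly in them, which is what makes this parameterization convenient for minimal solvers (the transform cannot represent 180° rotations, a known limitation).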
Although noisy-label learning is often approached with discriminative methods for simplicity and speed, generative modeling offers a principled alternative by capturing the joint mechanism that produces features, clean labels, and corrupted observations. However, prior work typically (i) introduces extra latent variables and heavy image generators that bias training toward reconstruction, (ii) fixes a single data-generating direction (Y → X or X → Y), limiting adaptability, and (iii) assumes a uniform prior over clean labels, ignoring instance-level uncertainty. Here, we propose a single-stage, EM-style framework for generative noisy-label learning that is direction-agnostic and avoids explicit image synthesis. First, we derive a single Expectation Maximization (EM) objective whose E-step specializes to either causal orientation without changing the overall optimization objective. Second, we replace the intractable p(X | Y) with a dataset-normalized discriminative proxy computed using a discriminative classifier on the finite training set, retaining the structural benefits of generative modeling at much lower cost. Third, we introduce Partial-Label Supervision (PLS), an instance-specific prior over clean labels that balances coverage and uncertainty, improving data-dependent regularization. Across standard vision and natural language processing (NLP) noisy-label benchmarks, our method achieves state-of-the-art accuracy, lower transition-matrix estimation error, and substantially less training computation than current generative and discriminative baselines. Code: https://github.com/lfb-1/GNL.
Title: Bridging Generative and Discriminative Noisy-Label Learning via Direction-Agnostic EM Formulation.
Authors: Fengbei Liu, Chong Wang, Yuanhong Chen, Yuyuan Liu, Gustavo Carneiro
Pub Date: 2026-03-11  DOI: 10.1109/tpami.2026.3673244
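The "dataset-normalized discriminative proxy" above replaces the intractable p(X | Y) using only classifier posteriors on the finite training set. One plausible reading (our own sketch, with our own names — the paper's exact normalization may differ) is Bayes inversion under a uniform empirical p(x), normalized over the N training points:

```python
import numpy as np

def px_given_y_proxy(posteriors):
    """Dataset-normalized proxy for p(x_i | y).
    posteriors: (N, C) array of classifier outputs p(y | x_i) on the
    finite training set. Assuming a uniform empirical p(x_i) = 1/N,
    Bayes' rule gives p(x_i | y) proportional to p(y | x_i); dividing by
    each column's sum over the N points makes every column a proper
    distribution over the training set."""
    col_sums = posteriors.sum(axis=0, keepdims=True)  # approx. N * p(y)
    return posteriors / col_sums                       # (N, C), columns sum to 1

post = np.array([[0.9, 0.1],
                 [0.2, 0.8],
                 [0.5, 0.5]])
proxy = px_given_y_proxy(post)
print(proxy.sum(axis=0))  # each class column sums to 1
```

The appeal is that a single discriminative forward pass yields both p(Y | X) and this stand-in for p(X | Y), so the E-step can be specialized to either causal direction without training a generator.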
Pub Date: 2026-03-10  DOI: 10.1109/tpami.2026.3672463
Fabian Kahl, Philipp Wegner, Maximilian Kapsecker, Leon Nissen, Jennifer Faber, Stephan M Jonas, Lara Marie Reimer
In human pose estimation, a comprehensive evaluation of state-of-the-art frameworks is necessary to advance both research and practical applications. This paper presents a thorough review of state-of-the-art 2D and 3D human pose estimation frameworks, analyzing 118 papers and four GitHub repositories, with a focus on frameworks released since 2019. The following frameworks were chosen based on predefined inclusion criteria: AlphaPose, Detectron2, MediaPipe, MeTRAbs, MHFormer, MMPose, MoveNet, OpenPifPaf, OpenPifPaf-vita, OpenPose, PoseFormerV2, rtmlib, StridedTransformer-Pose3D, ultralytics (YOLOv8), ViTPose, and YOLOv7. This paper evaluates these 16 frameworks on an existing, unpublished dataset consisting of exercise videos recorded with a monocular RGB camera and synchronized gold-standard motion capture data. The dataset includes videos of nine individuals performing eight exercises, recorded from two camera views with different planar angles. The analysis evaluates joint angle performance of the frameworks using weighted mean absolute error and weighted intraclass correlation coefficient as quantitative metrics. MeTRAbs emerged as the best overall framework, while AlphaPose, rtmlib, and YOLOv7 were the top 2D performers.
Title: Comparative Assessment of Accuracy in Video-based Monocular Human Pose Estimation Frameworks.
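The weighted mean absolute error used as a metric above can be sketched as follows (a generic illustration with hypothetical per-joint weights; the paper's exact weighting scheme is not given in the abstract):

```python
import numpy as np

def weighted_mae(pred_angles, true_angles, weights):
    """Weighted mean absolute error over joint angles (degrees).
    pred_angles, true_angles: (frames, joints) arrays.
    weights: per-joint importance weights, normalized internally."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    err = np.abs(np.asarray(pred_angles) - np.asarray(true_angles))
    return float((err * w).sum(axis=-1).mean())  # weighted per frame, then averaged

pred = np.array([[92.0, 45.0], [88.0, 50.0]])  # 2 frames, 2 joints (toy values)
true = np.array([[90.0, 44.0], [90.0, 48.0]])
print(weighted_mae(pred, true, [2.0, 1.0]))    # approx. 1.8333
```

Weighting lets clinically or biomechanically important joints (e.g. knees in gait exercises) dominate the score rather than treating all joints equally.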
Ensuring the privacy of local datasets has emerged as an important concern in decentralized learning. However, the inherent privacy-utility tradeoff remains a fundamental challenge for privacy-preserving decentralized algorithms. To address this issue, we introduce the Positive-Incentive Noise Generator (PING), a novel mechanism designed to eliminate the negative impact of privacy noise on convergence while defending against powerful colluding inference attacks. PING leverages network topologies and lightweight encryption-decryption operations to generate correlated noise. Building upon PING, we propose PP-DPIN, a privacy-preserving stochastic algorithm tailored for decentralized learning. By integrating differential privacy and differential information entropy, we provide a comprehensive privacy quantification for PP-DPIN, with at least half of the nodes achieving arbitrarily strong privacy guarantees. Furthermore, the convergence rate of PP-DPIN is established under stochastic convex and nonconvex settings, which characterizes the impact of privacy noise and demonstrates a linear speedup relative to the network size. Experiments on computer vision tasks validate PP-DPIN's superior performance and robustness against attacks compared to state-of-the-art methods.
Title: Privacy Preserving Decentralized Learning with Positive-Incentive Noise.
Authors: Luqing Wang, Shaofu Yang, Yifan Wan, Wenying Xu, Min-Ling Zhang
Pub Date: 2026-03-10  DOI: 10.1109/tpami.2026.3672569
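The key idea of topology-correlated noise can be illustrated with a common construction (our own sketch of the general principle, not necessarily PING's exact mechanism, which also involves encryption): each undirected edge draws shared noise assigned with opposite signs to its two endpoints, so every node's update looks random locally while the network-wide sum cancels exactly and the average model is unperturbed.

```python
import numpy as np

def edge_correlated_noise(adjacency, dim, scale=1.0, rng=None):
    """For each undirected edge (i, j), draw noise z_ij and add
    +z_ij to node i and -z_ij to node j. Each node's total noise is
    random to an outside observer, but the sum over all nodes is
    exactly zero, so consensus averaging is unaffected."""
    rng = np.random.default_rng(rng)
    n = adjacency.shape[0]
    noise = np.zeros((n, dim))
    for i in range(n):
        for j in range(i + 1, n):
            if adjacency[i, j]:
                z = scale * rng.standard_normal(dim)
                noise[i] += z
                noise[j] -= z
    return noise

A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])   # fully connected 3-node topology
noise = edge_correlated_noise(A, dim=4, rng=0)
print(np.allclose(noise.sum(axis=0), 0.0))  # True: global cancellation
```

This is why such schemes can sidestep the usual privacy-utility tradeoff: the injected noise never accumulates in the consensus direction, yet breaking an individual node's privacy requires colluding with all of its neighbors.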
Pub Date: 2026-03-10  DOI: 10.1109/tpami.2026.3672726
Yan V G Ferreira, Igor B Lima, Pedro H G Mapa S, Felipe V Campos, Antonio P Braga
Most machine learning methods assume fixed probability distributions, limiting their applicability in nonstationary real-world scenarios. While continual learning methods address this issue, current approaches often rely on black-box models or require extensive user intervention for interpretability. We propose SyMPLER (Systems Modeling through Piecewise Linear Evolving Regression), an explainable model for time series forecasting in nonstationary environments based on dynamic piecewise-linear approximations. Unlike other locally linear models, SyMPLER uses generalization bounds from Statistical Learning Theory to automatically determine when to add new local models based on prediction errors, eliminating the need for explicit clustering of the data. Experiments show that SyMPLER can achieve comparable performance to both black-box and existing explainable models while maintaining a human-interpretable structure that reveals insights about the system's behavior. In this sense, our approach reconciles accuracy and interpretability, offering a transparent and adaptive solution for forecasting nonstationary time series.
Title: Locally Linear Continual Learning for Time Series based on VC-Theoretical Generalization Bounds.
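The spawn-on-error mechanism can be sketched as follows. This is our own toy illustration: inputs route to the nearest local model's center, and a fixed error threshold stands in for the VC-theoretical generalization bound that SyMPLER actually uses to trigger new local models.

```python
import numpy as np

class EvolvingPiecewiseLinear:
    """Toy evolving piecewise-linear regressor: each local linear model
    owns a center; inputs route to the nearest center. A new local
    model is spawned when the routed model's error exceeds `threshold`
    (standing in for a VC-style generalization bound)."""
    def __init__(self, threshold=1.0, lr=0.05):
        self.centers, self.weights = [], []
        self.threshold, self.lr = threshold, lr

    def _route(self, x):
        return int(np.argmin([np.linalg.norm(x - c) for c in self.centers]))

    def predict(self, x):
        xb = np.append(x, 1.0)                       # input plus bias term
        return float(self.weights[self._route(x)] @ xb)

    def update(self, x, y):
        xb = np.append(x, 1.0)
        if not self.centers:                          # first sample bootstraps
            self.centers.append(np.array(x, float))
            self.weights.append(np.zeros(len(xb)))
        k = self._route(x)
        err = y - self.weights[k] @ xb
        if abs(err) > self.threshold:                 # bound exceeded: new regime
            self.centers.append(np.array(x, float))
            self.weights.append(np.zeros(len(xb)))
            k = len(self.weights) - 1
            err = y - self.weights[k] @ xb
        self.weights[k] += self.lr * err * xb         # LMS gradient step

model = EvolvingPiecewiseLinear(threshold=1.0)
for t in range(50):
    x = np.array([t / 10.0])
    y = 2.0 * x[0] if t < 25 else -2.0 * x[0] + 12.0  # regime shift at t = 25
    model.update(x, y)
print(len(model.centers))  # more than one local model after the shift
```

Because each local model is a plain linear map over its own region, the learned structure stays directly inspectable, which is the interpretability argument the abstract makes.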
Vision Transformer (ViT) has shown impressive performance in image restoration due to its ability to capture a large receptive field. However, its complexity grows quadratically with input resolution, limiting its applicability for high-resolution images. In contrast, Convolutional Neural Networks (CNNs) are computationally efficient but are constrained by their inherently local receptive fields, which limit their ability to capture long-range pixel relationships. To address these challenges, we propose StarIR, which possesses the efficiency of CNNs while also capturing a large receptive field, similar to Transformers. StarIR incorporates two key innovations: 1) a dual-domain representation learning framework, with one branch processing spatial details and the other focusing on mesoscale interactions in the frequency domain; and 2) a high-dimensional feature fusion mechanism, the Star operation, which fuses information from both domains through element-wise multiplication, thereby enhancing representational capacity without increasing network width and depth. Our Star operation is followed by a channel attention unit to facilitate global feature modeling and enhance channel-wise interactions. Building on our straightforward yet powerful design principles, StarIR achieves state-of-the-art performance across 21 datasets covering six single-degradation image restoration tasks. Furthermore, our model performs favorably against leading algorithms in two all-in-one settings and demonstrates robustness on two composite-degradation datasets. In addition, StarIR extends well to several domain-specific applications, including ultra-high-definition (UHD) imaging, remote sensing, medical imaging, and underwater image enhancement.
Title: StarIR: Convolutional Image Restoration With Spatial-Frequency Fusion.
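The Star operation described above fuses the spatial and frequency branches by element-wise multiplication. A heavily simplified NumPy sketch of the idea (our own illustration: a fixed Gaussian frequency mask and an identity spatial branch stand in for the learned convolutional branches):

```python
import numpy as np

def star_fusion(feat):
    """Minimal sketch of dual-branch 'Star' fusion on a 2D feature map:
    one branch keeps spatial detail (identity here; a conv in practice),
    the other modulates the Fourier spectrum (mesoscale interactions),
    and the two are fused by element-wise multiplication."""
    F = np.fft.fft2(feat)
    h, w = feat.shape
    yy, xx = np.meshgrid(np.fft.fftfreq(h), np.fft.fftfreq(w), indexing="ij")
    mask = np.exp(-(yy**2 + xx**2) / 0.05)        # stands in for learned weights
    freq_branch = np.real(np.fft.ifft2(F * mask))  # low-pass frequency branch
    spatial_branch = feat                          # stands in for a conv branch
    return spatial_branch * freq_branch            # the Star (element-wise) fusion

x = np.random.default_rng(0).standard_normal((8, 8))
y = star_fusion(x)
print(y.shape)  # same spatial size as the input
```

The multiplicative fusion implicitly forms products of features from both domains, which is why it can raise representational capacity without widening or deepening the network.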
Title: Learning With Partial and Noisy Correspondence in Graph Matching
Authors: Yijie Lin, Mouxing Yang, Peng Hu, Jiancheng Lv, Hao Chen, Xi Peng
Pub Date: 2026-03-09  DOI: 10.1109/tpami.2026.3670236