Pub Date: 2026-01-29 | DOI: 10.1109/TPAMI.2026.3659041
HanQin Cai, Chandra Kundu, Jialin Liu, Wotao Yin
Robust matrix completion (RMC) is a widely used machine learning tool that simultaneously tackles two critical issues in low-rank data analysis: missing data entries and extreme outliers. This paper proposes a novel scalable and learnable non-convex approach, coined Learned Robust Matrix Completion (LRMC), for large-scale RMC problems. LRMC enjoys low computational complexity with linear convergence. Motivated by the proposed theorem, the free parameters of LRMC can be effectively learned via deep unfolding to achieve optimum performance. Furthermore, this paper proposes a flexible feedforward-recurrent-mixed neural network framework that extends deep unfolding from a fixed number of iterations to infinitely many iterations. The superior empirical performance of LRMC is verified with extensive experiments against state-of-the-art methods on synthetic datasets and real applications, including video background subtraction, ultrasound imaging, face modeling, and cloud removal from satellite imagery.
Title: Deeply Learned Robust Matrix Completion for Large-scale Low-rank Data Recovery. Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PP (Journal Article). Impact factor: 18.6. Link: https://doi.org/10.1109/TPAMI.2026.3659041
Pub Date: 2026-01-29 | DOI: 10.1109/TPAMI.2026.3659164
Yujia Liu, Dingquan Li, Zhixuan Li, Tiejun Huang
No-Reference Image Quality Assessment (NR-IQA) models play an important role in various real-world applications. Recently, adversarial attacks against NR-IQA models have attracted increasing attention, as they provide valuable insights for revealing model vulnerabilities and guiding robust system design. Some effective attacks have been proposed against NR-IQA models in white-box settings, where the attacker has full access to the target model. However, these attacks often suffer from poor transferability to unknown target models in more realistic black-box scenarios, where the target model is inaccessible. This work makes the first attempt to address the challenge of low transferability in attacking NR-IQA models by proposing a transferable Signed Ensemble Gaussian black-box Attack (SEGA). The main idea is to approximate the gradient of the target model by applying Gaussian smoothing to source models and ensembling their smoothed gradients. To ensure the imperceptibility of adversarial perturbations, SEGA further removes inappropriate perturbations using a specially designed perturbation filter mask. Experimental results demonstrate the superior transferability of SEGA, validating its effectiveness in enabling successful transfer-based black-box attacks against NR-IQA models. Code for this paper is available at https://github.com/YogaLYJ/SEGA_IQA.
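The gradient-approximation idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy linear source models, the step size `eps`, the noise level `sigma`, the sample count, and the choice to lower the predicted score are all assumptions made for illustration, and the perturbation filter mask is omitted because the abstract does not specify it.

```python
import random


def smoothed_gradient(grad_fn, x, sigma, n_samples, rng):
    """Monte Carlo gradient of a Gaussian-smoothed model.

    Averages grad_fn over inputs perturbed with N(0, sigma^2) noise.
    """
    total = [0.0] * len(x)
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        grad = grad_fn(noisy)
        total = [t + g for t, g in zip(total, grad)]
    return [t / n_samples for t in total]


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def signed_ensemble_step(grad_fns, x, eps=0.01, sigma=0.05, n_samples=16, seed=0):
    """One signed step along the ensembled, smoothed source gradients.

    The step moves against the gradient, i.e. it tries to lower the
    predicted quality score (an assumed attack direction).
    """
    rng = random.Random(seed)
    grads = [smoothed_gradient(f, x, sigma, n_samples, rng) for f in grad_fns]
    ensemble = [sum(column) / len(grads) for column in zip(*grads)]
    return [xi - eps * sign(gi) for xi, gi in zip(x, ensemble)]


def make_linear_grad(weights):
    """Hypothetical source model score(x) = w . x, whose gradient is w."""
    return lambda x: list(weights)


sources = [make_linear_grad([1.0, -2.0, 0.5]), make_linear_grad([0.5, -1.0, 1.5])]
x_adv = signed_ensemble_step(sources, [0.5, 0.5, 0.5])
```

Taking only the sign of the ensembled gradient bounds every coordinate of the perturbation by `eps`, which is one common way to keep a perturbation small; whether SEGA uses exactly this bound is not stated in the abstract.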
Title: SEGA: A Transferable Signed Ensemble Gaussian Black-Box Attack Against No-Reference Image Quality Assessment Models. Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PP (Journal Article). Impact factor: 18.6. Link: https://doi.org/10.1109/TPAMI.2026.3659164
Pub Date: 2026-01-29 | DOI: 10.1109/TPAMI.2026.3659110
Bochao Liu, Shiming Ge, Pengju Wang, Shikun Li, Tongliang Liu
While many deep learning models trained on private datasets have been deployed in various practical tasks, they may pose a privacy leakage risk as attackers could recover informative data or label knowledge from models. In this work, we present privacy-preserving model transcription, a data-free model-to-model conversion solution to facilitate model deployment with a privacy guarantee. To this end, we propose a cooperative-competitive learning approach termed differentially private synthetic distillation that learns to convert a pretrained model (teacher) into its privacy-preserving counterpart (student) via a trainable generator without access to private data. The learning involves three players in a unified framework and performs alternating optimization: i) the generator is trained to generate synthetic data, ii) the teacher and student accept the synthetic data and compute differentially private labels by flexible noisy perturbation of the data or the labels, and iii) the student is updated with noisy labels and the generator is updated by taking the student as a discriminator for adversarial training. We theoretically prove that our approach can guarantee differential privacy and convergence. The transcribed student has good performance and privacy protection, while the resulting generator can generate private synthetic data for downstream tasks. Extensive experiments clearly demonstrate that our approach outperforms 26 state-of-the-art methods.
Title: Privacy-Preserving Model Transcription With Differentially Private Synthetic Distillation. Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PP (Journal Article). Impact factor: 18.6. Link: https://doi.org/10.1109/TPAMI.2026.3659110
Adversarial imitation learning (AIL), a prominent approach in imitation learning, has achieved significant practical success powered by neural network approximation. However, existing theoretical analyses of AIL are primarily confined to simplified settings (such as tabular and linear function approximation) and involve complex algorithmic designs that impede practical implementation. This creates a substantial gap between theory and practice. This paper bridges this gap by exploring the theoretical underpinnings of online AIL with general function approximation. We introduce a novel framework called optimization-based AIL (OPT-AIL), which performs online optimization for reward learning coupled with optimism-regularized optimization for policy learning. Within this framework, we develop two concrete methods: model-free OPT-AIL and model-based OPT-AIL. Our theoretical analysis demonstrates that both variants achieve polynomial expert sample complexity and interaction complexity for learning near-expert policies. To the best of our knowledge, they represent the first provably efficient AIL methods under general function approximation. From a practical standpoint, OPT-AIL requires only the approximate optimization of two objectives, thereby facilitating practical implementation. Empirical studies demonstrate that OPT-AIL outperforms previous state-of-the-art deep AIL methods across several challenging tasks.
Title: Adversarial Imitation Learning with General Function Approximation: Theoretical Analysis and Practical Algorithms. Authors: Tian Xu, Zhilong Zhang, Zexuan Chen, Ruishuo Chen, Yihao Sun, Yang Yu. Pub Date: 2026-01-26. Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PP (Journal Article). Impact factor: 18.6. Link: https://doi.org/10.1109/TPAMI.2026.3657578
Pub Date: 2026-01-26 | DOI: 10.1109/TPAMI.2026.3657354
Lingling Xu, Haoran Xie, S Joe Qin, Xiaohui Tao, Fu Lee Wang
With the continuous growth in the number of parameters of the Transformer-based pretrained language models (PLMs), particularly the emergence of large language models (LLMs) with billions of parameters, many natural language processing (NLP) tasks have demonstrated remarkable success. However, the enormous size and computational demands of these models pose significant challenges for adapting them to specific downstream tasks, especially in environments with limited computational resources. Parameter-Efficient Fine-Tuning (PEFT) offers an effective solution by reducing the number of fine-tuning parameters and memory usage while achieving comparable performance to full fine-tuning. The demands for fine-tuning PLMs, especially LLMs, have led to a surge in the development of PEFT methods, as depicted in Fig. 1. In this paper, we present a comprehensive and systematic review of PEFT methods for PLMs. We summarize these PEFT methods, discuss their applications, and outline future directions. Furthermore, extensive experiments are conducted using several representative PEFT methods to better understand their effectiveness in parameter efficiency and memory efficiency. By offering insights into the latest advancements and practical applications, this survey serves as an invaluable resource for researchers and practitioners seeking to navigate the challenges and opportunities presented by PEFT in the context of PLMs.
Title: Parameter-Efficient Fine-Tuning Methods for Pretrained Language Models: A Critical Review and Assessment. Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PP (Journal Article). Impact factor: 18.6. Link: https://doi.org/10.1109/TPAMI.2026.3657354
We present TextMonkey, a large multimodal model (LMM) tailored for text-centric tasks. Our approach introduces enhancements across several dimensions. By adopting a Shifted Window Attention layer, we achieve cross-window connectivity at higher input resolutions and stabilize early training. We hypothesize that images may contain redundant tokens, and that by using similarity to select the significant tokens we can not only shorten the token sequence but also enhance the model's performance. Moreover, by expanding our model's capabilities to encompass text spotting and grounding, and incorporating positional information into responses, we enhance interpretability. Evaluation on 12 benchmarks shows notable improvements: 5.2% in Scene Text-Centric tasks (including STVQA, TextVQA, and OCRVQA), 6.9% in Document-Oriented tasks (such as DocVQA, InfoVQA, ChartVQA, DeepForm, Kleister Charity, and WikiTableQuestions), and 2.8% in Key Information Extraction tasks (comprising FUNSD, SROIE, and POIE). It also improves scene text spotting by 10.9% and sets a new standard on OCRBench, a comprehensive benchmark consisting of 29 OCR-related assessments, with a score of 561, surpassing previous open-sourced large multimodal models for document understanding. Code is released at https://github.com/Yuliang-Liu/Monkey.
Title: TextMonkey: an OCR-Free Large Multimodal Model for Understanding Document. Authors: Yuliang Liu, Biao Yang, Qiang Liu, Zhang Li, Zhiyin Ma, Shuo Zhang, Xiang Bai. Pub Date: 2026-01-26. Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PP (Journal Article). Impact factor: 18.6. Link: https://doi.org/10.1109/TPAMI.2026.3653415
Pub Date: 2026-01-26 | DOI: 10.1109/TPAMI.2026.3657778
Haitao Zeng, Xinhang Song, Shuqiang Jiang
Multi-object navigation (MON) tasks involve sequentially locating multiple targets in an unknown environment, requiring global long-term planning under incomplete information. This necessitates that the agent dynamically balance immediate actions and long-term rewards while considering both local adaptability and global foresight. However, current methods overly focus on local path optimization, which leads to slower convergence in sparse reward settings and increases the risk of deadlocks or trap states. The core challenge of MON lies in the deformation of the shared decision space, where independent optimization leads to redundant and overlapping paths. Thus, path planning requires dynamic, cross-task optimization rather than simple subtask aggregation. To minimize overall effort, the optimization process should adaptively balance task contributions through weight adjustment. We therefore propose the Goal-oriented Dynamic Weight Optimization (GDWO) algorithm. GDWO integrates target-specific value loss functions into a unified optimization framework and dynamically adjusts weights through gradient-based updates. To prevent over-optimization, weights are normalized during training according to navigation success rates, prioritizing more challenging targets. This adaptive mechanism effectively addresses the challenge of sparse rewards and improves convergence efficiency. By leveraging this mechanism, GDWO brings multiple objectives into a single decision space, achieving efficient optimization and balancing short-term gains with long-term goals. Additionally, we introduce two auxiliary modules, prior knowledge-based navigation and frontier-aware exploration, to further enhance GDWO's performance. Experimental results on the Gibson and Matterport3D datasets demonstrate that GDWO achieves improvements in key metrics for MON tasks. It optimizes path planning, reduces exploration costs, and enhances navigation efficiency, enabling the agent to perform tasks more effectively in complex environments.
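The success-rate-based weight normalization described in this abstract might look roughly like the sketch below. It is an illustration under stated assumptions, not the GDWO algorithm itself: the abstract does not give the normalization formula, so the rule "weight proportional to one minus the success rate" and the `floor` parameter are hypothetical, and the gradient-based weight updates are left out.

```python
def success_based_weights(success_rates, floor=1e-3):
    """Turn per-target success rates into normalized loss weights.

    A lower success rate yields a larger weight. The hypothetical `floor`
    keeps every weight strictly positive, even at a 100% success rate.
    """
    difficulty = [1.0 - rate + floor for rate in success_rates]
    total = sum(difficulty)
    return [d / total for d in difficulty]


def combined_value_loss(per_target_losses, weights):
    """Weighted sum of the target-specific value losses."""
    return sum(w * loss for w, loss in zip(weights, per_target_losses))


# Three targets; the third is found least often, so it gets the largest weight.
weights = success_based_weights([0.9, 0.5, 0.2])
loss = combined_value_loss([0.3, 0.7, 1.2], weights)
```

Because the weights sum to one, the combined loss stays on the same scale as the individual losses while shifting emphasis toward targets the agent still struggles to reach.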
Title: Goal-oriented Dynamic Weight Optimization for Multi-Object Navigation. Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PP (Journal Article). Impact factor: 18.6. Link: https://doi.org/10.1109/TPAMI.2026.3657778
Pub Date: 2026-01-22 | DOI: 10.1109/TPAMI.2026.3656947
Philip Naumann, Jacob Kauffmann, Gregoire Montavon
Wasserstein distances provide a powerful framework for comparing data distributions. They can be used to analyze processes over time or to detect inhomogeneities within data. However, simply calculating the Wasserstein distance or analyzing the corresponding transport plan (or coupling) may not be sufficient for understanding what factors contribute to a high or low Wasserstein distance. In this work, we propose a novel solution based on Explainable AI that allows us to efficiently and accurately attribute Wasserstein distances to various data components, including data subgroups, input features, or interpretable subspaces. Our method achieves high accuracy across diverse datasets and Wasserstein distance specifications, and its practical utility is demonstrated in three use cases.
Title: Wasserstein Distances Made Explainable: Insights into Dataset Shifts and Transport Phenomena. Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PP (Journal Article). Impact factor: 18.6. Link: https://doi.org/10.1109/TPAMI.2026.3656947
Pub Date: 2026-01-21 | DOI: 10.1109/TPAMI.2026.3656670
Fan Shi, Bin Li, Xiangyang Xue
The abstract visual reasoning ability in human intelligence helps in discovering underlying rules in novel environments. Raven's Progressive Matrix (RPM) is a classic test for realizing such an ability in machine intelligence by selecting an answer from candidates. Recent studies suggest that solving RPM in an answer-generation way fosters a more in-depth understanding of rules. However, existing generative solvers cannot discover the global concept-changing rules without auxiliary supervision (e.g., rule annotations and distractors in candidate sets). To this end, we propose a deep latent variable model for Concept-changing Rule ABstraction (CRAB) by learning interpretable concepts and parsing concept-changing rules in the latent space. With the iterative learning process, CRAB can automatically abstract the global rules shared across the dataset for each concept and form learnable prior knowledge of global rules. CRAB outperforms the baselines trained without auxiliary supervision in the arbitrary-position answer generation task and achieves accuracy comparable to, or even higher than, that of the compared models trained with auxiliary supervision. Finally, we conduct experiments to illustrate the interpretability of CRAB in concept learning, answer selection, and global rule abstraction.
Title: Abstracting Concept-Changing Rules for Solving Raven's Progressive Matrix Problems. Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PP (Journal Article). Impact factor: 18.6. Link: https://doi.org/10.1109/TPAMI.2026.3656670
Pub Date : 2026-01-21 DOI: 10.1109/TPAMI.2026.3656742
Rongxuan Peng, Shunquan Tan, Xianbo Mo, Alex C Kot, Jiwu Huang
Recent advances in deep learning have significantly propelled the development of image forgery localization. However, existing models remain highly vulnerable to adversarial attacks: imperceptible noise added to forged images can severely mislead these models. In this paper, we address this challenge with an Adversarial Noise Suppression Module (ANSM) that generates a defensive perturbation to suppress the attack effect of adversarial noise. We observe that forgery-relevant features extracted from adversarial and original forged images exhibit distinct distributions. To bridge this gap, we introduce Forgery-relevant Features Alignment (FFA) as a first-stage training strategy, which reduces distributional discrepancies by minimizing the channel-wise Kullback-Leibler divergence between these features. To further refine the defensive perturbation, we design a second-stage training strategy, termed Mask-guided Refinement (MgR), which incorporates a dual-mask constraint. MgR ensures that the defensive perturbation remains effective for both adversarial and original forged images, restoring forgery localization accuracy to its original level. Extensive experiments across various attack algorithms demonstrate that our method significantly restores the forgery localization model's performance on adversarial images. Notably, when ANSM is applied to original forged images, the performance remains nearly unaffected. To the best of our knowledge, this is the first report of adversarial defense in image forgery localization tasks. We have released the source code and anti-forensics dataset.
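As a rough illustration of what a channel-wise Kullback-Leibler divergence between two feature maps can look like, the sketch below normalizes each channel into a distribution over spatial positions and averages the per-channel divergences. This is not the authors' released code: the softmax normalization, the direction of the divergence, the averaging over channels, and the names `softmax` and `channelwise_kl` are all assumptions made for the example, and the feature values are made up.

```python
import math

def softmax(values):
    """Turn raw activations into a probability distribution."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def channelwise_kl(feat_ref, feat_adv, eps=1e-12):
    """Mean over channels of KL(p_c || q_c).

    feat_ref, feat_adv: lists of channels, each a flat list of spatial
    activations. p_c and q_c are softmax distributions over the spatial
    positions of channel c (an assumed normalization, see the lead-in).
    """
    if len(feat_ref) != len(feat_adv):
        raise ValueError("feature maps must have the same number of channels")
    per_channel = []
    for ref_c, adv_c in zip(feat_ref, feat_adv):
        p = softmax(ref_c)
        q = softmax(adv_c)
        # eps guards against log(0) when a probability underflows
        kl = sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
        per_channel.append(kl)
    return sum(per_channel) / len(per_channel)

# Toy features: 2 channels, 4 spatial positions each (made-up numbers).
clean = [[0.1, 0.4, 0.2, 0.3], [1.0, 0.0, 0.5, 0.5]]
perturbed = [[0.3, 0.1, 0.4, 0.2], [0.2, 0.9, 0.5, 0.4]]

print(channelwise_kl(clean, clean))      # identical features: divergence is zero
print(channelwise_kl(clean, perturbed))  # mismatched features: positive divergence
```

In a training setting, a quantity of this kind would be minimized so that features from adversarial images move toward those from the original forged images; the abstract does not specify the exact loss, so the form above should be read only as one plausible instance.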
Title: Active Adversarial Noise Suppression for Image Forgery Localization. Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence. DOI URL: https://doi.org/10.1109/TPAMI.2026.3656742