The transferability of adversarial examples across different models has drawn considerable attention recently, particularly for targeted transferability. Prior research has empirically shown that optimizing adversarial perturbations at neighboring points with the highest loss value improves transferability. While effective, such a method requires multiple iterations to reach the local maxima and disregards the local minima of the input loss landscape. In this paper, we theoretically show that enhancing adversarial transferability is attainable by flattening the input loss landscape, accomplished by optimizing perturbations at both local maxima and minima. Moreover, we propose the Cost-efficient LandscapE Flattening (CLEF) attack, which considers local maxima and minima around the current inputs in a cost-efficient way to flatten the loss landscape and improve adversarial transferability. Specifically, we reuse the gradients of the previous attack step to assist current inputs in reaching local maxima, and employ probabilistic modeling to learn distributional representations of the perturbations that assist current inputs in reaching local minima. This probabilistic model can be pre-trained on dozens of images from other domains, enabling us to directly sample this type of perturbation from the pre-trained distribution when attacking. Experimental results demonstrate that integrating local maxima and minima into targeted transferable attacks significantly flattens the loss landscape of the crafted adversarial examples, resulting in improved adversarial transferability.
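The gradient-reuse step can be sketched on a toy surrogate loss (a hypothetical quadratic stand-in, not the paper's CLEF attack or any real model): reuse the previous step's gradient to move to a higher-loss neighbor at no extra gradient cost, then take the targeted update from that neighbor.

```python
import numpy as np

# Toy sketch of gradient reuse (assumed stand-in, not the paper's CLEF):
# step to a higher-loss neighbor along the PREVIOUS gradient, then update
# from that neighbor, so no extra forward/backward pass is needed.

def loss_grad(x, target):
    # gradient of the surrogate loss 0.5 * ||x - target||^2
    return x - target

def targeted_attack(x0, target, steps=10, alpha=0.1, lookahead=0.05):
    x, prev_grad = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        # reuse the old gradient: ascend toward a nearby loss maximum
        x_nb = x + lookahead * np.sign(prev_grad)
        g = loss_grad(x_nb, target)
        # targeted update: descend toward the target class
        x = x - alpha * np.sign(g)
        prev_grad = g
    return x

x_adv = targeted_attack(np.zeros(3), np.ones(3))
```

With the quadratic surrogate, ten signed steps of size 0.1 carry the input from the origin to the target.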
Title: Enhancing Adversarial Transferability with Cost-efficient Landscape Flattening. Authors: Zhipeng Wei, Jingjing Chen, Feng Han, Yue Yu, Yu-Gang Jiang. IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2026-02-13. DOI: 10.1109/TPAMI.2026.3664421.
Pub Date: 2026-02-13. DOI: 10.1109/TPAMI.2026.3664613.
Zheng Wang, Xing Xu, Lei Zhu, Jingkuan Song, Yang Yang, Heng Tao Shen
Eliminating the semantic discrepancy between different modalities is the ultimate goal of image-text retrieval. However, most existing methods focus only on retrieving the ground-truth instance while ignoring semantically similar instances that are unlabeled as positives, which gives rise to the phenomenon of one-to-many correspondence. Mainstream solutions are based mainly on uncertainty learning, and despite their significant progress, their exploration of one-to-many correspondence remains insufficient. Therefore, this work develops a novel Distribution-to-Points (termed D2P) matching mechanism for image-text retrieval to capture the one-to-many correspondence between multiple samples and a given query via hypergraph modeling. Specifically, a given query is first mapped to a probabilistic embedding to learn its true semantic distribution based on the Mahalanobis distance. Then each candidate instance in a mini-batch is regarded as a hypergraph node carrying its mean semantics, while a Gaussian query is modeled as a hyperedge to capture semantic correlations beyond the pair between candidate points and the query. Moreover, an energy-based semantic modeling framework is developed to pull all similar candidates (not only the ground truth) close to their query while pushing dissimilar ones far away. Finally, distribution-to-points matching is learned from a similarity measurement over the Mahalanobis distance, which accounts for semantic variance and thus handles one-to-many correspondence well. Experimental results on several widely used datasets, under various evaluation metrics, confirm the superiority and effectiveness of our method in improving the baseline's retrieval ability for image-text retrieval, in both ground-truth matching and semantic multiplicity.
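A minimal numeric sketch of the distribution-to-points similarity (a diagonal covariance and these toy values are assumptions for brevity, not the paper's code): the query is a Gaussian, candidates are points, and similarity is the negative squared Mahalanobis distance, so offsets along high-variance (uncertain) dimensions are penalized less.

```python
import numpy as np

# Query as a diagonal Gaussian (mu, var); candidates as points.
# Similarity = negative squared Mahalanobis distance.

def mahalanobis_sim(mu, var, points):
    diff = points - mu                        # (n, d) offsets from the mean
    return -np.sum(diff * diff / var, axis=1)

mu = np.array([0.0, 0.0])
var = np.array([1.0, 4.0])                    # second dimension is more uncertain
pts = np.array([[1.0, 0.0], [0.0, 1.0]])      # equal Euclidean distance from mu
sims = mahalanobis_sim(mu, var, pts)
```

Both candidates are equally far in Euclidean terms, but the one displaced along the uncertain dimension scores higher, which is the point of variance-aware matching.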
Title: Distribution-to-Points Matching for Image Text Retrieval.
Pub Date: 2026-02-13. DOI: 10.1109/TPAMI.2026.3664488.
Lingao Xiao, Yang He
Large-scale dataset distillation requires storing auxiliary soft labels that can be 30-40× (ImageNet-1K) or 200× (ImageNet-21K) larger than the condensed images, undermining the goal of dataset compression. We identify two fundamental issues necessitating such extensive labels: (1) insufficient image diversity, where high within-class similarity in synthetic images requires extensive augmentation, and (2) insufficient supervision diversity, where limited variety in supervisory signals during training leads to performance degradation at high compression rates. To address these challenges, we propose Label Pruning and Quantization for Large-scale Distillation (LPQLD). We enhance image diversity via class-wise batching and BN supervision during synthesis. For supervision diversity, we introduce Label Pruning with Dynamic Knowledge Reuse to enhance label-per-augmentation diversity, and Label Quantization with Calibrated Student-Teacher Alignment to enhance augmentation-per-image diversity. Our approach reduces soft label storage by 78× on ImageNet-1K and 500× on ImageNet-21K while improving accuracy by up to 7.2% and 2.8%, respectively. Extensive experiments validate the superiority of LPQLD across different network architectures and other dataset distillation methods.
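The storage arithmetic behind soft-label compression can be illustrated with a generic top-k prune plus 8-bit quantize round trip. This is a hedged sketch of the general recipe only; the exact LPQLD pruning and quantization schemes differ, and all names here are illustrative.

```python
import numpy as np

# Generic sketch: keep only the top-k entries of a soft label (pruning)
# and store them as 8-bit codes plus one float scale (quantization).

def compress(probs, k=3, levels=255):
    idx = np.argsort(probs)[-k:]                        # indices of top-k classes
    scale = probs[idx].max()
    codes = np.round(probs[idx] / scale * levels).astype(np.uint8)
    return idx, codes, scale

def decompress(idx, codes, scale, num_classes, levels=255):
    out = np.zeros(num_classes)
    out[idx] = codes.astype(np.float64) / levels * scale
    return out / out.sum()                              # renormalize to a distribution

p = np.array([0.01, 0.60, 0.05, 0.30, 0.04])
p_hat = decompress(*compress(p), num_classes=5)
```

For a 1000-class label, storing 3 indices and 3 bytes instead of 1000 floats is roughly a 500x reduction, while the reconstructed label keeps the dominant classes.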
Title: Soft Label Pruning and Quantization for Large-Scale Dataset Distillation.
Pub Date: 2026-02-13. DOI: 10.1109/TPAMI.2026.3664269.
Minchul Kim, Anil Jain, Xiaoming Liu
Over the past five decades, automated face recognition (FR) has progressed from handcrafted geometric and statistical approaches to advanced deep learning architectures that now approach, and in many cases exceed, human performance. This paper traces the historical and technological evolution of FR, from early algorithmic paradigms through to contemporary neural systems trained on extensive real and synthetically generated datasets. We examine pivotal innovations that have driven this progression, including advances in dataset construction, loss function formulation, network architecture design, and feature fusion strategies. Furthermore, we analyze the relationship between data scale, diversity, and model generalization, highlighting how dataset expansion correlates with benchmark performance gains. Recent systems have achieved near-perfect large-scale identification accuracy, with the leading algorithm in the latest NIST FRTE 1:N benchmark reporting a False Negative Identification Rate (FNIR) of 0.15 percent at a False Positive Identification Rate (FPIR) of 0.001 on a gallery of over 10 million identities. Larger galleries increase false positive rates, however, and deployments at greater scales will see higher error rates. We delineate key open problems and emerging directions, including scalable training, multi-modal fusion, synthetic data, and interpretable recognition frameworks.
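For readers unfamiliar with the quoted metrics, FNIR and FPIR can be computed from per-search top scores as below. This is a toy thresholding sketch of the standard definitions, not NIST's full FRTE 1:N protocol, and the score values are made up.

```python
import numpy as np

# FNIR: fraction of mated (enrolled) searches whose top score misses the
# threshold. FPIR: fraction of non-mated searches whose top score clears it.

def fnir_fpir(mated_scores, nonmated_scores, threshold):
    fnir = float(np.mean(mated_scores < threshold))
    fpir = float(np.mean(nonmated_scores >= threshold))
    return fnir, fpir

mated = np.array([0.9, 0.8, 0.3])     # toy top scores for enrolled probes
nonmated = np.array([0.2, 0.7, 0.1])  # toy top scores for unenrolled probes
fn, fp = fnir_fpir(mated, nonmated, threshold=0.5)
```

Raising the threshold lowers FPIR at the cost of FNIR, which is the trade-off the quoted operating point (FNIR 0.15% at FPIR 0.001) fixes.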
Title: 50 Years of Automated Face Recognition.
Pub Date: 2026-02-13. DOI: 10.1109/TPAMI.2026.3664388.
Yabin Wang, Zhiwu Huang, Zhou Su, Adam Prugel-Bennett, Xiaopeng Hong
The rise of AI-generated images has sparked serious concerns about their potential misuse across various domains, prompting the urgent need for robust detection methods. Despite advancements, many current approaches prioritize short-term gains at the expense of long-term effectiveness. This paper critiques the overly specialized approach of fine-tuning pre-trained models for short-term gains on a single AI image dataset, while disregarding the long-term imperative of achieving generalization and knowledge retention. To address this trade-off issue, we propose a novel learning framework (PoundNet) for the generalization of AI-generated image detection on a pre-trained vision-language model. PoundNet incorporates a learnable prompt design and a balanced objective to preserve broad knowledge from upstream tasks (object classification) while enhancing generalization for downstream tasks (AI-generated image detection). We train PoundNet on a single standard AI image dataset, following common practice in the literature. We then evaluate its performance across 10 large-scale public AI-generated image detection datasets with 5 main evaluation metrics, forming the largest benchmark test set for assessing the generalization ability of AI-generated image detection models, to our knowledge. The comprehensive benchmark evaluation demonstrates that PoundNet successfully balances generalization with knowledge retention, achieving a remarkable relative improvement of 19% in AI-generated image detection performance compared to state-of-the-art methods, while maintaining a strong performance of 63% on object classification tasks. The source code and data are available at https://github.com/iamwangyabin/PoundNet.
Title: Penny-Wise and Pound-Foolish in AI-Generated Image Detection.
Despite the fast progress of deep learning, one standing challenge is the gap between the observed training samples and the underlying true distribution. This gap has multiple causes, e.g., sampling bias and noise. In the era of foundation models, we show that when leveraging off-the-shelf (vision) foundation models (e.g., CLIP, DINOv2) for feature extraction, the geometric shapes of the resulting feature distributions exhibit remarkable transferability across domains and datasets. To verify its practical usefulness, we embody our geometric knowledge-guided distribution calibration framework in two popular and challenging settings: federated learning and long-tailed recognition. In the federated setting, we devise a technique for acquiring the global geometric shape under privacy constraints, then leverage this knowledge to generate new samples for clients, with the aim of bridging the gap between local and global observations. In long-tailed learning, our framework utilizes the geometric knowledge transferred from sample-rich categories to recover the true distribution for sample-scarce tail classes. Comprehensive experiments show that the proposed geometric knowledge-guided distribution calibration effectively overcomes information deficits caused by data heterogeneity and sample imbalance, with boosted performance across benchmarks. Code published at: https://github.com/WeiDai-David/2025CVPR GGEUR.
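The tail-class calibration step can be sketched as transferring a sample-rich class's covariance (its feature-space "geometric shape") to a sample-scarce class's mean. This 2-D synthetic toy is an assumption-laden illustration of the idea, not the paper's method; all numbers and names are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Head class: many samples, so its covariance ("geometric shape") is reliable.
head = rng.normal(size=(500, 2)) * np.array([2.0, 0.5])
# Tail class: only two samples, centered far from the head class.
tail = np.array([[5.0, 5.0], [5.2, 4.9]])

# Transfer the head covariance to the tail mean and draw new samples there,
# recovering a plausible spread the two tail samples alone cannot reveal.
cov = np.cov(head, rowvar=False)
new_samples = rng.multivariate_normal(tail.mean(axis=0), cov, size=100)
```

The synthesized points stay centered on the tail class but inherit the head class's anisotropic spread.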
Title: Calibrating Biased Distribution in VFM-derived Latent Space via Cross-Domain Geometric Consistency. Authors: Yanbiao Ma, Wei Dai, Zhiwu Lu, Bowei Liu, Jiayi Chen, Wenke Huang, Junchi Yan, Guancheng Wan. Pub Date: 2026-02-09. DOI: 10.1109/TPAMI.2026.3662389.
Pub Date: 2026-02-06. DOI: 10.1109/TPAMI.2026.3661424.
Li Sun, Zhenhao Huang, Yujie Wang, Hongbo Lv, Chunyang Liu, Hao Peng, Philip S Yu
Graph clustering is a longstanding topic in machine learning. In recent years, deep learning methods have achieved encouraging results, but they still require a predefined number of clusters $K$ and typically struggle with imbalanced graphs, especially in identifying minority clusters. These limitations motivate us to study a challenging yet practical problem: deep graph clustering without $K$ that accounts for the imbalance found in real data. We approach this problem from a fresh perspective of information theory (i.e., structural information). In the literature, structural information has rarely been touched in deep clustering, and the classic definition falls short owing to its discrete formulation, which neglects node attributes and exhibits prohibitive complexity. In this paper, we first establish a differentiable structural information, generalizing the discrete formalism to the continuous realm, and accordingly design a hyperbolic deep model (LSEnet) to learn the neural partitioning tree in the Lorentz model of hyperbolic space. Theoretically, we demonstrate its capability of clustering without requiring $K$ and of identifying minority clusters in imbalanced graphs. Second, we refine the hyperbolic representations of the partitioning tree, enhancing graph semantics, for better clustering. Contrastive learning for tree structures is non-trivial and incurs quadratic complexity. Instead, we further advance our theory by discovering an interesting fact: structural entropy in fact bounds the tree contrastive loss. Finally, with an efficient reformulation, we approach graph clustering through a novel augmented structural information learning (ASIL) framework, which offers a simple yet effective objective of augmented structural entropy that seamlessly integrates hyperbolic partitioning-tree construction and contrastive learning. With a provable improvement in graph conductance, ASIL achieves effective debiased graph clustering in linear complexity with respect to the graph size. Extensive experiments show that ASIL outperforms 20 strong baselines by an average of $+12.42%$ in NMI on the Citeseer dataset.
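For intuition about the quantity being generalized, the classic one-dimensional structural entropy of a graph is simply the Shannon entropy of the degree-proportional stationary distribution $d_i / 2m$. A direct sketch of that baseline definition (not the paper's differentiable or hierarchical variants):

```python
import numpy as np

def structural_entropy_1d(adj):
    # One-dimensional structural entropy: Shannon entropy (in bits) of the
    # stationary distribution d_i / 2m over the nodes of the graph.
    deg = adj.sum(axis=1)
    p = deg / deg.sum()
    return float(-(p * np.log2(p)).sum())

triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
h_tri, h_path = structural_entropy_1d(triangle), structural_entropy_1d(path)
```

The triangle, with uniform degrees, attains the maximum log2(3) bits, while the path's uneven degrees give a smaller value; lower structural entropy reflects more structural regularity to exploit.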
Title: ASIL: Augmented Structural Information Learning for Deep Graph Clustering in Hyperbolic Space.
Pub Date: 2026-02-06. DOI: 10.1109/TPAMI.2026.3661650.
Xiaowei Zhao, Linrui Xie, Xiaojun Chang, Feiping Nie, Qiang Zhang
Bipartite graph-based co-clustering is efficient in modeling cluster manifold structures. However, existing methods decouple bipartite graph construction from the learning of pseudo-labels for samples and anchors, often leading to suboptimal clustering performance. Moreover, neglecting local manifold relationships among anchors yields inferior anchor pseudo-labels, which further degrades the quality of sample pseudo-labels. To overcome these limitations, we propose a novel model termed Fast Co-Clustering (FC$^{2}$), which jointly captures both local and global correlations between samples and anchors. Specifically, to model the coupling between the one-hot pseudo-labels of samples and anchors, we construct a bipartite graph with adaptively updated weights during the clustering process. To prevent severely imbalanced cluster assignments, we prove the equivalence between maximizing pseudo-label covariance and balancing cluster proportions, and incorporate a balanced regularization term to enhance the rationality of the resulting clusters. Furthermore, the local smoothness of anchor pseudo-labels is preserved via a low-rank decomposition of a compact anchor similarity graph. These two components jointly ensure that spatially adjacent anchors tend to share similar cluster identities, and that samples and anchors in close proximity are also assigned to similar clusters. We develop an efficient iterative optimization algorithm to update all model variables. Extensive experiments on benchmark and synthetic datasets validate the superior performance and efficiency of the proposed method compared with state-of-the-art approaches. Code is available at https://github.com/Vince-Doit/FC2.
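The sample-anchor coupling can be illustrated with a minimal bipartite propagation step. Gaussian affinities and a hard argmax assignment are assumptions chosen for this toy; FC$^{2}$ instead learns the graph weights and pseudo-labels jointly.

```python
import numpy as np

# Toy sketch of the sample-anchor bipartite idea (not FC^2 itself): weight
# each sample-anchor pair with a Gaussian affinity, then let every sample
# inherit the pseudo-label of its highest-affinity anchor.

def propagate(X, anchors, anchor_labels, sigma=1.0):
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / (2 * sigma ** 2))       # bipartite graph weights (n x m)
    return anchor_labels[W.argmax(axis=1)]   # sample pseudo-labels

X = np.array([[0.1, 0.0], [0.0, 0.2], [5.0, 5.1], [4.9, 5.0]])
anchors = np.array([[0.0, 0.0], [5.0, 5.0]])
labels = propagate(X, anchors, np.array([0, 1]))
```

Because there are far fewer anchors than samples, the affinity matrix is small, which is the efficiency argument behind anchor-based bipartite clustering.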
{"title":"FC$^{2}$: Fast Co-Clustering With Small-Scale Similarity Graph and Bipartite Graph Learning.","authors":"Xiaowei Zhao, Linrui Xie, Xiaojun Chang, Feiping Nie, Qiang Zhang","doi":"10.1109/TPAMI.2026.3661650","DOIUrl":"https://doi.org/10.1109/TPAMI.2026.3661650","url":null,"abstract":"<p><p>Bipartite graph-based co-clustering is efficient in modeling cluster manifold structures. However, existing methods decouple bipartite graph construction from the learning of pseudo-labels for samples and anchors, often leading to suboptimal clustering performance. Moreover, neglecting local manifold relationships among anchors yields inferior anchor pseudo-labels, which further degrades the quality of sample pseudo-labels. To overcome these limitations, we propose a novel model termed Fast Co-Clustering (FC$^{2}$), which jointly captures both local and global correlations between samples and anchors. Specifically, to model the coupling between the one-hot pseudo-labels of samples and anchors, we construct a bipartite graph with adaptively updated weights during the clustering process. To prevent severely imbalanced cluster assignments, we prove the equivalence between maximizing pseudo-label covariance and balancing cluster proportions, and incorporate a balanced regularization term to enhance the rationality of the resulting clusters. Furthermore, the local smoothness of anchor pseudo-labels is preserved via a low-rank decomposition of a compact anchor similarity graph. These two components jointly ensure that spatially adjacent anchors tend to share similar cluster identities, and that samples and anchors in close proximity are also assigned to similar clusters. We develop an efficient iterative optimization algorithm to update all model variables. Extensive experiments on benchmark and synthetic datasets validate the superior performance and efficiency of the proposed method compared with state-of-the-art approaches. 
Code is available at https://github.com/Vince-Doit/FC2.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":18.6,"publicationDate":"2026-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146133824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
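The FC$^{2}$ model itself jointly learns the bipartite graph and the pseudo-labels; as a point of reference, the underlying idea that one spectral decomposition of a sample-anchor bipartite graph co-embeds (and hence co-clusters) both sides can be sketched in a few lines of numpy. This is classic bipartite spectral co-clustering, not the authors' algorithm; the Gaussian affinity, `sigma`, and the toy data are illustrative assumptions.

```python
import numpy as np

def bipartite_coembedding(X, anchors, sigma=4.0):
    """Co-embed samples and anchors via one SVD of the normalized
    sample-anchor bipartite graph (classic spectral co-clustering,
    not the FC^2 model; kernel and sigma are illustrative choices)."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    B = np.exp(-d2 / (2.0 * sigma ** 2))        # bipartite affinity matrix
    d_row = B.sum(axis=1, keepdims=True)        # sample degrees
    d_col = B.sum(axis=0, keepdims=True)        # anchor degrees
    Bn = B / np.sqrt(d_row) / np.sqrt(d_col)    # D1^{-1/2} B D2^{-1/2}
    U, s, Vt = np.linalg.svd(Bn, full_matrices=False)
    # The leading singular pair is trivial (degree-scaled constant);
    # the second pair bipartitions samples (U) and anchors (Vt) jointly.
    return U[:, 1], Vt[1]

# Two well-separated blobs; two anchors drawn from each blob.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
               rng.normal(10.0, 0.3, (20, 2))])
anchors = np.vstack([X[:2], X[20:22]])
u, v = bipartite_coembedding(X, anchors)
labels = (u > 0).astype(int)   # sign of the embedding splits the two blobs
```

Because samples and anchors are embedded by the same SVD, anchors land on the same side of the cut as the samples they are close to, which is the coupling between sample and anchor pseudo-labels that the abstract emphasizes.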
Robust Matrix Completion With Deterministic Sampling Via Convex Optimization
Yinjian Wang, Wei Li, James E Fowler, Gemine Vivone
IEEE Transactions on Pattern Analysis and Machine Intelligence | Pub Date: 2026-01-29 | DOI: 10.1109/TPAMI.2026.3659200
The problem of robust matrix completion, i.e., the recovery of a low-rank matrix and a sparse matrix from a sampling of their superposition, has been addressed extensively in prior literature. Yet, much of this work has focused exclusively on the case in which the matrix sampling is done at random, as this scenario is amenable to theoretical analysis. In contrast, sampling with an arbitrary deterministic pattern is often more accommodating to hardware implementation; consequently, the problem of robust matrix completion under deterministic sampling is considered. To this end, a restricted approximate isometry property is proposed and used, along with a modified golfing scheme and a slightly strengthened incoherence condition, to prove that the latent low-rank and sparse matrices are uniquely recoverable via convex optimization with asymptotically high probability, providing the first exact-recovery theory for robust matrix completion with arbitrary deterministic sampling. A corresponding convex-optimization algorithm, driven by a traditional nuclear norm, is developed and then subsequently generalized by substituting a convolutional nuclear norm in order to cover a broader range of application scenarios. Empirical experiments on synthetic data verify the proposed theory, while a battery of results on real-world images demonstrates the practical efficacy of the generalized algorithm for robust matrix recovery.
Tackling Ill-Posedness of Reversible Image Conversion With Well-Posed Invertible Network
Yuanfei Huang, Hua Huang
IEEE Transactions on Pattern Analysis and Machine Intelligence | Pub Date: 2026-01-29 | DOI: 10.1109/TPAMI.2026.3659125
Reversible image conversion (RIC) suffers from ill-posedness because its forward conversion process constitutes an underdetermined system. Despite employing invertible neural networks (INNs), existing RIC methods remain intrinsically ill-posed, as they inevitably introduce uncertainty by incorporating randomly sampled variables. To tackle the ill-posedness dilemma, we focus on developing a reliable approximate left inverse for the underdetermined system by constructing an overdetermined system with a non-zero Gram determinant, thus ensuring a well-posed solution. Based on this principle, we propose a well-posed invertible $1\times 1$ convolution (WIC), which eliminates the reliance on random variable sampling and enables the development of well-posed invertible networks. Furthermore, we design two innovative networks, WIN-Naïve and WIN, with the latter incorporating advanced skip-connections to enhance long-term memory. Our methods are evaluated across diverse RIC tasks, including reversible image hiding, image rescaling, and image decolorization, consistently achieving state-of-the-art performance. Extensive experiments validate the effectiveness of our approach, demonstrating its ability to overcome the bottlenecks of existing RIC solutions and setting a new benchmark in the field. Code is available at https://github.com/BNU-ERC-ITEA/WIN.
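The abstract's central principle, that an overdetermined forward map whose Gram matrix has a non-zero determinant admits an exact left inverse, can be illustrated with plain numpy. This sketch only shows the linear-algebra idea behind a $1\times 1$ channel-mixing convolution, not the paper's WIC construction; the shapes and the random mixing matrix are illustrative assumptions.

```python
import numpy as np

def conv1x1(x, W):
    """Apply a 1x1 convolution: mix channels per pixel with matrix W.
    x: (H, W, n) image, W: (m, n) mixing matrix -> (H, W, m) output."""
    return np.einsum('mn,hwn->hwm', W, x)

def left_inverse(W):
    """Exact left inverse of a tall matrix W, valid because the
    Gram matrix W^T W is nonsingular: (W^T W)^{-1} W^T."""
    gram = W.T @ W
    assert np.linalg.det(gram) != 0, "Gram determinant must be non-zero"
    return np.linalg.solve(gram, W.T)

rng = np.random.default_rng(0)
x = rng.random((8, 8, 3))        # toy 3-channel image (illustrative)
W = rng.normal(size=(5, 3))      # overdetermined: 5 outputs from 3 channels
y = conv1x1(x, W)                # forward conversion, well-posed by construction
x_rec = conv1x1(y, left_inverse(W))
print(np.allclose(x_rec, x))     # prints True: exact recovery, no sampled variables
```

The contrast with the underdetermined case is the point: with fewer outputs than channels the Gram matrix is singular and many inputs map to the same output, which is exactly the ambiguity that INN-based RIC methods paper over with randomly sampled variables.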