Multi-label feature selection is an effective approach to mitigating the high-dimensional feature problem in multi-label learning. Most existing multi-label feature selection methods assume either that the data is complete or that only the features or only the labels are incomplete. Few studies so far consider multi-label data with both missing features and missing labels. In many cases, missing features in instances of multi-label data lead to missing labels, which existing studies ignore. We define this type of data as instance-dependent incomplete multi-label data. In this paper, we propose a feature selection method for instance-dependent incomplete multi-label data. First, we use the positive correlations between features to reconstruct the feature space, thereby recovering missing values and enhancing non-missing values. Second, we use a fuzzy tolerance relation to guide label recovery, and utilize fuzzy mutual implication granularity to impose a structural constraint on the projection matrix. Third, we achieve feature selection by eliminating the impact of incomplete instances and imposing sparse regularization on the projection matrix. Finally, we provide a convergent solution for the proposed feature selection framework. Comparative experiments with existing multi-label feature selection methods show that our method performs effective feature selection on instance-dependent incomplete multi-label data.
"Instance-Dependent Incomplete Multi-Label Feature Selection by Fuzzy Tolerance Relation and Fuzzy Mutual Implication Granularity," Jianhua Dai; Wenxiang Chen; Yuhua Qian; Witold Pedrycz. IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 10, pp. 5994-6008. Pub Date: 2025-07-28. DOI: 10.1109/TKDE.2025.3591461
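The sparse regularization on the projection matrix mentioned above is, in this line of work, commonly an ℓ2,1-norm penalty that zeroes out whole rows so the surviving row norms rank features. As a generic, hedged sketch of that idea (not the paper's actual algorithm; the names `l21_norm` and `feature_scores` are illustrative):

```python
import numpy as np

def l21_norm(W):
    # Sum of the l2 norms of the rows of W; penalizing this value
    # drives entire rows toward zero, discarding whole features.
    return float(np.sum(np.linalg.norm(W, axis=1)))

def feature_scores(W):
    # Row norms of the projection matrix serve as feature-importance
    # scores: a feature whose row is (near) zero is effectively dropped.
    return np.linalg.norm(W, axis=1)

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))
W[2] = 0.0                          # a feature pruned by the sparse penalty
scores = feature_scores(W)
selected = np.argsort(-scores)[:3]  # keep the 3 highest-scoring features
```

A zeroed row contributes nothing to the ℓ2,1 norm, which is exactly why the penalty performs feature selection rather than mere shrinkage.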
Pub Date: 2025-07-28. DOI: 10.1109/TKDE.2025.3591500
Tianchuan Yang;Haiqiang Chen;Haoyan Yang;Man-Sheng Chen;Xiangcheng Li;Youming Sun;Chang-Dong Wang
Efficient incomplete multi-view clustering has received increasing attention due to its ability to handle large-scale and missing data. Although existing methods show promising performance, 1) they typically generate anchors directly from incomplete and noisy raw data, resulting in incomplete anchor coverage and unreliable results; 2) they typically rely only on sparse regularization to remove noise, overlooking outliers; and 3) they ignore the inherent consistency of features within a view. To address these issues, we propose a smoothness-induced efficient incomplete multi-view clustering (SEIC) method. SEIC regards the available data as natural anchors selected from the complete data, and performs matrix decomposition only on them to obtain reliable small-size representation matrices. View-specific representation matrices are stacked into a tensor to capture consensus and guide the matrix decomposition. More significantly, we enforce both smoothness and low-rank coupling on the tensor. Smoothness induces continuous variation of the tensor, further eliminating noise and strengthening the relations among features. Benefiting from the noise robustness of SEIC, we design an adaptive noise-balance parameter that renders SEIC parameter-free. Furthermore, by constructing a sparse anchor graph on the learned tensor, we propose a spectral clustering version, SEIC-SC. Experiments on multiple datasets demonstrate the superior performance and efficiency of SEIC and SEIC-SC.
"Smoothness-Induced Efficient Incomplete Multi-View Clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 10, pp. 6173-6188.
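The smoothness term enforced on the stacked tensor can be pictured, in its simplest generic form, as penalizing squared differences between adjacent slices; this is an illustrative sketch only (the function `smoothness_penalty` and the slice ordering are assumptions, not SEIC's actual formulation):

```python
import numpy as np

def smoothness_penalty(T):
    # Squared Frobenius differences between adjacent frontal slices of a
    # stacked tensor T (e.g. shape: views x n x d); small values mean the
    # representation varies continuously across slices, suppressing noise.
    return float(sum(np.linalg.norm(T[i + 1] - T[i]) ** 2
                     for i in range(len(T) - 1)))

# A gently varying tensor incurs a far smaller penalty than an erratic one.
smooth = np.stack([np.full((4, 2), v) for v in (1.0, 1.1, 1.2)])
noisy = np.stack([np.full((4, 2), v) for v in (1.0, 3.0, 0.5)])
```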
Pub Date: 2025-07-24. DOI: 10.1109/TKDE.2025.3592640
Sanfeng Zhang;Xinyi Liu;Zihao Qi;Xingchen Yan;Wang Yang
When distribution shifts occur between testing and training graph data, out-of-distribution (OOD) samples undermine the performance of graph neural networks (GNNs). To improve adaptive OOD generalization of GNNs, this paper introduces a novel generative invariant graph learning framework, named GI-Graph. It consists of four modules: a subgraph extractor, generative environment subgraph augmentation, generative invariant subgraph learning, and a query feedback module. The subgraph extractor decomposes a graph sample into an environment subgraph and an invariant subgraph, and improves extraction accuracy through query feedback. GI-Graph uses a diffusion model to generate diverse environment subgraphs, augmenting the OOD data. By combining diffusion models, contrastive learning, and attribute prediction networks, GI-Graph also generates augmented invariant subgraphs whose features remain identically distributed and whose labels stay consistent. Experimental results demonstrate that the controllable environment and invariant subgraph augmentation effectively improves the OOD generalization capability of GI-Graph, especially in capturing invariant features and maintaining category consistency across environments. Additionally, the contrastive learning-based fine-tuning method enables GI-Graph to quickly adapt to evolving environments. This paper verifies the effectiveness of the generative invariant graph learning scheme in graph OOD generalization.
"GI-Graph: A Generative Invariant Graph Learning Scheme Towards Out-of-Distribution Generalization," IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 10, pp. 5934-5947.
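The environment/invariant decomposition performed by a subgraph extractor can be caricatured as thresholding per-edge scores; this toy stand-in (the function `split_by_edge_scores` and the threshold are hypothetical, not GI-Graph's learned extractor) only illustrates the output structure:

```python
def split_by_edge_scores(edges, scores, threshold=0.5):
    # Partition the edge set into an "invariant" subgraph (edges the
    # extractor scores highly) and an "environment" subgraph (the rest).
    invariant = [e for e, s in zip(edges, scores) if s >= threshold]
    environment = [e for e, s in zip(edges, scores) if s < threshold]
    return invariant, environment

edges = [(0, 1), (1, 2), (2, 3)]
inv, env = split_by_edge_scores(edges, [0.9, 0.2, 0.7])
```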
Translating users’ natural language queries (NL) into SQL queries (i.e., Text-to-SQL, a.k.a. NL2SQL) can significantly reduce barriers to accessing relational databases and support various commercial applications. The performance of Text-to-SQL has been greatly enhanced with the emergence of Large Language Models (LLMs). In this survey, we provide a comprehensive review of Text-to-SQL techniques powered by LLMs, covering the entire lifecycle across four aspects: (1) Model: Text-to-SQL translation techniques that tackle not only NL ambiguity and under-specification, but also properly map NL to database schemas and instances; (2) Data: From the collection of training data and data synthesis under training-data scarcity, to Text-to-SQL benchmarks; (3) Evaluation: Evaluating Text-to-SQL methods from multiple angles using different metrics and granularities; and (4) Error Analysis: Analyzing Text-to-SQL errors to find root causes and guide Text-to-SQL models as they evolve. Moreover, we offer a rule of thumb for developing Text-to-SQL solutions. Finally, we discuss the research challenges and open problems of Text-to-SQL in the LLMs era.
"A Survey of Text-to-SQL in the Era of LLMs: Where Are We, and Where Are We Going?," Xinyu Liu; Shuyu Shen; Boyan Li; Peixian Ma; Runzhi Jiang; Yuxin Zhang; Ju Fan; Guoliang Li; Nan Tang; Yuyu Luo. IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 10, pp. 5735-5754. Pub Date: 2025-07-24. DOI: 10.1109/TKDE.2025.3592032
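One widely used Text-to-SQL metric in the evaluation aspect above is execution accuracy: a predicted query counts as correct if it returns the same result as the gold query on the target database. A minimal sketch using SQLite (the helper `execution_match` and the sample schema are illustrative, not the survey's benchmark code):

```python
import sqlite3

def execution_match(db, gold_sql, pred_sql):
    # Execution accuracy: two queries are treated as equivalent on this
    # database if they return the same multiset of rows.
    cur = db.cursor()
    try:
        gold = sorted(map(tuple, cur.execute(gold_sql).fetchall()))
        pred = sorted(map(tuple, cur.execute(pred_sql).fetchall()))
    except sqlite3.Error:
        # A prediction that fails to execute is simply wrong.
        return False
    return gold == pred

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary REAL)")
db.executemany("INSERT INTO emp VALUES (?, ?, ?)",
               [("a", "x", 10.0), ("b", "x", 20.0), ("c", "y", 30.0)])
ok = execution_match(db, "SELECT dept, COUNT(*) FROM emp GROUP BY dept",
                     "SELECT dept, COUNT(name) FROM emp GROUP BY dept")
```

Note that execution match can over-credit queries that coincide only on this database instance, which is one reason surveys also discuss exact-match and component-level metrics.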
Pub Date: 2025-07-23. DOI: 10.1109/TKDE.2025.3591771
Songwei Zhao;Bo Yu;Sinuo Zhang;Zhejian Yang;Jifeng Hu;Philip S. Yu;Hechang Chen
Graph neural networks (GNNs) have garnered significant attention for their competitive performance on graph-structured data. However, many existing methods are constrained by the homophily assumption, making them overly reliant on uniform neighbor propagation, which limits their ability to generalize to heterophilous graphs. Although some approaches extend aggregation to multi-hop neighbors, adapting neighborhood sizes on a per-node basis remains a significant challenge. In view of this, we propose an Evolutionary Graph Neural Network (EGNN) with adaptive structure-level aggregation and label smoothing, offering a novel solution to the aforementioned drawback. The core innovation of EGNN lies in assigning each node a personalized neighborhood structure through behavior-level crossover and mutation. Specifically, we first adaptively search for the optimal structure-level neighborhoods of nodes within the solution space, leveraging the exploratory capabilities of evolutionary computation. This approach enhances the exchange of information between the target node and surrounding nodes, achieving a smooth vector representation. Subsequently, we adopt the optimal structure obtained through evolutionary search to perform label smoothing, further boosting the robustness of the framework. We conduct experiments on nine real-world networks with different homophily ratios; the strong results demonstrate that EGNN matches or surpasses SOTA baselines.
"EGNN: Exploring Structure-Level Neighborhoods in Graphs With Varying Homophily Ratios," IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 10, pp. 5852-5865.
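The homophily ratio used to characterize the nine benchmark networks is, in its standard edge-level form, the fraction of edges whose endpoints share a label. A small sketch (the function name `edge_homophily` is illustrative):

```python
def edge_homophily(edges, labels):
    # Edge homophily ratio: fraction of edges connecting same-label nodes.
    # High values indicate homophilous graphs; low values, heterophilous.
    same = sum(labels[u] == labels[v] for u, v in edges)
    return same / len(edges)

labels = {0: "a", 1: "a", 2: "b", 3: "b"}
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
h = edge_homophily(edges, labels)  # 2 of 4 edges are homophilous -> 0.5
```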
Pub Date: 2025-07-23. DOI: 10.1109/TKDE.2025.3592074
Amin Vahedian
The Conditional Expectation Function (CEF) is an optimal estimator in real space. Artificial Neural Networks (ANN), as the current state-of-the-art method, lack interpretability. Estimating the CEF offers a path to achieving both accuracy and interpretability. Previous attempts to estimate the CEF rely on limiting assumptions such as independence and distributional form, or perform an expensive nearest-neighbor search. We propose Dynamically Ordered Precise Bayes Regression (DO-PBR), a novel method to estimate the CEF in discrete space. We prove that DO-PBR approaches optimality as the number of samples increases. DO-PBR dynamically learns region-specific importance rankings for the predictors, allowing the importance of a predictor to vary across the space. DO-PBR is fully interpretable and makes no assumptions about independence or distributional form, while requiring minimal parameter tuning. In addition, DO-PBR avoids the costly nearest-neighbor search by using a hierarchy of binary trees. Our experiments confirm our theoretical claims on approaching optimality and show that DO-PBR achieves substantially higher accuracy than ANN when given the same amount of time. Our experiments show that, on average, ANN takes 32 times longer to achieve the same level of accuracy as DO-PBR.
"Precise Bayes Regression: Approaching Optimality, Using Multi-Dimensional Space Partitioning Trees," IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 10, pp. 6107-6119.
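In discrete space, the CEF has a direct empirical form: E[Y | X = x] is the average of y over all samples whose predictor vector equals x. A minimal sketch of that baseline estimator (the function `fit_cef` is illustrative and omits DO-PBR's dynamic predictor ordering and tree structure):

```python
from collections import defaultdict

def fit_cef(xs, ys):
    # Empirical conditional expectation E[Y | X = x] in discrete space:
    # average y over all samples sharing the same predictor tuple x.
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in zip(xs, ys):
        sums[x] += y
        counts[x] += 1
    return {x: sums[x] / counts[x] for x in sums}

cef = fit_cef([("a",), ("a",), ("b",)], [1.0, 3.0, 5.0])
# cef[("a",)] averages 1.0 and 3.0; cef[("b",)] has a single sample
```

The practical difficulty this naive estimator exposes is sparsity: most predictor tuples are unseen, which is the gap that region-specific rankings and space-partitioning trees aim to close.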
Pub Date: 2025-07-23. DOI: 10.1109/TKDE.2025.3591827
Xu Zhao;Weibing Wan;Zhijun Fang
Predicting equipment failures plays a pivotal role in minimizing maintenance costs and boosting production efficiency within the industrial sector. This paper introduces a novel approach that integrates Causal Inference with predictive modeling to enhance prediction accuracy, tackling key challenges such as noise interference, insufficient causal validation, and missing data. We first validate the causal connections identified by the Greedy Equivalence Search algorithm using conditional mutual information to strengthen the reliability of the causal graph. An information bottleneck strategy is then employed to isolate essential causal features, effectively filtering out irrelevant noise and refining the causal structure. Crucially, in the actual prediction phase, we propose a recursive causal inference-based imputation method to handle missing data, leveraging the causal graph to iteratively infer and fill gaps, thereby improving data completeness and prediction accuracy. Experimental results demonstrate that the proposed method significantly outperforms existing approaches, exhibiting superior accuracy and robustness in managing complex industrial datasets.
"IGES-RCI: Improved Greedy Equivalence Search and Recursive Causal Inference for Industrial Equipment Failure Prediction," IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 10, pp. 5983-5993.
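The conditional mutual information used above to validate candidate causal edges has a simple plug-in estimate for discrete variables: I(X; Y | Z) = Σ p(x,y,z) log[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ]. A sketch from empirical counts (the function name is illustrative; real causal-discovery code would add significance testing):

```python
import math
from collections import Counter

def conditional_mutual_information(samples):
    # Plug-in estimate of I(X; Y | Z) from (x, y, z) samples; it is ~0
    # when X and Y are conditionally independent given Z.
    n = len(samples)
    pxyz = Counter(samples)
    pxz = Counter((x, z) for x, _, z in samples)
    pyz = Counter((y, z) for _, y, z in samples)
    pz = Counter(z for _, _, z in samples)
    cmi = 0.0
    for (x, y, z), c in pxyz.items():
        # All probabilities share the 1/n factor, so the counts cancel
        # into the ratio pz * c / (pxz * pyz).
        cmi += (c / n) * math.log((pz[z] * c) / (pxz[(x, z)] * pyz[(y, z)]))
    return cmi
```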
The task of incomplete multi-view clustering (IMvC) aims to partition multi-view data with a lack of completeness into different clusters. The incompleteness typically falls into two cases: instance-missing and view-unaligned MvC. However, prior methods either address only one of these cases or struggle to pursue consistent latent representations among views. In this paper, we propose two forms of contrastive learning paradigms to jointly handle both cases for IMvC. Specifically, we design an instance-oriented contrastive (IOC) learning strategy to achieve intra-class consistency. As negative samples within different datasets can exhibit diverse distributions, we formulate a parameterized boundary for IOC learning to flexibly deal with such differing data modes. To preserve inter-view consistency, we further devise category-oriented contrastive (COC) learning such that data from different views can be seamlessly integrated into a combined semantic space. We also recover the missing instances with the learned latent representations in a reconstructing manner, realigning the incomplete multi-view data to facilitate clustering. Our approach unifies the solution to both incomplete cases into one formulation. To demonstrate the effectiveness of our model, we conduct four types of MvC tasks on six benchmark multi-view datasets and compare our method against state-of-the-art IMvC methods. Extensive experiments show that our method achieves state-of-the-art performance, quantitatively and qualitatively.
"Learning to Discriminate While Contrasting: Combating False Negative Pairs With Coupled Contrastive Learning for Incomplete Multi-View Clustering," Yu Ding; Katsuya Hotta; Chunzhi Gu; Ao Li; Jun Yu; Chao Zhang. IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 10, pp. 6046-6060. Pub Date: 2025-07-23. DOI: 10.1109/TKDE.2025.3592126
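Contrastive paradigms like IOC and COC build on an InfoNCE-style objective: aligned pairs across views are pulled together while all other pairs in the batch act as negatives. A generic NumPy sketch of that base loss (illustrative only; it omits the paper's parameterized boundary and false-negative handling):

```python
import numpy as np

def info_nce(z1, z2, tau=0.5):
    # Generic InfoNCE loss: row i of z1 and row i of z2 form a positive
    # pair; every other row of z2 serves as a negative for row i.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                  # temperature-scaled cosine sims
    sim -= sim.max(axis=1, keepdims=True)  # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_prob)))
```

Misaligned positives (e.g. shuffled views) raise the loss, which is exactly the signal that drives view realignment.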
Pub Date: 2025-07-22. DOI: 10.1109/TKDE.2025.3591732
Hanxuan Yang;Qingchao Kong;Wenji Mao
Graph representation learning is a fundamental research theme and can be generalized to benefit multiple downstream tasks from the node and link levels to the higher graph level. In practice, it is desirable to develop task-agnostic graph representation learning methods that are typically trained in an unsupervised manner. However, existing unsupervised graph models, represented by the variational graph auto-encoders (VGAEs), can only address node- and link-level tasks while manifesting poor generalizability on the more difficult graph-level tasks because they can only keep low-order isomorphic consistency within the subgraphs of one-hop neighborhoods. To overcome the limitations of existing methods, in this paper, we propose the Isomorphic-Consistent VGAE (IsoC-VGAE) for multi-level task-agnostic graph representation learning. We first devise an unsupervised decoding scheme to provide a theoretical guarantee of keeping the high-order isomorphic consistency within the VGAE framework. We then propose the Inverse Graph Neural Network (Inv-GNN) decoder as its intuitive realization, which trains the model via reconstructing the node embeddings and neighborhood distributions learned by the GNN encoder. Extensive experiments on multi-level graph learning tasks verify that our model achieves superior or comparable performance compared to both the state-of-the-art unsupervised methods and representative supervised methods with distinct advantages on the graph-level tasks.
Title: "Multi-Level Task-Agnostic Graph Representation Learning With Isomorphic-Consistent Variational Graph Auto-Encoders"
IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 10, pp. 6061-6074.
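As a rough illustration of the decoding scheme described in the abstract (a simplified numpy sketch, not the paper's Inv-GNN): instead of reconstructing the adjacency matrix as a plain VGAE does, the decoder is trained to reproduce (i) the GNN encoder's node embeddings and (ii) each node's neighborhood distribution. All shapes, the toy graph, and the linear "decoder" below are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_z = 5, 4, 3

# Toy graph: symmetric adjacency with self-loops, plus node features.
A = np.array([[1, 1, 0, 0, 1],
              [1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0],
              [0, 0, 1, 1, 1],
              [1, 0, 0, 1, 1]], dtype=float)
X = rng.normal(size=(n, d_in))

# One-layer mean-aggregation "encoder": H = D^-1 A X W_enc.
W_enc = rng.normal(size=(d_in, d_z))
H = (A / A.sum(axis=1, keepdims=True)) @ X @ W_enc  # node embeddings

# The two reconstruction targets of the scheme:
target_emb = H                                 # (i) node embeddings
target_nbr = A / A.sum(axis=1, keepdims=True)  # (ii) neighborhood distributions

# Linear stand-in decoder mapping latent codes back to both targets.
Z = rng.normal(size=(n, d_z))  # latent codes (would come from q(Z|X, A))
W_emb = rng.normal(size=(d_z, d_z))
W_nbr = rng.normal(size=(d_z, n))

recon_emb = Z @ W_emb
recon_nbr = np.exp(Z @ W_nbr)
recon_nbr /= recon_nbr.sum(axis=1, keepdims=True)  # softmax over neighbors

# Training would minimize the sum of the two reconstruction errors.
loss = np.mean((recon_emb - target_emb) ** 2) + np.mean((recon_nbr - target_nbr) ** 2)
print(round(float(loss), 4))
```

The point of the construction is that both targets depend on multi-hop structure through the encoder, which is what lets the latent codes carry graph-level information rather than only edge existence.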
Pub Date: 2025-07-22, DOI: 10.1109/TKDE.2025.3591460
Yurong Cheng;Xiaoxi Cui;Ye Yuan;Xiangmin Zhou;Guoren Wang
With the development of AI, big data, and mobile communication, intelligent transportation has become popular in recent years. Path planning is a typical topic in intelligent transportation and has attracted significant attention from researchers. However, existing studies focus only on path planning within a single platform, which can cause unexpected traffic congestion: when multiple platforms provide route planning services, each platform's plan may be optimal from its own viewpoint, yet the platforms may steer their users onto the same roads, so the combined result is congestion in practice. Fortunately, with the rise of data sharing and cross-platform cooperation, the data silos between platforms are gradually being broken down. Based on this, we propose the Cooperative Global Path Planning (CGPP) framework to overcome this shortcoming. CGPP allows the platform receiving a path-planning request to send queries to cooperating platforms to optimize its planning results. Such queries should be "easy" enough to answer, and the query frequency should be low. Following this principle, we design a query decision model based on multi-agent reinforcement learning within the CGPP framework to decide the query range and query frequency, with actions and rewards designed specifically for the CGPP problem. Furthermore, we propose mechanisms to enhance query precision and reduce query overhead. Specifically, the Self-adjusting Query Area (SQA) concept allows refining query parameters, while the Query Reuse Optimization (QRO) algorithm aims to minimize the number of queries. To address potential overestimation in queries, we propose a Distance-Based Outer Query (DB-oq) and a Distance-Based Vehicle Count Estimation (DB-VCE) model.
To address the issue that the time interval computed by the QRO algorithm might not fully adapt to dynamic traffic environments, we propose the Temporal Sequence Historical Integration for Time Interval Prediction (TSHI-TIP) algorithm. Extensive experiments on real and synthetic datasets confirm the effectiveness and efficiency of our algorithms.
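The query-reuse idea can be illustrated with a toy cache (a hypothetical sketch: the paper's QRO algorithm computes the reuse interval adaptively and TSHI-TIP predicts it, neither of which is modeled here). A platform re-asks a cooperating platform for a road's vehicle count only when its cached answer is older than a fixed interval; all names and the fixed interval are illustrative.

```python
class QueryReuseCache:
    """Reuse a cooperating platform's answer while it is fresh,
    re-query only after `interval` time units (toy query reuse)."""

    def __init__(self, query_fn, interval):
        self.query_fn = query_fn  # callable: road_id -> vehicle count
        self.interval = interval
        self.cache = {}           # road_id -> (timestamp, count)
        self.num_queries = 0

    def vehicle_count(self, road_id, now):
        hit = self.cache.get(road_id)
        if hit is not None and now - hit[0] < self.interval:
            return hit[1]                  # fresh: reuse, no query sent
        self.num_queries += 1
        count = self.query_fn(road_id)     # stale or missing: re-query
        self.cache[road_id] = (now, count)
        return count

# Stand-in cooperating platform that always reports 7 vehicles.
cache = QueryReuseCache(query_fn=lambda road: 7, interval=10)
for t in [0, 3, 6, 12]:      # four requests for the same road
    cache.vehicle_count("road-42", now=t)
print(cache.num_queries)     # 2: only the requests at t=0 and t=12 hit the platform
```

The requests at t=3 and t=6 fall inside the 10-unit freshness window opened at t=0 and are served from the cache, which is exactly the overhead reduction the reuse mechanism targets.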
Title: "Enhancing Global Path Planning via Simple Queries Across Multiple Platforms"
IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 12, pp. 7154-7168.