CLIP2LE: A Label Enhancement Fair Representation Method via CLIP
Pu Wang;YinSong Xiong;Zhuoran Zheng
Pub Date : 2025-10-06  DOI: 10.1109/TBDATA.2025.3618450  IEEE Transactions on Big Data, vol. 12, no. 1, pp. 224-235
Label enhancement is a novel label shift strategy that aims to integrate the feature space with the logical label space to obtain a high-quality label distribution. This label distribution can serve as a soft target for algorithmic learning, akin to label smoothing, thereby enhancing the performance of various learning paradigms, including multi-label learning, single positive multi-label learning, and partial-label learning. However, limited by dataset type and annotation inaccuracy, the same label enhancement algorithm struggles to achieve consistent performance across different datasets, for reasons captured by the following two insights: 1) Differential Contribution of Feature Space and Logical Label Space: the feature space and logical label space of different datasets contribute differently to generating an accurate label distribution; 2) Presence of Noise and Incorrect Labels: some datasets contain noise and inaccurately labeled samples, leading to divergent outputs for similar inputs. To address these challenges, we propose leveraging CLIP (Contrastive Language-Image Pre-training) as a foundational strategy, treating the feature space and the logical label space as two distinct modalities. By recoding these modalities before applying the label enhancement algorithm, we aim to achieve a fair and robust representation. In addition, we further justify our motivation in the discussion section. Extensive experimental results demonstrate that our approach helps existing label enhancement algorithms improve their performance on several benchmarks.
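To make the recoding idea concrete, the sketch below maps the feature space and the logical label space into a shared unit-norm embedding space before handing them to a label enhancement algorithm. It is a minimal illustration, not the authors' implementation: the linear maps W_x and W_l stand in for trained CLIP-style modality encoders, and the concatenation step is an assumption about how the recoded modalities might be consumed downstream.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_feat, n_labels, d_embed = 100, 64, 10, 32

X = rng.normal(size=(n, d_feat))                     # feature space
L = (rng.random((n, n_labels)) > 0.5).astype(float)  # logical (binary) label space

# Hypothetical stand-ins for trained CLIP-style modality encoders.
W_x = rng.normal(size=(d_feat, d_embed))
W_l = rng.normal(size=(n_labels, d_embed))

def l2_normalize(Z):
    return Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)

# Recode both modalities into a shared unit-norm space so that neither
# modality dominates the downstream label enhancement algorithm.
Z_x = l2_normalize(X @ W_x)
Z_l = l2_normalize(L @ W_l)
recoded = np.concatenate([Z_x, Z_l], axis=1)  # input to any LE algorithm
print(recoded.shape)                          # (100, 64)
```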
Guest Editorial Special Issue on Federated Learning for Big Data Applications
Xiaowen Chu;Wei Wang;Cong Wang;Yang Liu;Rongfei Zeng;Christopher G. Brinton
Pub Date : 2025-09-03  DOI: 10.1109/TBDATA.2024.3417057  IEEE Transactions on Big Data, vol. 11, no. 5, pp. 2099-2101
MuGNet-CMI: Multi-Head Hybrid Graph Neural Network for Predicting circRNA-miRNA Interactions With Global High-Order and Local Low-Order Information
Chen Jiang;Lei Wang;Changqing Yu;Zhuhong You;Xinfei Wang;Mengmeng Wei;Mianshuo Lu
Pub Date : 2025-08-29  DOI: 10.1109/TBDATA.2025.3604175  IEEE Transactions on Big Data, vol. 12, no. 1, pp. 159-173
Circular RNAs (circRNAs) are non-coding RNA molecules that play a crucial role in regulating genes and contributing to disease progression. CircRNAs can function as sponges for microRNAs (miRNAs), thereby regulating gene expression and influencing disease outcomes. Identifying associations between circRNAs and miRNAs through computational methods enhances the understanding of complex disease mechanisms and offers a reliable tool for pre-selecting candidates for experimental validation. Existing models, however, are limited in their ability to capture either global or local node information, so predicting circRNA-miRNA interactions remains challenging. To address this problem, we propose a novel framework for predicting circRNA-miRNA interactions (CMIs), known as MuGNet-CMI, which leverages a multi-head hybrid graph neural network together with global high-order and local low-order information. The model employs the MetaPath2Vec algorithm to generate high-quality node embeddings within the circRNA-miRNA heterogeneous matrix. A multi-head dynamic attention mechanism, combined with GraphSAGE, is incorporated to efficiently capture both global high-order and local low-order node information. Additionally, we integrate neural aggregators into the multi-head dynamic attention mechanism to aggregate feature information from the captured nodes. Validation on three real datasets demonstrates that MuGNet-CMI delivers strong performance in predicting CMIs, offering valuable insights to guide experimental research in gene regulation.
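As a concrete illustration of the local low-order component, the sketch below performs one GraphSAGE-style mean-aggregation step on a toy interaction graph. The toy graph, feature shapes, and weight matrix are all assumptions for illustration; the paper's model additionally combines MetaPath2Vec embeddings and multi-head dynamic attention, which are not reproduced here.

```python
import numpy as np

# Toy interaction graph as an adjacency list (node -> list of neighbors).
adj = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
H = np.eye(4)                                     # one-hot initial node features
W = np.random.default_rng(1).normal(size=(4, 8))  # trainable weight (random here)

def sage_mean_step(H, adj, W):
    """One step: concat(self, mean of 1-hop neighbors) -> linear -> ReLU -> L2 norm."""
    out = np.zeros_like(H)
    for v, neigh in adj.items():
        agg = H[neigh].mean(axis=0)               # local low-order aggregation
        out[v] = np.maximum(W @ np.concatenate([H[v], agg]), 0.0)
    return out / np.maximum(np.linalg.norm(out, axis=1, keepdims=True), 1e-12)

H1 = sage_mean_step(H, adj, W)  # embeddings now reflect 1-hop structure
print(H1.shape)                 # (4, 4)
```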
Optimal Transport Barycentric Aggregation for Byzantine-Resilient Federated Learning
K Naveen Kumar;Srinivasa Rao Chalamala;Ajeet Kumar Singh;C Krishna Mohan
Pub Date : 2025-08-29  DOI: 10.1109/TBDATA.2025.3604177  IEEE Transactions on Big Data, vol. 12, no. 1, pp. 174-185
Federated learning (FL) has emerged as a promising solution for distributed learning without sharing sensitive data. However, FL is vulnerable to data poisoning attacks, where malicious clients inject poisoned data during training to compromise the global model. Existing FL defenses suffer from restrictive assumptions: independent and identically distributed (IID) model updates, asymptotically optimal error-rate bounds, and strong convexity of the optimization problem. Hence, we propose a novel framework called Federated Learning Optimal Transport (FLOT) that leverages the Wasserstein barycentric technique to obtain a global model from a set of locally trained non-IID models on client devices. In addition, we introduce a loss function-based rejection (LFR) mechanism to suppress malicious updates and a dynamic weighting scheme to optimize the Wasserstein barycentric aggregation function. We provide theoretical proofs of FLOT's Byzantine resilience and convergence to highlight its efficacy. We evaluate FLOT on four benchmark datasets: GTSRB, KBTS, CIFAR10, and EMNIST. The experimental results underscore the practical significance of FLOT as an effective defense against data poisoning attacks in FL while maintaining high accuracy and scalability. We also observe that FLOT serves as a robust client-selection technique in the absence of attacks, further demonstrating its effectiveness.
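The sketch below illustrates one piece of such a pipeline, a loss function-based rejection filter: each client update is scored by its loss on a small trusted validation set, and the worst-scoring updates are dropped before aggregation. This is a simplified reading of LFR under assumed details; the keep ratio, the quadratic surrogate loss, and the plain mean used in place of the Wasserstein barycenter are all illustrative.

```python
import numpy as np

def lfr_filter(client_weights, val_loss_fn, keep_ratio=0.8):
    """client_weights: list of parameter vectors; val_loss_fn(w) -> scalar loss."""
    losses = np.array([val_loss_fn(w) for w in client_weights])
    k = max(1, int(len(client_weights) * keep_ratio))
    keep = np.argsort(losses)[:k]            # lowest validation loss survives
    return [client_weights[i] for i in keep]

# Toy example: honest clients cluster near a "true" parameter vector,
# while two poisoned updates sit far away and get rejected.
rng = np.random.default_rng(2)
true_w = np.ones(5)
clients = [true_w + 0.1 * rng.normal(size=5) for _ in range(8)]
clients += [10 * rng.normal(size=5) for _ in range(2)]   # poisoned updates
kept = lfr_filter(clients, lambda w: float(np.sum((w - true_w) ** 2)))
global_w = np.mean(kept, axis=0)   # simple mean standing in for the barycenter
print(len(kept), np.round(global_w, 2))
```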
Optimizing Deduplication Parameters via a Change-Estimation Analytical Model
Owen Randall;Luke Schultz;Paul Lu
Pub Date : 2025-08-29  DOI: 10.1109/TBDATA.2025.3604171  IEEE Transactions on Big Data, vol. 12, no. 1, pp. 135-146
Variable-sized, content-defined deduplication is a technique that finds and eliminates redundant chunks of data for efficient data backups, reduced data transfers, and lower data-storage overheads. For big datasets, especially those with incremental updates over time, such as backups and gathered data, deduplication makes data management faster and more efficient. Many existing deduplication systems use default expected chunk lengths such as 4 KB or 8 KB, but these defaults are often suboptimal. Poorly optimized deduplication systems can significantly increase storage costs and network usage, making large datasets prohibitively expensive to manage. We present the design, implementation, and empirical validation of our Deduplication Change-Estimation Analytical Model (DCAM), which predicts the performance of sliding window-based deduplication parameters on any given dataset and can therefore be used for parameter optimization. Our empirical evaluation includes workloads based on source code (Linux kernel, Kubernetes, TensorFlow), open-research datasets (CORD-19), and articles (Wikipedia). Validated using both our system and the Destor deduplication system, a DCAM-based search finds deduplication parameters that require up to 3.8× less storage relative to a common baseline. DCAM Search optimizes parameters up to 19.8× faster than previously possible, and the sizes of the resulting deduplicated datasets are all within 5.15% of the best results found by searching using actual deduplication.
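For readers unfamiliar with the parameter being tuned, the sketch below implements a gear-style sliding-window chunker in which the mask width sets the expected chunk length (roughly 2**MASK_BITS bytes between cut points). It is a generic content-defined chunker for illustration, not DCAM's or Destor's chunking code, and the min/max chunk lengths are assumed values.

```python
import hashlib
import random

random.seed(0)
GEAR = [random.getrandbits(64) for _ in range(256)]  # per-byte random table
MASK_BITS = 12                                       # expected ~4 KiB chunks
MASK = (1 << MASK_BITS) - 1

def chunk(data: bytes, min_len=512, max_len=65536):
    """Cut when the rolling hash's low MASK_BITS bits are all zero."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFFFFFFFFFF
        length = i - start + 1
        if (length >= min_len and (h & MASK) == 0) or length >= max_len:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

data = bytes(random.getrandbits(8) for _ in range(200_000))
cs = chunk(data)
unique = {hashlib.sha256(c).hexdigest() for c in cs}  # dedup store keys
print(len(cs), sum(map(len, cs)) // len(cs), len(unique))
```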
scProGraph: A Cell Bagging Strategy for Cell Type Annotation With Gene Interaction-Aware Explainability
Xinyuan Li;Yue-Chao Li;Hai-Ru You;Xuequn Shang;Leon Wong;Zhi-An Huang;Zhu-Hong You;Yu-An Huang
Pub Date : 2025-08-29  DOI: 10.1109/TBDATA.2025.3604169  IEEE Transactions on Big Data, vol. 12, no. 1, pp. 147-158
The rapid advancement of scRNA-seq has generated massive data for cell type annotation. However, current automated annotation methods remain limited: most approaches separately model either cell-cell similarities or gene-gene relationships, neglecting their synergistic effects, which leads to suboptimal accuracy and poor biological interpretability. To address this, we propose scProGraph, a prototype-guided graph neural network that jointly models cell type classification and functional gene subgraph discovery. By constructing a cell similarity graph and incorporating cell-type prototypes as prior anchors, our method simultaneously optimizes classification boundaries and the interpretability of gene subgraphs. Experiments on seven independent datasets spanning three disease categories demonstrate that scProGraph achieves over 90% accuracy on four datasets and exceeds 80% on six, outperforming state-of-the-art methods. Further analysis reveals that the gene subgraphs extracted by scProGraph for Macrophage, Fibroblast, and Monocyte cover 26.92%, 26.83%, and 22.22% of a protein-protein interaction network dataset, respectively, validating the biological relevance of the identified gene modules. This study not only provides a high-accuracy tool for single-cell annotation but also opens new avenues for discovering novel biomarkers and regulatory mechanisms through gene-relationship mining.
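The sketch below shows the kind of cell similarity graph such a method starts from: a k-nearest-neighbor graph over log-normalized expression profiles under cosine similarity. It is an assumed, simplified construction for illustration; scProGraph's actual graph building, prototype anchors, and bagging strategy are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(3)
expr = rng.poisson(1.0, size=(50, 200)).astype(float)  # 50 cells x 200 genes
expr = np.log1p(expr)                                  # standard log-normalization

# Cosine similarity between cells via row-normalized expression vectors.
Z = expr / np.maximum(np.linalg.norm(expr, axis=1, keepdims=True), 1e-12)
S = Z @ Z.T
np.fill_diagonal(S, -np.inf)                           # exclude self-loops

k = 5
edges = [(i, int(j)) for i in range(S.shape[0])
         for j in np.argsort(S[i])[-k:]]               # each cell -> its k NNs
print(len(edges))                                      # 50 * k directed edges
```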
SDEC: Semantic Deep Embedded Clustering
Mohammad Wali Ur Rahman;Ric Nevarez;Lamia Tasnim Mim;Salim Hariri
Pub Date : 2025-08-28  DOI: 10.1109/TBDATA.2025.3603433  IEEE Transactions on Big Data, vol. 12, no. 1, pp. 119-134
The high-dimensional and semantically complex nature of textual big data presents significant challenges for text clustering, frequently leading to suboptimal groupings when conventional techniques like k-means or hierarchical clustering are used. This work presents Semantic Deep Embedded Clustering (SDEC), an unsupervised text clustering framework that combines an improved autoencoder with transformer-based embeddings to overcome these challenges. This novel method preserves semantic relationships during data reconstruction by combining Mean Squared Error (MSE) and Cosine Similarity Loss (CSL) within the autoencoder. SDEC then applies a semantic refinement stage that exploits the contextual richness of transformer embeddings to further improve a clustering layer with soft cluster assignments and a distributional loss. The capabilities of SDEC are demonstrated by extensive testing on five benchmark datasets: AG News, Yahoo! Answers, DBPedia, Reuters 2, and Reuters 5. The framework not only outperformed existing methods with a clustering accuracy of 85.7% on AG News and set a new benchmark of 53.63% on Yahoo! Answers, but also showed robust performance across other diverse text corpora. These findings highlight the significant improvements in accuracy and semantic comprehension of text data provided by SDEC's advances in unsupervised text clustering.
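A minimal sketch of the combined reconstruction objective is given below: the MSE term keeps reconstructions numerically close, while the cosine term penalizes directional (semantic) drift. The weighting alpha and the input shapes are assumptions for illustration, not values from the paper.

```python
import numpy as np

def reconstruction_loss(x, x_hat, alpha=0.5):
    """MSE plus cosine-similarity loss, averaged over the batch."""
    mse = np.mean((x - x_hat) ** 2, axis=1)
    cos = np.sum(x * x_hat, axis=1) / (
        np.linalg.norm(x, axis=1) * np.linalg.norm(x_hat, axis=1) + 1e-12)
    csl = 1.0 - cos                  # 0 when reconstruction keeps the direction
    return float(np.mean(mse + alpha * csl))

rng = np.random.default_rng(4)
x = rng.normal(size=(16, 384))             # e.g., transformer sentence embeddings
x_hat = x + 0.1 * rng.normal(size=x.shape) # imperfect reconstruction
print(reconstruction_loss(x, x_hat))
```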
Community-Imbalanced Graph Sampling
Ying Zhao;Genghuai Bai;Yusheng Qiu;Yiwen Liu;Chuhan Zhang;Chi Han;Yitao Wu;Kehua Guo;Jian Zhang;Fangfang Zhou
Pub Date : 2025-08-19  DOI: 10.1109/TBDATA.2025.3600032  IEEE Transactions on Big Data, vol. 12, no. 1, pp. 105-118
A community-imbalanced graph is a graph containing multiple communities with large differences in node and edge scales. Graph sampling is a widely used graph reduction technique for accelerating graph computations and simplifying graph visualizations. However, when maintaining the community structures of a community-imbalanced graph, existing graph sampling algorithms may encounter several problems, including the loss of small communities, disconnections between communities, and distortions of the community scale distribution. In this work, a new quality indicator is proposed to determine whether a graph can be regarded as community-imbalanced. A community-imbalanced graph sampling (CIGS) algorithm is proposed to address these sampling problems, and three new evaluation metrics are proposed to assess how well graph sampling maintains community structure. An algorithm performance experiment and a user study are conducted to evaluate the effectiveness of the proposed CIGS.
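The paper's quality indicator is not specified in this abstract, so the sketch below uses a hypothetical stand-in: flag a graph as community-imbalanced when the coefficient of variation of its community sizes exceeds a threshold. Both the statistic and the threshold are assumptions for illustration only, not the proposed indicator.

```python
import numpy as np

def imbalance_indicator(community_sizes, threshold=1.0):
    """Return (coefficient of variation of sizes, imbalanced?) for one graph."""
    sizes = np.asarray(community_sizes, dtype=float)
    cv = sizes.std() / sizes.mean()      # spread of community scales
    return float(cv), bool(cv > threshold)

print(imbalance_indicator([1000, 950, 20, 5]))  # large spread -> imbalanced
print(imbalance_indicator([240, 260, 255, 245]))  # balanced communities
```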
TAG: Triple Alignment With Rationale Generation for Knowledge-Based Visual Question Answering
Sihang Cai;Xuan Lin;Wenqiang Xu;Jingtong Wu;Tao Jin;Zhou Zhao;Fei Wu;Jun Yu
Pub Date : 2025-08-19  DOI: 10.1109/TBDATA.2025.3600012  IEEE Transactions on Big Data, vol. 12, no. 1, pp. 47-61
Knowledge-based Visual Question Answering (VQA) involves answering questions based not only on the given image but also on external knowledge. Existing methods for knowledge-based VQA fall into two main categories: those that rely on external knowledge bases, and those that use Large Language Models (LLMs) as implicit knowledge engines. The former heavily depends on the quality of information retrieval, introducing additional information bias into the system; the latter suffers from extremely high computational cost and the loss of image information. To address these issues, we propose a novel framework called TAG that reformulates knowledge-based VQA as a contrastive learning problem. We propose a triple asymmetric paradigm that aligns a lightweight text encoder to the image space at an extremely low training cost (0.0152B trainable parameters) and enhances its understanding at the level of semantic granularity. TAG is both computationally efficient and effective, and we evaluate it on the knowledge-based VQA datasets A-OKVQA, OK-VQA, and VCR. The results show that TAG (0.387B) achieves state-of-the-art performance among methods using fewer than 1B parameters, and remains competitive with LLM-based methods.
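The sketch below shows the general shape of such an alignment objective: a symmetric InfoNCE-style contrastive loss that pulls each trainable text embedding toward its paired (frozen) image embedding. This is a generic CLIP-style loss under assumed shapes and temperature, not TAG's triple asymmetric paradigm itself.

```python
import numpy as np

def info_nce(text_emb, img_emb, temperature=0.07):
    """Symmetric contrastive loss; matched text/image pairs share a row index."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    logits = (t @ v.T) / temperature          # (batch, batch) similarities
    # Log-softmax over rows; matched pairs lie on the diagonal.
    log_sm_t = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_sm_i = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    return 0.5 * (-np.mean(np.diag(log_sm_t)) - np.mean(np.diag(log_sm_i)))

rng = np.random.default_rng(5)
img = rng.normal(size=(8, 64))               # frozen image embeddings
txt = img + 0.5 * rng.normal(size=(8, 64))   # lightweight text encoder output
print(info_nce(txt, img))                    # lower as text aligns to image space
```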
Can GNNs Learn Link Heuristics? A Concise Review and Evaluation of Link Prediction Methods
Shuming Liang;Yu Ding;Zhidong Li;Bin Liang;Siqi Zhang;Yang Wang;Fang Chen
Pub Date : 2025-08-19  DOI: 10.1109/TBDATA.2025.3600031  IEEE Transactions on Big Data, vol. 12, no. 1, pp. 1-14
This paper explores the ability of Graph Neural Networks (GNNs) to learn various forms of information for link prediction, alongside a brief review of existing link prediction methods. Our analysis reveals that GNNs cannot effectively learn structural information related to the number of common neighbors between two nodes, primarily due to the set-based pooling of the neighborhood aggregation scheme. Our extensive experiments also indicate that trainable node embeddings can improve the performance of GNN-based link prediction models. Importantly, we observe that the denser the graph, the greater the improvement. We attribute this to the characteristics of node embeddings, where the link state of each link sample can be encoded into the embeddings of the nodes involved in the neighborhood aggregation of the two nodes in that link sample. In denser graphs, every node has more opportunities to participate in the neighborhood aggregation of other nodes and to encode the states of more link samples into its embedding, thus learning better node embeddings for link prediction. Lastly, we demonstrate that the insights gained from our research have important implications for identifying the limitations of existing link prediction methods, which could guide the future development of more robust algorithms.
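To see what "cannot effectively learn" means here, the sketch below computes the common-neighbor heuristic directly on a toy graph: the count |N(u) ∩ N(v)| is exactly the quantity that set-based mean pooling obscures, since averaging neighbor embeddings is insensitive to how many neighbors there are. The toy graph is illustrative.

```python
# Common-neighbor (CN) link heuristic on a toy undirected graph.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1}, 3: {0, 1}, 4: {5}, 5: {4}}

def common_neighbors(u, v):
    """|N(u) ∩ N(v)|: classic structural evidence for a link between u and v."""
    return len(adj[u] & adj[v])

print(common_neighbors(0, 1))  # 2 shared neighbors -> strong link evidence
print(common_neighbors(0, 4))  # 0 shared neighbors -> weak link evidence

# Mean pooling maps a 2-element and a 200-element neighbor set to a single
# averaged vector of the same scale, so the count itself is not recoverable.
```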