Pub Date: 2025-01-27 | DOI: 10.1109/TBDATA.2025.3533898
Sheng Xiang;Chenhao Xu;Dawei Cheng;Ying Zhang
Graph generation plays an essential role in understanding the formation of complex network structures across various fields, such as biological and social networks. Recent studies have shifted towards employing deep learning methods to grasp the topology of graphs. Yet, most current graph generators fail to adequately capture the community structure, which stands out as a critical and distinctive aspect of graphs. Additionally, these generators are generally limited to smaller graphs because of their inefficiencies and scaling challenges. This paper introduces the Community-Preserving Graph Adversarial Network (CPGAN), designed to effectively simulate graphs. CPGAN leverages graph convolution networks within its encoder and maintains shared parameters during generation to encapsulate community structure information and ensure permutation invariance. We also present the Scalable Community-Preserving Graph Attention Network (SCPGAN), aimed at enhancing the scalability of our model. SCPGAN considerably cuts down inference and training time, as well as GPU memory usage, through an ego-graph sampling approach and a short-pipeline autoencoder framework. Tests conducted on six real-world graph datasets reveal that CPGAN strikes a favorable balance between efficiency and simulation quality compared to leading-edge baselines. Moreover, SCPGAN makes substantial strides in model efficiency and scalability, successfully scaling generated graphs to the 10-million-node level while maintaining quality competitive with other advanced learning models.
Title: Scalable Learning-Based Community-Preserving Graph Generation. IEEE Transactions on Big Data, vol. 11, no. 5, pp. 2457-2470.
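The ego-graph sampling idea mentioned in the abstract can be illustrated with a small sketch: a bounded breadth-first traversal from a center node. The `fanout` cap, seed, and function name below are illustrative assumptions, not the authors' SCPGAN implementation.

```python
import random
from collections import deque

def sample_ego_graph(adj, center, hops=2, fanout=3, seed=0):
    """Sample a bounded ego-graph: BFS from `center`, keeping at most
    `fanout` randomly chosen neighbors per node, up to `hops` hops."""
    rng = random.Random(seed)
    visited = {center}
    edges = []
    frontier = deque([(center, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:          # stop expanding past the hop limit
            continue
        nbrs = list(adj.get(node, []))
        rng.shuffle(nbrs)
        for nbr in nbrs[:fanout]:  # per-node fanout cap bounds memory
            edges.append((node, nbr))
            if nbr not in visited:
                visited.add(nbr)
                frontier.append((nbr, depth + 1))
    return visited, edges
```

Capping both depth and fanout is what keeps per-sample cost independent of total graph size, which is the property that enables scaling to very large graphs.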
Pub Date: 2025-01-27 | DOI: 10.1109/TBDATA.2025.3533924
Liang Zhang;Xingyu Wu;Yuhang Ma;Haibin Kan
As a global virtual environment, the metaverse poses various challenges regarding data storage, sharing, interoperability, and privacy preservation. Typically, a trusted third party (TTP) is considered necessary in these scenarios. However, relying on a single TTP may introduce biases, compromise privacy, or create a single point of failure. To address these challenges and enable secure data exchange in the metaverse, we propose a system based on decentralized TTPs and the Ethereum blockchain. First, we use the threshold ElGamal cryptosystem to create the decentralized TTPs, employing verifiable secret sharing (VSS) to force owners to share data honestly. Second, we leverage the Ethereum blockchain as the public communication channel, automatic verification machine, and smart contract engine. Third, we apply discrete logarithm equality (DLEQ) algorithms to generate non-interactive zero-knowledge (NIZK) proofs when encrypted data is uploaded to the blockchain. Fourth, we present an incentive mechanism that rewards data owners and TTPs for data-sharing activities, as well as a penalty policy applied when malicious behavior is detected. Consequently, we construct a data exchange framework for the metaverse in which all involved entities are accountable. Finally, we perform comprehensive experiments to demonstrate the feasibility and analyze the properties of the proposed system.
Title: Data Exchange for the Metaverse With Accountable Decentralized TTPs and Incentive Mechanisms. IEEE Transactions on Big Data, vol. 11, no. 5, pp. 2431-2442.
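As a rough illustration of the DLEQ-based NIZK proofs the abstract mentions, here is a textbook Chaum-Pedersen-style proof of discrete-log equality, made non-interactive via a Fiat-Shamir hash. The tiny group (P=23, Q=11) and fixed nonce are for demonstration only, not the system's actual cryptographic setup.

```python
import hashlib

P, Q = 23, 11  # toy group: a subgroup of order Q=11 inside Z_23* (demo only)

def H(*vals):
    """Fiat-Shamir challenge: hash the transcript into Z_Q."""
    data = b"|".join(str(v).encode() for v in vals)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q

def dleq_prove(x, g1, g2, w=7):
    """Prove log_g1(h1) == log_g2(h2) == x without revealing x."""
    h1, h2 = pow(g1, x, P), pow(g2, x, P)
    a1, a2 = pow(g1, w, P), pow(g2, w, P)   # commitments with nonce w
    c = H(g1, h1, g2, h2, a1, a2)           # challenge
    r = (w - c * x) % Q                     # response
    return h1, h2, a1, a2, c, r

def dleq_verify(g1, g2, h1, h2, a1, a2, c, r):
    if c != H(g1, h1, g2, h2, a1, a2):
        return False
    # both commitments must reopen: a_i == g_i^r * h_i^c  (since r + c*x == w mod Q)
    ok1 = a1 == (pow(g1, r, P) * pow(h1, c, P)) % P
    ok2 = a2 == (pow(g2, r, P) * pow(h2, c, P)) % P
    return ok1 and ok2
```

In a deployment the same check can be expressed in a smart contract, which is what lets the blockchain act as the "automatic verification machine" the abstract describes.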
Pub Date: 2025-01-27 | DOI: 10.1109/TBDATA.2025.3533908
Simon Nandwa Anjiri;Derui Ding;Yan Song;Ying Sun
Within the scope of location-based services and personalized recommendations, the challenge of recommending new and unvisited points of interest (POIs) to mobile users is compounded by the sparsity of check-in data. Traditional recommendation models often overlook user and POI attributes, which exacerbates data sparsity and cold-start problems. To address this issue, a novel multiplex hypergraph attribute-based graph collaborative filtering method is proposed for POI recommendation, creating a robust recommendation system capable of handling sparse data and cold-start scenarios. Specifically, a multiplex network hypergraph is first constructed to capture complex relationships between users, POIs, and attributes based on the similarities of attributes, visit frequencies, and preferences. Then, an adaptive variational graph auto-encoder adversarial network is developed to accurately infer the users’/POIs’ preference embeddings from their attribute distributions, which reflect complex attribute dependencies and latent structures within the data. Moreover, a dual graph neural network variant based on both GraphSAGE K-nearest-neighbor networks and gated recurrent units is created to effectively capture attributes of different modalities in a neighborhood, including temporal dependencies in user preferences and spatial attributes of POIs. Finally, experiments conducted on Foursquare and Yelp datasets reveal the superiority and robustness of the developed model compared to typical state-of-the-art approaches and illustrate its effectiveness on the cold-start user and POI problems.
Title: A Multiplex Hypergraph Attribute-Based Graph Collaborative Filtering for Cold-Start POI Recommendation. IEEE Transactions on Big Data, vol. 11, no. 5, pp. 2401-2416.
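The GraphSAGE-style neighborhood aggregation the abstract builds on can be sketched minimally: each node's representation is updated from its own vector and its neighbors' vectors. This mean aggregator is a generic illustration, not the paper's dual-network model.

```python
def mean_aggregate(features, neighbors):
    """One GraphSAGE-style step: each node's new vector is the average of
    its own vector and its neighbors' vectors (a minimal mean aggregator)."""
    out = {}
    for v, nbrs in neighbors.items():
        vecs = [features[v]] + [features[u] for u in nbrs]
        dim = len(features[v])
        out[v] = [sum(vec[i] for vec in vecs) / len(vecs) for i in range(dim)]
    return out
```

Because a cold-start user or POI still has attribute-based neighbors in the hypergraph, this kind of neighborhood averaging is what lets the model produce embeddings for nodes with few or no check-ins.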
Pub Date: 2025-01-27 | DOI: 10.1109/TBDATA.2025.3533891
Ezekiel B. Ouedraogo;Ammar Hawbani;Xingfu Wang;Zhi Liu;Liang Zhao;Mohammed A. A. Al-qaness;Saeed Hamood Alsamhi
Digital Twins are virtual representations of physical assets and systems that rely on effective Data Management to integrate, process, and analyze diverse data sources. This article comprehensively examines Data Management challenges, architectures, techniques, and applications in the context of Digital Twins. It explores key issues such as data heterogeneity, quality assurance, scalability, security, and interoperability. The paper outlines architectural approaches like centralized, distributed, cloud-based, and blockchain solutions and Data Management techniques for modeling, integration, fusion, quality management, and visualization. Domain-specific considerations across manufacturing, smart cities, healthcare, and other sectors are discussed. Finally, open research challenges related to standards, real-time data processing, intelligent Data Management, and ethical aspects are highlighted. By synthesizing the state-of-the-art, this review serves as a valuable reference for developing robust Data Management strategies that enable Digital Twin deployments.
Title: Digital Twin Data Management: A Comprehensive Review. IEEE Transactions on Big Data, vol. 11, no. 5, pp. 2224-2243.
Pub Date: 2025-01-15 | DOI: 10.1109/TBDATA.2025.3526356
Title: 2024 Reviewers List. IEEE Transactions on Big Data, vol. 11, no. 1, pp. 310-313.
Pub Date: 2025-01-14 | DOI: 10.1109/TBDATA.2025.3528727
Jing Wang;Dehui Kong;Baocai Yin
Weakly supervised temporal action localization (WTAL) aims to precisely locate action instances in given videos using only video-level classification supervision, which is partly related to action classification. Most existing localization works directly utilize feature encoders pre-trained for video classification tasks to extract video features, resulting in non-targeted features that lead to incomplete or over-complete action localization. Therefore, we propose the Generalized Contrastive Learning Network (GCLNet), in which two novel strategies improve the pre-trained features. First, to address over-completeness, GCLNet introduces text information with good context independence and category separability to enrich the expression of video features, and proposes a novel generalized contrastive learning approach for similarity metrics, which pulls features of the same category closer together while pushing those of different categories farther apart. Consequently, it enables more compact intra-class feature learning and ensures accurate action localization. Second, to tackle incompleteness, we exploit the respective advantages of RGB and Flow features in scene appearance and temporal motion expression, designing a hybrid attention strategy in GCLNet so that the features of each channel enhance one another. This process greatly improves the features by establishing cross-channel consensus. Finally, we conduct extensive experiments on THUMOS14 and ActivityNet1.2, and the results show that our proposed GCLNet produces more representative action localization features.
Title: GCLNet: Generalized Contrastive Learning for Weakly Supervised Temporal Action Localization. IEEE Transactions on Big Data, vol. 11, no. 5, pp. 2365-2375.
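The pull-together/push-apart objective described above resembles a supervised contrastive loss. The sketch below is a generic formulation under that assumption (cosine similarity, temperature `tau`), not GCLNet's exact loss.

```python
import math

def contrastive_loss(embs, labels, tau=0.5):
    """Supervised contrastive loss sketch: for each anchor, same-label
    embeddings are positives (pulled closer) and the rest are negatives
    (pushed apart). Lower loss means tighter intra-class clusters."""
    def dot(a, b): return sum(x * y for x, y in zip(a, b))
    def sim(a, b): return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
    total, count = 0.0, 0
    n = len(embs)
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue
        denom = sum(math.exp(sim(embs[i], embs[j]) / tau)
                    for j in range(n) if j != i)
        for j in pos:  # -log softmax probability of each positive pair
            total += -math.log(math.exp(sim(embs[i], embs[j]) / tau) / denom)
            count += 1
    return total / count
```

Well-separated classes yield a smaller loss than mixed ones, which is exactly the gradient signal that compacts intra-class features.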
Knowledge combination prediction involves analyzing current knowledge elements and their relationships, then forecasting how these elements, drawn from various fields, can be creatively combined to form new, innovative solutions. This process is critical for countries and businesses to understand future technology trends and promote innovation in an era of rapid scientific and technological advancement. Existing methods often overlook the integration of knowledge combinations from multiple views, along with their inherent heterophily and the dual “many-to-one” property, where a single knowledge combination can include multiple elements, and a single element may belong to various combinations. To this end, we propose a novel framework named Multi-view Heterogeneous HyperGNN for Heterophilic Knowledge Combination Prediction (H3KCP). Specifically, H3KCP first constructs a hypergraph reflecting the dual “many-to-one” property of knowledge combinations, where each hyperedge may contain several nodes and each node can also belong to multiple hyperedges. Next, the framework employs a multi-view fusion approach to model knowledge combinations, considering heterophily and integrating insights from co-occurrence, co-citation, and hierarchical structure-based views. Furthermore, our analysis of H3KCP from a spectral graph perspective offers insights into its rationality. Finally, extensive experiments on real-world patent datasets and the Open Academic Graph dataset validate the effectiveness and efficiency of our approach, yielding significant insights into knowledge combinations.
Title: Multi-View Heterogeneous HyperGNN for Heterophilic Knowledge Combination Prediction. Authors: Huijie Liu;Shulan Ruan;Han Wu;Zhenya Huang;Defu Lian;Qi Liu;Enhong Chen. Pub Date: 2025-01-08 | DOI: 10.1109/TBDATA.2025.3527216. IEEE Transactions on Big Data, vol. 11, no. 5, pp. 2321-2337.
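The dual "many-to-one" property can be made concrete with a node-by-hyperedge incidence matrix, in which a row with several 1s is a node in multiple combinations and a column with several 1s is a combination with multiple elements. This is a generic sketch, not the H3KCP construction.

```python
def incidence(num_nodes, hyperedges):
    """Build the node-by-hyperedge incidence matrix H, where H[v][e] = 1
    iff node v belongs to hyperedge e. node_deg counts the hyperedges a
    node joins; edge_deg counts the nodes a hyperedge holds."""
    H = [[0] * len(hyperedges) for _ in range(num_nodes)]
    for e, members in enumerate(hyperedges):
        for v in members:
            H[v][e] = 1
    node_deg = [sum(row) for row in H]
    edge_deg = [len(members) for members in hyperedges]
    return H, node_deg, edge_deg
```

Spectral hypergraph convolutions are typically defined in terms of this H together with the two degree vectors, which is why the incidence structure is the natural starting point for the paper's spectral analysis.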
In the field of heterogeneous federated learning (FL), the key challenge is to efficiently and collaboratively train models across multiple clients with different data distributions, model structures, task objectives, computational capabilities, and communication resources. This diversity leads to significant heterogeneity, which increases the complexity of model training. In this paper, we first outline the basic concepts of heterogeneous FL and summarize the research challenges in FL in terms of five aspects: data, model, task, device and communication. In addition, we explore how existing state-of-the-art approaches cope with the heterogeneity of FL, and categorize and review these approaches at three different levels: data-level, model-level, and architecture-level. Subsequently, the paper extensively discusses privacy-preserving strategies in heterogeneous FL environments. Finally, the paper discusses current open issues and directions for future research, aiming to promote the further development of heterogeneous FL.
Title: Advances in Robust Federated Learning: A Survey With Heterogeneity Considerations. Authors: Chuan Chen;Tianchi Liao;Xiaojun Deng;Zihou Wu;Sheng Huang;Zibin Zheng. Pub Date: 2025-01-08 | DOI: 10.1109/TBDATA.2025.3527202. IEEE Transactions on Big Data, vol. 11, no. 3, pp. 1548-1567.
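Much of the heterogeneity-handling work the survey reviews extends the basic FedAvg aggregation step, in which a server averages client models weighted by local dataset size. A minimal sketch of that baseline (not any specific surveyed method) is:

```python
def fedavg(client_weights, client_sizes):
    """FedAvg-style aggregation: average client parameter vectors,
    weighted by each client's local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(dim)]
```

Data heterogeneity shows up exactly here: when client distributions differ, the weighted average can drift from any single client's optimum, which motivates the data-, model-, and architecture-level corrections the survey categorizes.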
Pub Date: 2025-01-08 | DOI: 10.1109/TBDATA.2025.3527230
Junwei Yin;Min Gao;Kai Shu;Zehua Zhao;Yinqiu Huang;Jia Wang
The wide dissemination of fake news has affected our lives in many aspects, making fake news detection important and attracting increasing attention. Existing approaches make substantial contributions to this field by modeling news from a single-modal or multi-modal perspective. However, these modal-based methods can yield sub-optimal outcomes because they ignore reader behaviors in news consumption and authenticity verification. For instance, they do not take into consideration the component-by-component reading process: from the headline, images, and comments to the body, which is essential for modeling news at finer granularity. To this end, we propose an approach that Emulates the behaviors of readers (Ember) for fake news detection on social media, incorporating readers’ reading and verification process to thoroughly model news from the component perspective. Specifically, we first construct intra-component feature extractors to emulate semantic analysis of each component. Then, we design a module that comprises inter-component feature extractors and a sequence-based aggregator. This module mimics the process of verifying the correlation between components and the overall reading and verification sequence. Thus, Ember can handle news with various components by emulating the corresponding sequences. We conduct extensive experiments on nine real-world datasets, and the results demonstrate the superiority of Ember.
Title: Emulating Reader Behaviors for Fake News Detection. IEEE Transactions on Big Data, vol. 11, no. 5, pp. 2353-2364.
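The sequence-based aggregator described above can be sketched generically as attention-weighted pooling over per-component embeddings in reading order (headline, images, comments, body). The component names and score inputs below are illustrative assumptions, not Ember's actual module.

```python
import math

def aggregate_components(components, scores):
    """Attention-pooling sketch: softmax the per-component relevance
    scores, then take the weighted sum of component embeddings to get
    a single news-level vector."""
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(components[0])
    return [sum(w * c[i] for w, c in zip(weights, components))
            for i in range(dim)]
```

With equal scores this reduces to a plain mean; learned scores let the model emphasize whichever component (e.g. comments vs. body) is most diagnostic for a given article.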
Pub Date: 2025-01-08 | DOI: 10.1109/TBDATA.2025.3527200
Zuodong Jin;Peng Qi;Muyan Yao;Dan Tao
With the widespread application of Big Data and intelligent information systems, the tenant has become the primary service unit in many scenarios. As a data mining technique, the portrait has been widely used to provide targeted services. Therefore, we transfer the traditional user-driven portrait to a tenant-driven one for churn prediction. To achieve this, the paper first proposes a three-layer architecture and defines fine-grained features for creating portraits from the perspective of tenants. On a large-scale telecommunication-industry dataset of 100,000 tenants, we construct the tenant portrait through the proposed framework and analyze the influence of the defined features on churn probability. Then, considering information missing due to privacy concerns, we propose CrossMatch, a portrait completion model based on semi-supervised learning and graph convolution, which exploits relational characteristics among tenants to recover missing information. On this basis, we design a tenant churn prediction method based on a directed attention network. Moreover, we recover missing information on three public node datasets with CrossMatch, achieving around a 1-2% improvement. We then apply the directed attention network to churn prediction and achieve an Accuracy of 75.06%, Precision of 77.78%, and F1-score of 71.43%, outperforming all baselines.
Title: Portraying Fine-Grained Tenant Portrait for Churn Prediction Using Semi-Supervised Graph Convolution and Attention Network. IEEE Transactions on Big Data, vol. 11, no. 5, pp. 2296-2307.
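The reported Accuracy, Precision, and F1-score follow from the standard confusion-matrix definitions; a minimal helper using those textbook formulas (not the authors' evaluation code) is:

```python
def binary_metrics(y_true, y_pred):
    """Standard binary-classification metrics from confusion-matrix
    counts: accuracy, precision = TP/(TP+FP), F1 = harmonic mean of
    precision and recall."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, f1
```

Reporting precision alongside F1 matters for churn, where classes are typically imbalanced and plain accuracy can overstate performance.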