
IEEE Transactions on Big Data: Latest Articles

Digital Twin Data Management: A Comprehensive Review
IF 5.7 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-27 | DOI: 10.1109/TBDATA.2025.3533891
Ezekiel B. Ouedraogo;Ammar Hawbani;Xingfu Wang;Zhi Liu;Liang Zhao;Mohammed A. A. Al-qaness;Saeed Hamood Alsamhi
Digital twins are virtual representations of physical assets and systems that rely on effective data management to integrate, process, and analyze diverse data sources. This article comprehensively examines data management challenges, architectures, techniques, and applications in the context of digital twins. It explores key issues such as data heterogeneity, quality assurance, scalability, security, and interoperability. The paper outlines architectural approaches, including centralized, distributed, cloud-based, and blockchain solutions. It also covers data management techniques for modeling, integration, fusion, quality management, and visualization. It then discusses domain-specific considerations in manufacturing, smart cities, healthcare, and other sectors. Finally, it highlights open research challenges related to standards, real-time data processing, intelligent data management, and ethical aspects. By synthesizing the state of the art, this review serves as a reference for developing robust data management strategies that enable digital twin deployments.
IEEE Transactions on Big Data, vol. 11, no. 5, pp. 2224-2243.
Citations: 0
2024 Reviewers List*
IF 7.5 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-15 | DOI: 10.1109/TBDATA.2025.3526356
IEEE Transactions on Big Data, vol. 11, no. 1, pp. 310-313. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10843074
Citations: 0
GCLNet: Generalized Contrastive Learning for Weakly Supervised Temporal Action Localization
IF 5.7 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-14 | DOI: 10.1109/TBDATA.2025.3528727
Jing Wang;Dehui Kong;Baocai Yin
Weakly supervised temporal action localization (WTAL) aims to precisely locate action instances in videos. It uses only video-level classification labels as supervision, which makes the task partly related to action classification. Most existing localization methods directly reuse feature encoders pre-trained for video classification to extract video features. These features are not targeted at localization, so they lead to incomplete or over-complete action localization. We therefore propose the Generalized Contrastive Learning Network (GCLNet), which introduces two novel strategies to improve the pre-trained features. First, to address over-completeness, GCLNet introduces text information with good context independence and category separability to enrich the representation of video features. It also proposes a novel generalized contrastive learning approach for similarity metrics that pulls together features of the same category while pushing apart features of different categories. This yields more compact intra-class features and supports more accurate action localization. Second, to address incompleteness, GCLNet exploits the complementary strengths of RGB features (scene appearance) and optical-flow features (temporal motion). A hybrid attention strategy lets the two channels mutually enhance each other. Establishing this cross-channel consensus substantially improves the features. Finally, extensive experiments on THUMOS14 and ActivityNet1.2 show that GCLNet produces more representative action localization features.
IEEE Transactions on Big Data, vol. 11, no. 5, pp. 2365-2375.
Citations: 0
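The GCLNet abstract describes a contrastive objective that pulls same-category features together and pushes different-category features apart. The paper's own generalized contrastive loss and its text-feature branch are not specified in the abstract. The sketch below therefore shows a standard supervised contrastive (SupCon-style) loss in plain Python instead. It assumes features are already-extracted vectors with integer class labels. `sup_con_loss` and `temperature` are illustrative names, not the authors' API.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def sup_con_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss (SupCon-style), averaged over anchors.

    For each anchor i, the positives are the other samples sharing its label;
    every sample j != i appears in the softmax denominator.
    """
    n = len(features)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no positives contributes nothing
        logits = [cosine(features[i], features[j]) / temperature
                  for j in range(n) if j != i]
        # log-sum-exp over all non-anchor samples, stabilised by the max logit
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        anchor_loss = 0.0
        for p in positives:
            # -log softmax probability of each positive pair
            anchor_loss -= cosine(features[i], features[p]) / temperature - log_denom
        total += anchor_loss / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(sup_con_loss(feats, [0, 0, 1, 1]))  # well-separated classes: small loss
print(sup_con_loss(feats, [0, 1, 0, 1]))  # labels that cut across clusters: large loss
```

The loss is small when same-label features already point in similar directions. It grows sharply when labels cut across the feature clusters. Minimizing it during training is what produces the compact intra-class features the abstract refers to.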
Multi-View Heterogeneous HyperGNN for Heterophilic Knowledge Combination Prediction
IF 5.7 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-08 | DOI: 10.1109/TBDATA.2025.3527216
Huijie Liu;Shulan Ruan;Han Wu;Zhenya Huang;Defu Lian;Qi Liu;Enhong Chen
Knowledge combination prediction involves analyzing current knowledge elements and their relationships, then forecasting how these elements, drawn from various fields, can be creatively combined to form new, innovative solutions. This process is critical for countries and businesses to understand future technology trends and promote innovation in an era of rapid scientific and technological advancement. Existing methods often overlook the integration of knowledge combinations from multiple views, along with their inherent heterophily and the dual “many-to-one” property, where a single knowledge combination can include multiple elements, and a single element may belong to various combinations. To this end, we propose a novel framework named Multi-view Heterogeneous HyperGNN for Heterophilic Knowledge Combination Prediction (H3KCP). Specifically, H3KCP first constructs a hypergraph reflecting the dual “many-to-one” property of knowledge combinations, where each hyperedge may contain several nodes and each node can also belong to multiple hyperedges. Next, the framework employs a multi-view fusion approach to model knowledge combinations, considering heterophily and integrating insights from co-occurrence, co-citation, and hierarchical structure-based views. Furthermore, our analysis of H3KCP from a spectral graph perspective offers insights into its rationality. Finally, extensive experiments on real-world patent datasets and the Open Academic Graph dataset validate the effectiveness and efficiency of our approach, yielding significant insights into knowledge combinations.
IEEE Transactions on Big Data, vol. 11, no. 5, pp. 2321-2337.
Citations: 0
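The first step of H3KCP, as summarized above, is a hypergraph in which each knowledge combination is a hyperedge. A minimal sketch of that data structure follows. It assumes combinations are given as sets of element names; the element names in the example are hypothetical. The sketch builds the node-by-hyperedge incidence matrix H and reads the dual "many-to-one" property off its row and column sums. `co_occurrence` is only one simple way to derive a co-occurrence view from H. The paper's multi-view fusion, heterophily handling, and HyperGNN layers are not reproduced.

```python
def build_incidence(combinations):
    """Node-by-hyperedge incidence matrix for a hypergraph of knowledge combinations.

    Each combination (a set of knowledge elements) becomes one hyperedge, so a
    hyperedge can hold several nodes and a node can sit in several hyperedges.
    """
    nodes = sorted({v for combo in combinations for v in combo})
    row = {v: i for i, v in enumerate(nodes)}
    H = [[0] * len(combinations) for _ in nodes]
    for e, combo in enumerate(combinations):
        for v in combo:
            H[row[v]][e] = 1
    return nodes, H


def degrees(H):
    """Node degrees (row sums) and hyperedge degrees (column sums)."""
    node_deg = [sum(r) for r in H]
    edge_deg = [sum(col) for col in zip(*H)]
    return node_deg, edge_deg


def co_occurrence(H):
    """Number of hyperedges shared by each pair of nodes (off-diagonal of H @ H^T)."""
    n = len(H)
    return [[0 if i == j else sum(a * b for a, b in zip(H[i], H[j]))
             for j in range(n)]
            for i in range(n)]


# Hypothetical knowledge elements grouped into three combinations
combos = [{"battery", "motor"}, {"battery", "sensor", "software"}, {"motor", "software"}]
nodes, H = build_incidence(combos)
print(nodes)       # ['battery', 'motor', 'sensor', 'software']
print(degrees(H))  # ([2, 2, 1, 2], [2, 3, 2])
```

A node degree above 1 means an element belongs to several combinations. A hyperedge degree above 1 means a combination contains several elements. An ordinary pairwise graph cannot express both directions at once.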
Advances in Robust Federated Learning: A Survey With Heterogeneity Considerations
IF 7.5 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-08 | DOI: 10.1109/TBDATA.2025.3527202
Chuan Chen;Tianchi Liao;Xiaojun Deng;Zihou Wu;Sheng Huang;Zibin Zheng
In the field of heterogeneous federated learning (FL), the key challenge is to efficiently and collaboratively train models across multiple clients with different data distributions, model structures, task objectives, computational capabilities, and communication resources. This diversity leads to significant heterogeneity, which increases the complexity of model training. In this paper, we first outline the basic concepts of heterogeneous FL and summarize the research challenges in FL in terms of five aspects: data, model, task, device and communication. In addition, we explore how existing state-of-the-art approaches cope with the heterogeneity of FL, and categorize and review these approaches at three different levels: data-level, model-level, and architecture-level. Subsequently, the paper extensively discusses privacy-preserving strategies in heterogeneous FL environments. Finally, the paper discusses current open issues and directions for future research, aiming to promote the further development of heterogeneous FL.
IEEE Transactions on Big Data, vol. 11, no. 3, pp. 1548-1567.
Citations: 0
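The survey above frames heterogeneous FL against the baseline of collaboratively training one shared model across clients. For orientation, here is a minimal sketch of FedAvg-style server aggregation. FedAvg is named here as the common baseline, not as a method from the survey. The sketch assumes each client sends a flat list of parameters and its local sample count. `fed_avg` is a hypothetical helper name.

```python
def fed_avg(client_params, client_sizes):
    """FedAvg-style aggregation: average parameter vectors weighted by local data size.

    Assumes every client shares one model structure (same-length flat parameter
    vectors). Model heterogeneity breaks that assumption, so it is checked here.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one data size per client")
    dim = len(client_params[0])
    if any(len(p) != dim for p in client_params):
        raise ValueError("clients have heterogeneous model shapes; plain averaging is undefined")
    total = sum(client_sizes)
    return [sum(p[k] * n for p, n in zip(client_params, client_sizes)) / total
            for k in range(dim)]


# Two hypothetical clients: the second holds three times as much data
print(fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

The two checks mark where the survey's heterogeneity categories enter. Data heterogeneity skews the size-weighted average toward large clients. Model heterogeneity makes element-wise averaging undefined, which is why model- and architecture-level methods are needed.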
Emulating Reader Behaviors for Fake News Detection
IF 5.7 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-08 | DOI: 10.1109/TBDATA.2025.3527230
Junwei Yin;Min Gao;Kai Shu;Zehua Zhao;Yinqiu Huang;Jia Wang
The wide dissemination of fake news affects many aspects of our lives, making fake news detection important and drawing increasing attention. Existing approaches have made substantial contributions by modeling news from a single-modal or multi-modal perspective. However, these modality-based methods can produce sub-optimal results because they ignore reader behaviors in news consumption and authenticity verification. For instance, they do not consider the component-by-component reading process (from the headline, images, and comments to the body), which is essential for modeling news at a finer granularity. To this end, we propose Ember (Emulating the behaviors of readers), an approach to fake news detection on social media. Ember incorporates readers' reading and verification processes to model news thoroughly from the component perspective. Specifically, we first construct intra-component feature extractors that emulate semantic analysis of each component. We then design a module comprising inter-component feature extractors and a sequence-based aggregator. This module mimics how readers verify the correlations between components and follow an overall reading and verification sequence. Ember can thus handle news with varying components by emulating the corresponding sequences. Extensive experiments on nine real-world datasets demonstrate the superiority of Ember.
IEEE Transactions on Big Data, vol. 11, no. 5, pp. 2353-2364.
Citations: 0
Portraying Fine-Grained Tenant Portrait for Churn Prediction Using Semi-Supervised Graph Convolution and Attention Network
IF 5.7 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-08 | DOI: 10.1109/TBDATA.2025.3527200
Zuodong Jin;Peng Qi;Muyan Yao;Dan Tao
With the widespread application of big data and intelligent information systems, the tenant has become the dominant entity in most application scenarios. As a data mining technique, portraits (profiles) have been widely used to provide targeted services. We therefore shift from the traditional user-driven portrait to a tenant-driven portrait for churn prediction. To this end, this paper first proposes a three-layer architecture and defines fine-grained features for creating portraits from the tenants' perspective. On a large-scale telecommunication industry dataset of 100,000 tenants, we construct tenant portraits with the proposed framework and analyze how the defined features influence churn likelihood. Some information is missing because of privacy concerns. To recover it, we propose CrossMatch, a semi-supervised, graph-convolution-based portrait completion model that exploits relations among tenants. On this basis, we design a tenant churn prediction method based on a directed attention network. Moreover, CrossMatch recovers missing information on three public node datasets, achieving an improvement of around 1-2%. Applying the directed attention network to churn prediction achieves an accuracy of 75.06%, a precision of 77.78%, and an F1-score of 71.43%, outperforming all baselines.
IEEE Transactions on Big Data, vol. 11, no. 5, pp. 2296-2307.
Citations: 0
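The churn results above are reported as accuracy, precision, and F1-score. To help when comparing such figures, here is a short sketch of the standard definitions computed from binary confusion-matrix counts. The counts in the example are hypothetical and do not reproduce the paper's data. `classification_metrics` is an illustrative helper name.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 7 churners caught, 2 false alarms, 3 churners missed, 8 correct stays
print(classification_metrics(tp=7, fp=2, fn=3, tn=8))
```

Because F1 is a harmonic mean, it always lies between precision and recall. The reported F1-score (71.43%) is below the reported precision (77.78%), so recall on that task must be below 71.43%.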
Big Data-Driven Advancements and Future Directions in Vehicle Perception Technologies: From Autonomous Driving to Modular Buses
IF 7.5 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-08 | DOI: 10.1109/TBDATA.2025.3527208
Hongyi Lin;Yang Liu;Liang Wang;Xiaobo Qu
The rapid development of Big Data and artificial intelligence (AI) is revolutionizing the automotive and transportation industries, leading to the creation of the Autonomous Modular Bus (AMB). Designed to address the key challenges of modern public transportation systems, the AMB adopts a modular dynamic assembly approach. However, existing research on the AMB predominantly focuses on operational aspects, whereas in-transit docking remains the primary obstacle to its commercial deployment. This challenge stems from the fact that current perception accuracy in autonomous vehicles is limited to the decimeter level, with insufficient capability to manage adverse weather and complex traffic conditions. To enable AMBs to achieve full-scenario autonomous driving capabilities, this paper reviews current perception technologies from three perspectives: single-vehicle single-sensor perception, multi-sensor fusion perception, and cooperative perception. It examines the characteristics of existing perception solutions and evaluates their applicability to AMB-specific requirements. Furthermore, considering the unique challenges of in-transit docking, this paper identifies and proposes four future research directions for advancing AMB perception systems as well as general autonomous driving technologies.
IEEE Transactions on Big Data, vol. 11, no. 3, pp. 1568-1587.
Citations: 0
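The review above treats multi-sensor fusion perception as one way beyond decimeter-level accuracy. As a textbook illustration, not a method taken from the paper, inverse-variance weighting fuses independent, unbiased sensor estimates of the same quantity. One example of such a quantity is the gap between two docking bus modules. The sensor names, readings, and noise levels in the example are hypothetical.

```python
import math


def fuse(estimates):
    """Inverse-variance weighted fusion of independent, unbiased scalar estimates.

    estimates: list of (value, variance) pairs, e.g. one range reading per sensor.
    Returns the fused value and its variance; the fused variance never exceeds
    the smallest input variance.
    """
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    value = sum(w * x for w, (x, _) in zip(weights, estimates)) / total
    return value, 1.0 / total


# Hypothetical readings: lidar 10.00 m (std 0.2 m), camera 10.30 m (std 0.3 m)
value, var = fuse([(10.00, 0.2 ** 2), (10.30, 0.3 ** 2)])
print(round(value, 3), round(math.sqrt(var), 3))
```

The fused estimate leans toward the less noisy sensor. Its standard deviation is smaller than that of either input. Real perception stacks add time alignment, outlier rejection, and state estimation (e.g. Kalman filtering) on top of this basic weighting.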
Tucker-Based High-Accuracy Multi-Modal Clustering for Social Information Network
IF 7.5 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-08 | DOI: 10.1109/TBDATA.2024.3524830
Ren Li;Huazhong Liu;Xiaotong Zhou;Jiawei Wang;Jihong Ding;Laurence T. Yang;Hua Li;Yunfan Zhang
With the explosion of social media platforms, social information networks generate substantial amounts of data. Tensor-based multi-modal clustering methods mine latent correlations from large-scale heterogeneous data. They have been widely applied in various social information network scenarios. Nevertheless, the accuracy and efficiency of these methods are severely restricted by noisy data and the curse of dimensionality. This paper therefore presents Tucker-based multi-modal clustering (TuMC) and an improved TuMC (ITuMC) to enhance the accuracy and efficiency of multi-modal clustering. First, we propose two Tucker-based attribute weight ranking learning approaches to compute the weight tensor efficiently. Then, we present a method for computing the Tucker-based selective weighted tensor distance (SWTD), together with the TuMC method. ITuMC further improves clustering speed by optimizing the computational efficiency of the SWTD. Finally, we present a Tucker-based multi-modal clustering and service framework for social information networks. Extensive experiments on the Geolife GPS trajectory and electricity consumption datasets show that TuMC and ITuMC cluster multi-source heterogeneous data in complex social information networks with higher accuracy and efficiency. Performance is measured by DVI, AR, and execution time.
IEEE Transactions on Big Data, vol. 11, no. 4, pp. 1677-1691.
Citations: 0
Relational Clustering-Based Parallel Spaces Construction and Embedding for Dynamic Knowledge Graph
IF 5.7 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-08 | DOI: 10.1109/TBDATA.2025.3527238
Yao Liu;Yongfei Zhang
With the increasing amount of data in various domains, knowledge graphs (KGs) have become powerful tools for representing complex, heterogeneous information in a structured way, and embedding techniques extract valuable information from them to support downstream tasks such as recommendation and question answering (Q&A). A knowledge graph consists of triples, and new triples are continuously added as knowledge is updated. However, most existing embedding models are designed for static graphs and must retrain the entire model after each update, which is time-consuming. Existing global dynamic embedding models exploit the structural and relational information of the whole graph to achieve high embedding quality, at the cost of dynamic efficiency. To address this problem, we propose a relational clustering-based parallel space model in which knowledge from different domains is embedded in different subspaces, allowing each subspace to focus on the data characteristics of a specific domain and thereby improving the quality of the learned knowledge representations. Moreover, new data affects only some subspaces and leaves the performance of the others unaffected, which improves the model's adaptability to dynamic updates. Furthermore, we employ two incremental approaches, chosen according to the type of added data, to improve the efficiency of dynamic embedding while ensuring that the added data preserves the characteristics of the parallel spaces. Experimental results show that, on the link prediction task, our model's dynamic embedding efficiency is on average 50.3% higher than that of the state-of-the-art (SOTA) dynamic embedding model. On FB15K in particular, our model improves efficiency by 41% and accuracy by 7.5%, demonstrating both its accuracy and its efficiency.
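To make the parallel-subspace idea concrete, the sketch below keeps a separate embedding table per relation cluster and, when new triples arrive, updates only the tables of the clusters their relations belong to. It is a hypothetical illustration under stated assumptions, not the authors' model: the relation-to-cluster mapping is given rather than learned by relational clustering, each cluster keeps its own copy of entity vectors, TransE (distance ||h + r - t||^2) is swapped in as the per-subspace scoring function, and a single plain SGD update stands in for the paper's two incremental approaches. The class and method names are invented for this example.

```python
import random


class ParallelSubspaceKG:
    """Hypothetical sketch: one TransE-style embedding space per relation cluster."""

    def __init__(self, relation_cluster, dim=8, lr=0.05, seed=0):
        # relation_cluster maps relation name -> cluster id (assumed given).
        self.relation_cluster = relation_cluster
        self.dim = dim
        self.lr = lr
        self.rng = random.Random(seed)
        # spaces[c] holds entity and relation vectors for cluster c only.
        self.spaces = {c: {"ent": {}, "rel": {}} for c in set(relation_cluster.values())}

    def _vec(self, table, key):
        # Lazily create a random vector the first time a key is seen.
        if key not in table:
            table[key] = [self.rng.uniform(-0.5, 0.5) for _ in range(self.dim)]
        return table[key]

    def _triple_vecs(self, h, r, t):
        space = self.spaces[self.relation_cluster[r]]
        return self._vec(space["ent"], h), self._vec(space["rel"], r), self._vec(space["ent"], t)

    def score(self, h, r, t):
        """Squared TransE distance ||h + r - t||^2 inside the relation's subspace."""
        hv, rv, tv = self._triple_vecs(h, r, t)
        return sum((a + b - c) ** 2 for a, b, c in zip(hv, rv, tv))

    def add_triples(self, triples, epochs=50):
        """Incremental update: only the subspaces of the new triples' relations change."""
        for _ in range(epochs):
            for h, r, t in triples:
                hv, rv, tv = self._triple_vecs(h, r, t)
                for i in range(self.dim):
                    g = 2 * (hv[i] + rv[i] - tv[i])  # gradient of the squared distance
                    hv[i] -= self.lr * g
                    rv[i] -= self.lr * g
                    tv[i] += self.lr * g


kg = ParallelSubspaceKG({"capital_of": 0, "located_in": 0, "plays_for": 1})
kg.add_triples([("paris", "capital_of", "france")])
print(sorted(kg.spaces[0]["ent"]), sorted(kg.spaces[1]["ent"]))  # ['france', 'paris'] []
```

Because each update touches only one cluster's tables, adding facts about one domain leaves every other subspace bit-for-bit unchanged, which mirrors the abstract's claim that new data affects only some subspaces.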
IEEE Transactions on Big Data, vol. 11, no. 5, pp. 2308-2320.
Citations: 0