IEEE Transactions on Big Data: Latest Articles (Page 4)

Graph Contrastive Learning for Clustering of Multi-Layer Networks

IF 7.5; Zone 3, Computer Science; Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Big Data

Pub Date : 2023-12-14 DOI: 10.1109/TBDATA.2023.3343349

Yifei Yang;Xiaoke Ma

Multi-layer networks precisely model complex systems in society and nature with various types of interactions, and identifying conserved modules that are well-connected in all layers is of great significance for revealing their structure-function relationships. Current algorithms are criticized for either ignoring the intrinsic relations among the layers or failing to learn discriminative features. To address these limitations, a novel graph contrastive learning framework for clustering multi-layer networks is proposed that joins nonnegative matrix factorization and graph contrastive learning (called jNMF-GCL), simultaneously addressing the intrinsic structure and the discriminativeness of features. Specifically, vertex features are first learned by preserving the conserved structure of the multi-layer network with matrix factorization, and jNMF-GCL then learns an affinity structure of vertices by manipulating the features of the various layers. To enhance the quality of features, contrastive learning is performed by selecting positive and negative samples from the constructed affinity graph, which significantly improves the discriminativeness of features. Finally, jNMF-GCL incorporates feature learning, affinity graph construction, contrastive learning, and clustering into an overall objective in which global and local structural information are seamlessly fused, providing a more effective way to describe the structure of multi-layer networks. Extensive experiments conducted on both artificial and real-world networks show the superior performance of jNMF-GCL over state-of-the-art models across various metrics.
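The contrastive step above (positive and negative samples drawn from an affinity graph built on learned vertex features) can be sketched as follows. This is a minimal illustration under stated assumptions, not jNMF-GCL itself: it assumes a cosine k-nearest-neighbour affinity graph and an InfoNCE-style loss, and it omits the NMF and clustering terms of the paper's joint objective. The helper names `knn_affinity` and `info_nce` and the parameters `k` and `tau` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_affinity(features, k):
    """Assumed affinity graph: each vertex's k most similar vertices (itself excluded)."""
    neighbors = []
    for i, fi in enumerate(features):
        ranked = sorted(
            (j for j in range(len(features)) if j != i),
            key=lambda j: cosine(fi, features[j]),
            reverse=True,
        )
        neighbors.append(set(ranked[:k]))
    return neighbors


def info_nce(features, neighbors, anchor, tau=0.5):
    """InfoNCE-style loss for one anchor: its affinity-graph neighbours are positives,
    every other vertex acts as a negative (hypothetical choice of loss)."""
    exp_sims = {
        j: math.exp(cosine(features[anchor], features[j]) / tau)
        for j in range(len(features))
        if j != anchor
    }
    denom = sum(exp_sims.values())
    positives = neighbors[anchor]
    # Average -log p(positive | anchor) over the anchor's positive set.
    return -sum(math.log(exp_sims[j] / denom) for j in positives) / len(positives)
```

For four 2-D features forming two obvious groups, `knn_affinity(features, 1)` pairs each vertex with its group mate, and an anchor's loss is lower when its positive is that mate than when the positive comes from the other group.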

Volume 10, Issue 4, pp. 429-441

Citations: 0

Efficient and Privacy-Preserving Aggregate Query Over Public Property Graphs

IF 7.2; Zone 3, Computer Science; Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Big Data

Pub Date : 2023-12-13 DOI: 10.1109/TBDATA.2023.3342623

Yunguo Guan;Rongxing Lu;Songnian Zhang;Yandong Zheng;Jun Shao;Guiyi Wei

The ability of graph data structures to represent vertex relationships has made them increasingly popular in recent years. Amid this trend, many property graph datasets have been collected and made public to facilitate a variety of queries, such as the aggregate queries studied extensively in this paper. While deploying both the datasets and the query services in the cloud is appealing, it can raise privacy concerns about user queries and results. Many works on graph privacy have been put forth in recent years; however, they either do not consider query privacy or cannot be adapted to aggregate queries. Others consider queries over encrypted graphs but cannot protect access pattern privacy. In particular, when they are deployed to handle queries over public graph datasets, the cloud server can infer additional information about user queries. To address this challenge, we propose a privacy-preserving property graph aggregate query scheme. Specifically, we first design new privacy-preserving vertex matching and matching update techniques, which securely initialize and update, respectively, the mapping between vertices in the dataset and the user-specified patterns. Building on them, we construct a scheme that achieves aggregate queries over public property graphs. Rigorous security analysis shows that the proposed scheme protects the privacy of user queries and results and also achieves access pattern privacy. In addition, extensive experiments demonstrate the efficiency of the scheme in terms of computational overhead.
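To make the query being protected concrete, the sketch below evaluates a label-based pattern and an aggregate over a toy public property graph in plaintext. It is only an assumed illustration of vertex matching followed by aggregation: it provides none of the scheme's privacy guarantees (query, result, or access-pattern privacy), does not model matching updates, and the functions `match_vertices` and `aggregate_query` and the graph encoding are hypothetical.

```python
from itertools import product

# Plaintext baseline only: no encryption and no access-pattern protection.


def match_vertices(vertices, pattern_labels):
    """Vertex matching: map each pattern vertex to the dataset vertices carrying its label."""
    return {
        p: [v for v, props in vertices.items() if props["label"] == label]
        for p, label in pattern_labels.items()
    }


def aggregate_query(vertices, edges, pattern_labels, pattern_edges, target, prop, agg=sum):
    """Find every injective embedding of the pattern in the graph and aggregate
    property `prop` of the dataset vertex bound to pattern vertex `target`."""

    def adjacent(u, v):
        return (u, v) in edges or (v, u) in edges

    candidates = match_vertices(vertices, pattern_labels)
    names = list(pattern_labels)
    values = []
    for combo in product(*(candidates[p] for p in names)):
        binding = dict(zip(names, combo))
        # Distinct dataset vertices, and every pattern edge must exist in the graph.
        if len(set(combo)) == len(combo) and all(
            adjacent(binding[a], binding[b]) for a, b in pattern_edges
        ):
            values.append(vertices[binding[target]][prop])
    return agg(values)
```

With two `Person` vertices (ages 30 and 45) linked to `City` vertices by edges (1, 3), (2, 3) and (2, 4), summing `age` over all `Person`-`City` matches gives 120, because vertex 2 appears in two matches; passing `agg=len` counts the matches instead.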

Volume 10, Issue 2, pp. 146-157

Citations: 0

A Survey on Spatio-Temporal Big Data Analytics Ecosystem: Resource Management, Processing Platform, and Applications

IF 7.2; Zone 3, Computer Science; Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Big Data

Pub Date : 2023-12-13 DOI: 10.1109/TBDATA.2023.3342619

Huanghuang Liang;Zheng Zhang;Chuang Hu;Yili Gong;Dazhao Cheng

With the rapid evolution of the Internet, the Internet of Things (IoT), and geographic information systems (GIS), spatio-temporal Big Data (STBD) is growing exponentially, marking the onset of the STBD era. Recent studies have concentrated on developing algorithms and techniques for the collection, management, storage, processing, analysis, and visualization of STBD. Researchers have made significant advances by enhancing STBD handling techniques, creating novel systems, and integrating spatio-temporal support into existing systems. However, these studies often neglect resource management and system optimization, which are crucial for improving the efficiency of STBD processing and applications. Additionally, the transition of STBD to the innovative Cloud-Edge-End unified computing system deserves attention. In this survey, we comprehensively explore the entire ecosystem of STBD analytics systems. We delineate the STBD analytics ecosystem and categorize the technologies used to process GIS data into five modules: STBD, computation resources, processing platform, resource management, and applications. Specifically, we divide STBD and its applications into geoscience-oriented and human-social-activity-oriented categories. We further divide the processing platform module into the data management layer (DBMS-GIS), the data processing layer (BigData-GIS), the data analysis layer (AI-GIS), and the cloud-native layer (Cloud-GIS). The resource management module and each layer of the processing platform are classified into three categories: task-oriented, resource-oriented, and cloud-based. Finally, we propose research agendas for potential future developments.

Volume 10, Issue 2, pp. 174-193

Citations: 0

Multi-Dimensional Data Recovery via Feature-Based Fully-Connected Tensor Network Decomposition

IF 7.5; Zone 3, Computer Science; Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Big Data

Pub Date : 2023-12-13 DOI: 10.1109/TBDATA.2023.3342611

Zhi-Long Han;Ting-Zhu Huang;Xi-Le Zhao;Hao Zhang;Yun-Yang Liu

Multi-dimensional data are inevitably corrupted, which hinders subsequent applications (e.g., image segmentation and classification). Recently, owing to its powerful ability to characterize the correlation between any two modes of a tensor, fully-connected tensor network (FCTN) decomposition has received increasing attention in multi-dimensional data recovery. However, the expressive power of FCTN decomposition in the original pixel domain has yet to be fully leveraged, so it cannot provide satisfactory recovery of details and textures, especially at low sampling rates or under heavy noise. In this work, we propose a feature-based FCTN decomposition model (termed F-FCTN) for multi-dimensional data recovery, which can faithfully capture the relationship between the spatial-temporal/spectral-feature modes. Compared with the original FCTN decomposition, F-FCTN recovers details and textures more effectively and is more suitable for subsequent high-level applications. However, F-FCTN produces a feature tensor larger than the original tensor, which makes designing a solver challenging. To handle the resulting large-scale optimization problem, we develop an efficient leverage score sampling-based proximal alternating minimization (S-PAM) algorithm and theoretically establish its relative error guarantee. Extensive numerical experiments on real-world data show that the proposed method performs favorably against the compared methods in data recovery and facilitates subsequent image classification.
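Leverage score sampling, the randomized ingredient named in S-PAM, is a standard way to shrink a tall least-squares subproblem by keeping a weighted subset of its rows. The sketch below computes exact leverage scores and draws such a sample; it is an assumed, generic illustration, not the authors' S-PAM, and it does not reproduce how their algorithm forms the FCTN subproblems or its relative-error guarantee. The names `leverage_scores` and `leverage_sample` are hypothetical.

```python
import random


def solve(matrix, rhs):
    """Solve a small dense system matrix @ x = rhs (Gauss-Jordan, partial pivoting)."""
    n = len(matrix)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def leverage_scores(X):
    """Statistical leverage l_i = x_i^T (X^T X)^{-1} x_i of each row of a tall,
    full-column-rank matrix X given as a list of rows."""
    d = len(X[0])
    gram = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    return [sum(a * b for a, b in zip(row, solve(gram, row))) for row in X]


def leverage_sample(X, m, seed=0):
    """Draw m row indices i.i.d. with p_i proportional to leverage and return them with
    rescaling weights 1/sqrt(m p_i), so that E[S^T S] = I for the sampling matrix S."""
    scores = leverage_scores(X)
    total = sum(scores)
    probs = [s / total for s in scores]
    rng = random.Random(seed)
    idx = rng.choices(range(len(X)), weights=probs, k=m)
    return idx, [1.0 / (m * probs[i]) ** 0.5 for i in idx]
```

For rows `[1, 0]`, `[1, 0]`, `[1, 0]`, `[0, 1]` the scores are 1/3, 1/3, 1/3 and 1, summing to the column rank 2. The lone `[0, 1]` row is the only one informing the second coefficient, so leverage sampling favours it, whereas uniform sampling could easily miss it.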

Volume 10, Issue 4, pp. 386-399

Citations: 0

Proxy-Based Graph Convolutional Hashing for Cross-Modal Retrieval

IF 7.5; Zone 3, Computer Science; Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Big Data

Pub Date : 2023-12-04 DOI: 10.1109/TBDATA.2023.3338951

Yibing Bai;Zhenqiu Shu;Jun Yu;Zhengtao Yu;Xiao-Jun Wu

Cross-modal hashing retrieval approaches have received extensive attention owing to their storage advantages and retrieval efficiency. To achieve better retrieval performance, hashing methods seek to embed more of the semantic information of multi-modal data into hash codes. Existing deep cross-modal hashing methods typically learn hash functions from the similarity of paired data to generate hash codes. However, such locally oriented learning methods often suffer from low efficiency and incomplete acquisition of semantic information. To address these challenges, this paper presents a novel deep hashing approach for cross-modal retrieval, called Proxy-based Graph Convolutional Hashing (PGCH). Specifically, we use global similarity to construct proxy hash codes for two different modalities. This construction ensures that the proxy hash codes cover data points with significantly different distributions, which helps match data from different modalities to different proxy hash codes, capturing the global similarity of multi-modal hash codes and improving the efficiency of hash code learning. We then employ a multi-modal contrastive loss to learn the global similarity. Furthermore, by constructing a proxy hash matrix from the proxy hash codes, we apply graph convolution to efficiently narrow the gap between modalities, substantially improving performance on cross-modal retrieval tasks. Comprehensive experiments on four benchmark multimedia datasets demonstrate that PGCH achieves better retrieval performance than a range of state-of-the-art hashing approaches.
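The retrieval side of cross-modal hashing (binarize embeddings, compare codes by Hamming distance) and the idea of assigning a code to its nearest proxy can be sketched as follows. This is a generic illustration under assumptions: the proxy codes here are fixed by hand, whereas PGCH constructs them from global similarity and trains with a multi-modal contrastive loss and graph convolution, none of which is shown. All function names are hypothetical.

```python
def binarize(embedding):
    """Map a real-valued embedding to a {-1, +1} hash code with the sign function."""
    return tuple(1 if x >= 0 else -1 for x in embedding)


def hamming(a, b):
    """Number of positions where two equal-length hash codes differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_proxy(code, proxies):
    """Index of the (hand-fixed, hypothetical) proxy code closest in Hamming distance."""
    return min(range(len(proxies)), key=lambda k: hamming(code, proxies[k]))


def retrieve(query_code, database_codes, top_k):
    """Rank database items of the other modality by Hamming distance to the query;
    ties keep database order because sorted() is stable."""
    order = sorted(
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )
    return order[:top_k]
```

An image embedding `[0.3, -1.2, 0.8, 0.1]` binarizes to `(1, -1, 1, 1)`; against text codes at Hamming distances 0, 4 and 1, the top-2 result is `[0, 2]`.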

Volume 10, Issue 4, pp. 371-385

Citations: 0

Multi-Party Federated Recommendation Based on Semi-Supervised Learning

IF 7.5; Zone 3, Computer Science; Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Big Data

Pub Date : 2023-11-30 DOI: 10.1109/TBDATA.2023.3338009

Xin Liu;Jiuluan Lv;Feng Chen;Qingjie Wei;Hangxuan He;Ying Qian

Leveraging multi-party data to provide recommendations remains a challenge, particularly when the party that needs recommendation services possesses only positive samples while the other parties have only unlabeled data. To address this UDD-PU learning problem, this paper proposes VFPU, an algorithm for Vertical Federated learning with Positive and Unlabeled data. VFPU repeatedly draws random samples from the multi-party unlabeled data and treats the sampled data as negatives. It thereby forms multiple training datasets with balanced positive and negative samples, and multiple testing datasets from the unsampled data. For each training dataset, VFPU iteratively trains a base estimator adapted to the vertical federated learning framework. The trained base estimator is used to generate prediction scores for each sample in the testing dataset. Based on the sum of the scores and how often each sample occurs in the testing datasets, we calculate the probability that each unlabeled sample is positive. Those with the highest probabilities are regarded as reliable positive samples; they are added to the positive samples and removed from the unlabeled data. This process of sampling, training, and selecting positive samples is repeated iteratively. Experimental results demonstrate that VFPU performs comparably to its non-federated counterparts and outperforms other federated semi-supervised learning methods.
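The sample, train, score and select loop described above resembles positive-unlabeled bagging and can be sketched in a single-party, non-federated form. Assumptions: a toy nearest-centroid scorer on 1-D features stands in for the base estimator (VFPU trains its base estimator within a vertical federated learning framework, which is not modelled here), the held-out average score serves as the positive probability, and `pu_select`, `n_bags` and `n_select` are hypothetical names.

```python
import random


def fit_centroid_scorer(pos, neg):
    """Toy base estimator on 1-D features: a score in [0, 1] that grows as x gets
    closer to the positive centroid than to the negative centroid."""
    cp, cn = sum(pos) / len(pos), sum(neg) / len(neg)

    def score(x):
        dp, dn = abs(x - cp), abs(x - cn)
        return dn / (dp + dn) if dp + dn else 0.5

    return score


def positive_probabilities(positives, unlabeled, n_bags, rng):
    """Each bag treats a random subset of the unlabeled data as negatives (balanced
    with the positives) and scores only the unsampled remainder."""
    sums = [0.0] * len(unlabeled)
    counts = [0] * len(unlabeled)
    size = min(len(positives), len(unlabeled))
    for _ in range(n_bags):
        sampled = set(rng.sample(range(len(unlabeled)), size))
        score = fit_centroid_scorer(positives, [unlabeled[i] for i in sampled])
        for i, x in enumerate(unlabeled):
            if i not in sampled:  # out-of-bag samples form the testing set
                sums[i] += score(x)
                counts[i] += 1
    # Sum of scores divided by how often each sample was tested.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def pu_select(positives, unlabeled, n_rounds, n_select, n_bags=20, seed=0):
    """Repeat bagging, move the n_select most probable unlabeled samples into the
    positive set, and remove them from the unlabeled pool."""
    rng = random.Random(seed)
    positives, unlabeled = list(positives), list(unlabeled)
    for _ in range(n_rounds):
        probs = positive_probabilities(positives, unlabeled, n_bags, rng)
        ranked = sorted(range(len(unlabeled)), key=lambda i: probs[i], reverse=True)
        top = set(ranked[:n_select])
        positives += [unlabeled[i] for i in sorted(top)]
        unlabeled = [x for i, x in enumerate(unlabeled) if i not in top]
    return positives, unlabeled
```

With positives near 10 and an unlabeled pool holding two hidden positives (9.5 and 10.5) among values near 0, one round with `n_select=2` moves exactly those two into the positive set.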

Volume 10, Issue 4, pp. 356-370

Citations: 0

FRC-GIF: Frame Ranking-Based Personalized Artistic Media Generation Method for Resource Constrained Devices

IF 7.5; Zone 3, Computer Science; Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Big Data

Pub Date : 2023-11-30 DOI: 10.1109/TBDATA.2023.3338012

Ghulam Mujtaba;Sunder Ali Khowaja;Muhammad Aslam Jarwar;Jaehyuk Choi;Eun-Seok Ryu

Generating video highlights in the form of animated graphics interchange format (GIF) images has significantly simplified video browsing. Animated GIFs have paved the way for applications on streaming platforms and in emerging technologies. Existing approaches entail high computational complexity and do not consider user personalization. This paper proposes a lightweight method to attract users and increase video views through personalized artistic media, i.e., static thumbnails and animated GIFs. The proposed method analyzes lightweight thumbnail containers (LTC) using the computational resources of the client device to recognize personalized events in feature-length sports videos. The thumbnails are then ranked with the frame rank pooling method to select among them. Subsequently, the method processes small video segments, rather than the whole video, to generate the artistic media. This makes the approach more computationally efficient than existing methods that use the entire video data, and thus consistent with sustainable development goals. Furthermore, the method retrieves and uses thumbnail containers and video segments, which reduces the required transmission bandwidth as well as the amount of locally stored data. Experiments reveal that the computational complexity of the method is 3.73 times lower than that of the state-of-the-art method.
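Processing only short segments around the best-ranked thumbnails, rather than the whole video, is the source of the efficiency described above. A minimal sketch of that selection step follows, assuming each thumbnail already has a timestamp and a relevance score; how FRC-GIF computes those scores with frame rank pooling is not reproduced, and `select_segments`, `fraction_processed` and `half_window` are hypothetical names.

```python
def select_segments(thumb_times, scores, k, half_window, duration):
    """Pick the k highest-scoring thumbnails and return (start, end) windows in
    seconds, in chronological order, merging windows that overlap."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    windows = sorted(
        (max(0.0, thumb_times[i] - half_window), min(duration, thumb_times[i] + half_window))
        for i in top
    )
    merged = []
    for start, end in windows:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def fraction_processed(segments, duration):
    """Share of the video that would actually be fetched and decoded."""
    return sum(end - start for start, end in segments) / duration
```

For a 600 s video whose top-3 thumbnails sit at 60 s, 65 s and 500 s, a 5 s half-window yields the merged segments (55, 70) and (495, 505), so only about 4.2% of the video is processed.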

Published in: IEEE Transactions on Big Data, vol. 10, no. 4, pp. 343-355 (2023-11-30). DOI: https://doi.org/10.1109/TBDATA.2023.3338012
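To make the segment-based idea in the FRC-GIF abstract concrete, the following Python sketch assumes that every thumbnail in a thumbnail container already carries a relevance score. How those scores are produced, including the frame rank pooling step, is not reproduced here. The sketch picks the top-k thumbnails and maps each one to a short time window, so that only those segments would need to be fetched instead of the whole video. The names select_segments and fetched_fraction, the fixed thumbnail interval, and the symmetric window are all hypothetical.

```python
def select_segments(scores, interval_s, k, half_window_s, video_len_s):
    """Pick the k highest-scoring thumbnails and map each to a short time window.

    Assumption: scores[i] is a precomputed relevance score for the thumbnail
    captured at i * interval_s seconds. Overlapping windows are merged so each
    part of the video would be fetched at most once.
    """
    # Indices of the k best thumbnails, then restored to temporal order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    windows = []
    for i in sorted(ranked):
        t = i * interval_s
        start = max(0.0, t - half_window_s)
        end = min(video_len_s, t + half_window_s)
        if windows and start <= windows[-1][1]:
            # Overlaps (or touches) the previous window: extend it.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows


def fetched_fraction(windows, video_len_s):
    """Share of the full video covered by the selected windows."""
    return sum(end - start for start, end in windows) / video_len_s


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.2, 0.8, 0.05, 0.7]  # one score per 10 s thumbnail
    wins = select_segments(scores, interval_s=10, k=2, half_window_s=5, video_len_s=60)
    print(wins, fetched_fraction(wins, 60))
```

With a 60 s clip sampled every 10 s, choosing the two best thumbnails with 5 s on either side means fetching about a third of the video. Merging overlapping windows avoids downloading the same footage twice.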

Citations: 0

Label Distribution Learning Based on Horizontal and Vertical Mining of Label Correlations

IF 7.2 Zone 3 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Big Data

Pub Date : 2023-11-30 DOI: 10.1109/TBDATA.2023.3338023

Yaojin Lin;Yulin Li;Chenxi Wang;Lei Guo;Jinkun Chen

Label distribution learning (LDL) is a novel approach that outputs labels together with the degree to which each label describes an instance. To enhance the performance of LDL algorithms, researchers have developed algorithms that mine label correlations globally, locally, or both globally and locally. However, existing LDL algorithms that mine local label correlations roughly assume that all samples within a cluster share the same label correlations, which may not hold for every sample. Moreover, existing LDL algorithms apply global and local label correlations to the same parameter matrix and therefore cannot fully exploit their respective advantages. To address these issues, this paper proposes a novel LDL method based on horizontal and vertical mining of label correlations (LDL-HVLC). The method first encodes a unique local influence vector for each sample from the label distributions of its neighboring samples. This vector is then appended as additional features to help predict unseen instances, and a penalty term is designed to correct erroneous local influence vectors (horizontal mining). Finally, to capture both local and global label correlations, a new regularization term is constructed that constrains the outputs with the global label correlations (vertical mining). Extensive experiments on real-world datasets demonstrate that the proposed method effectively solves the label distribution problem and outperforms current state-of-the-art methods.
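
A minimal Python sketch of the first LDL-HVLC step described above, under an assumption the abstract does not state: here a sample's local influence vector is simply the plain average of the label distributions of its k nearest neighbours under Euclidean distance, and it is appended to the original features. The penalty term and the vertical regularization are omitted, and all function names are hypothetical.

```python
import math


def knn_indices(X, i, k):
    """Indices of the k nearest neighbours of sample i (Euclidean), excluding i itself."""
    dists = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    return [j for _, j in dists[:k]]


def local_influence_vectors(X, D, k):
    """Assumed encoding: average the label distributions D of each sample's k neighbours.

    Because every row of D sums to 1, each averaged vector is itself a distribution.
    """
    n_labels = len(D[0])
    vectors = []
    for i in range(len(X)):
        nbrs = knn_indices(X, i, k)
        vectors.append([sum(D[j][l] for j in nbrs) / k for l in range(n_labels)])
    return vectors


def augment_features(X, vectors):
    """Append each sample's influence vector to its original feature vector."""
    return [list(x) + list(v) for x, v in zip(X, vectors)]


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
    D = [[0.7, 0.3], [0.6, 0.4], [0.2, 0.8], [0.1, 0.9]]
    V = local_influence_vectors(X, D, k=1)
    print(augment_features(X, V))
```

Samples in the same region of feature space thus receive influence vectors drawn from their own neighbourhood rather than from a shared cluster-level correlation, which is the per-sample behaviour the abstract contrasts with cluster-based methods.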

Published in: IEEE Transactions on Big Data, vol. 10, no. 3, pp. 275-287 (2023-11-30). DOI: https://doi.org/10.1109/TBDATA.2023.3338023

Citations: 0

Hierarchical Deep Reinforcement Learning for VWAP Strategy Optimization

IF 7.2 Zone 3 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Big Data

Pub Date : 2023-11-30 DOI: 10.1109/TBDATA.2023.3338011

Xiaodong Li;Pangjing Wu;Chenxin Zou;Qing Li

Designing algorithmic trading strategies that target the volume-weighted average price (VWAP) for long-duration orders is a critical concern for brokers. Traditional rule-based strategies are explicitly predetermined and lack the adaptability needed to achieve lower transaction costs in dynamic markets. Numerous studies have attempted to minimize transaction costs through reinforcement learning. However, improvements for long-duration order trading strategies, such as the VWAP strategy, remain limited because of changing intraday liquidity patterns and sparse reward signals. To address this issue, we propose a joint model called Macro-Meta-Micro Trader, which combines deep learning and hierarchical reinforcement learning. The model optimizes parent-order allocation and child-order execution in the VWAP strategy, thereby reducing transaction costs for long-duration orders. It effectively captures market patterns and executes orders across different temporal scales. Experiments on stocks listed on the Shanghai Stock Exchange show that our approach outperforms the best baselines in VWAP slippage, saving up to 2.22 basis points, which verifies that further splitting tranches into several subgoals can effectively reduce transaction costs.
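
For reference, the VWAP benchmark and the slippage figure quoted in the abstract can be computed as in the Python sketch below. It uses the standard definition VWAP = sum(p_i * v_i) / sum(v_i), reports slippage in basis points (1 bp = 0.01%), and assumes the sign convention that positive slippage means a worse price than the benchmark. It also includes a simple volume-profile schedule of the rule-based kind the abstract contrasts with; it is not the Macro-Meta-Micro Trader, and the function names are illustrative.

```python
def vwap(prices, volumes):
    """Volume-weighted average price: sum(p * v) / sum(v)."""
    total_volume = sum(volumes)
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    return sum(p * v for p, v in zip(prices, volumes)) / total_volume


def slippage_bps(exec_vwap, market_vwap, side="buy"):
    """Slippage against the market VWAP in basis points.

    Assumed convention: positive = worse than benchmark (paid more on a buy,
    received less on a sell).
    """
    sign = 1.0 if side == "buy" else -1.0
    return sign * (exec_vwap - market_vwap) / market_vwap * 10_000


def allocate_by_profile(parent_qty, volume_profile):
    """Rule-based baseline: split a parent order across time bins in proportion
    to a historical intraday volume profile; the rounding remainder goes to the
    last bin so the child orders always sum to parent_qty."""
    total = sum(volume_profile)
    alloc = [int(parent_qty * v / total) for v in volume_profile]
    alloc[-1] += parent_qty - sum(alloc)
    return alloc


if __name__ == "__main__":
    market = vwap([10.0, 10.2, 10.1], [100, 300, 100])
    print(round(market, 4), round(slippage_bps(10.16, market), 2))
    print(allocate_by_profile(10, [1, 1, 1]))
```

A fixed profile like this cannot react when intraday liquidity deviates from history, which is the adaptability gap the abstract attributes to rule-based strategies; a saving of 2.22 bp corresponds to 0.0222% of the traded notional.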

Published in: IEEE Transactions on Big Data, vol. 10, no. 3, pp. 288-300 (2023-11-30). DOI: https://doi.org/10.1109/TBDATA.2023.3338011

Citations: 0

Learning Balanced Bayesian Classifiers From Labeled and Unlabeled Data

IF 7.5 Zone 3 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Big Data

Pub Date : 2023-11-30 DOI: 10.1109/TBDATA.2023.3338019

Lu Guo;Limin Wang;Qilong Li;Kuo Li

How to train learners on unbalanced data with asymmetric costs has been recognized as one of the most significant challenges in data mining. A Bayesian network classifier (BNC) provides a powerful probabilistic tool for encoding the probabilistic dependencies among random variables in a directed acyclic graph (DAG); however, unbalanced data result in an unbalanced network topology. This leads to biased estimates of the conditional or joint probability distributions and, ultimately, reduced classification accuracy. To address this issue, we propose to redefine the information-theoretic metrics so that they uniformly represent the balanced dependencies between attributes or between attribute values. A heuristic search strategy and a thresholding operation are then introduced to learn refined DAGs from labeled and unlabeled data, respectively. Experimental results on 32 benchmark datasets reveal that the proposed highly scalable algorithm is competitive with or superior to a number of state-of-the-art single and ensemble learners.
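
The abstract builds on information-theoretic dependency metrics. As a reference point, the Python sketch below computes the standard empirical conditional mutual information I(X_i; X_j | C) in bits, the quantity classic tree-augmented naive Bayes uses as edge weights when learning a BNC structure. It is not the balanced metric the authors propose, whose exact definition the abstract does not give, and the function names are illustrative.

```python
import math
from collections import Counter


def conditional_mutual_information(xs, ys, cs):
    """Empirical I(X; Y | C) in bits from three aligned lists of discrete values.

    Uses p(x,y|c) / (p(x|c) p(y|c)) = n(x,y,c) * n(c) / (n(x,c) * n(y,c)).
    """
    n = len(xs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    cmi = 0.0
    for (x, y, c), count in n_xyc.items():
        ratio = count * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)])
        cmi += (count / n) * math.log2(ratio)
    return cmi


def pairwise_cmi(columns, labels):
    """Dependency weight for every attribute pair given the class labels,
    e.g. as edge weights for a TAN-style maximum spanning tree."""
    k = len(columns)
    return {
        (i, j): conditional_mutual_information(columns[i], columns[j], labels)
        for i in range(k)
        for j in range(i + 1, k)
    }


if __name__ == "__main__":
    cols = [[0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]]
    print(pairwise_cmi(cols, [0, 0, 0, 0]))
```

Because every probability here is a raw frequency, a dominant class contributes most of the weight in the sum, which is one way to see why unbalanced data can skew the learned topology toward the majority class.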

Citations: 0