Pub Date : 2023-12-14; DOI: 10.1109/TBDATA.2023.3343349
Yifei Yang;Xiaoke Ma
Multi-layer networks precisely model complex systems in society and nature with various types of interactions, and identifying conserved modules that are well-connected in all layers is of great significance for revealing their structure-function relationships. Current algorithms are criticized for either ignoring the intrinsic relations among various layers or failing to learn discriminative features. To address these limitations, a novel graph contrastive learning framework for clustering of multi-layer networks is proposed by joining nonnegative matrix factorization and graph contrastive learning (called jNMF-GCL), which addresses the intrinsic structure and the discriminability of features simultaneously. Specifically, features of vertices are first learned by preserving the conserved structure in multi-layer networks with matrix factorization, and then jNMF-GCL learns an affinity structure of vertices by manipulating features of various layers. To enhance the quality of features, contrastive learning is executed by selecting positive and negative samples from the constructed affinity graph, which significantly improves the discriminability of features. Finally, jNMF-GCL incorporates feature learning, construction of the affinity graph, contrastive learning, and clustering into an overall objective, where global and local structural information are seamlessly fused, providing a more effective way to describe the structure of multi-layer networks. Extensive experiments conducted on both artificial and real-world networks show the superior performance of jNMF-GCL over state-of-the-art models across various metrics.
"Graph Contrastive Learning for Clustering of Multi-Layer Networks," Yifei Yang; Xiaoke Ma. IEEE Transactions on Big Data, vol. 10, no. 4, pp. 429-441, 2023-12-14. DOI: 10.1109/TBDATA.2023.3343349
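The NMF building block of the abstract can be illustrated in isolation. The sketch below is not the authors' jNMF-GCL objective; it is a generic multiplicative-update NMF on a toy single-layer adjacency matrix, with cluster labels read off from the factor matrix. The function name, toy graph, and hyperparameters are all invented for illustration.

```python
import numpy as np

def nmf(A, k, iters=200, eps=1e-9, seed=0):
    """Plain multiplicative-update NMF: A ~= W @ H with nonnegative factors."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    W = rng.random((n, k)) + 0.1  # positive init avoids zero-locking
    H = rng.random((k, m)) + 0.1
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy single-layer network: two disconnected 3-vertex cliques.
A = np.zeros((6, 6))
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
W, H = nmf(A, k=2)
labels = W.argmax(axis=1)  # cluster assignment per vertex
```

On this block-diagonal toy matrix the dominant factor per row recovers the two communities; jNMF-GCL additionally shares such factors across layers and refines them with contrastive learning.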
Graph data structures’ ability to represent vertex relationships has made them increasingly popular in recent years. Amid this trend, many property graph datasets have been collected and made public to facilitate a variety of queries, such as the aggregate queries extensively exploited in this paper. While cloud deployment of both the datasets and query services is appealing, it raises privacy concerns related to user queries and results. In past years, many works on graph privacy have been put forth; however, they either do not consider query privacy or cannot be adapted to aggregate queries. Others consider queries over encrypted graphs but cannot protect access pattern privacy. In particular, when deploying them to handle queries over public graph datasets, the cloud server can infer additional information about user queries. To meet this challenge, we propose a privacy-preserving property graph aggregate query scheme. Specifically, we first design new privacy-preserving vertex matching and matching update techniques, which securely initialize and update the mapping between vertices in the dataset and the user-specified patterns, respectively. Based on them, we construct our proposed scheme to achieve aggregate queries over public property graphs. Rigorous security analysis shows that our scheme protects the privacy of user queries and results and achieves access pattern privacy. In addition, extensive experiments demonstrate the efficiency of our scheme in terms of computational overhead.
"Efficient and Privacy-Preserving Aggregate Query Over Public Property Graphs," Yunguo Guan; Rongxing Lu; Songnian Zhang; Yandong Zheng; Jun Shao; Guiyi Wei. IEEE Transactions on Big Data, vol. 10, no. 2, pp. 146-157, 2023-12-13. DOI: 10.1109/TBDATA.2023.3342623
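For readers unfamiliar with the query class the scheme protects, a plaintext (non-private) aggregate query over a toy property graph may help. Everything here — the graph, its labels, and the `avg_age_of_residents` helper — is hypothetical and carries none of the paper's cryptographic machinery; it only shows what "pattern match, then aggregate a property" means.

```python
# Toy property graph: vertices carry a label plus properties; edges are typed.
vertices = {
    1: {"label": "Person", "age": 29},
    2: {"label": "Person", "age": 35},
    3: {"label": "City",   "pop": 500_000},
    4: {"label": "City",   "pop": 120_000},
}
edges = [(1, "LIVES_IN", 3), (2, "LIVES_IN", 3)]

def avg_age_of_residents(city_id):
    """Aggregate query: mean age over Person vertices matching
    the pattern (Person)-[LIVES_IN]->(city_id)."""
    ages = [vertices[u]["age"]
            for (u, rel, v) in edges
            if rel == "LIVES_IN" and v == city_id
            and vertices[u]["label"] == "Person"]
    return sum(ages) / len(ages) if ages else None

result = avg_age_of_residents(3)  # (29 + 35) / 2 = 32.0
```

The paper's contribution is answering such queries so that the cloud learns neither the pattern, the result, nor the access pattern.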
With the rapid evolution of the Internet, Internet of Things (IoT), and geographic information systems (GIS), spatio-temporal Big Data (STBD) is experiencing exponential growth, marking the onset of the STBD era. Recent studies have concentrated on developing algorithms and techniques for the collection, management, storage, processing, analysis, and visualization of STBD. Researchers have made significant advancements by enhancing STBD handling techniques, creating novel systems, and integrating spatio-temporal support into existing systems. However, these studies often neglect resource management and system optimization, crucial factors for enhancing the efficiency of STBD processing and applications. Additionally, the transition of STBD to the emerging Cloud-Edge-End unified computing paradigm has received little attention. In this survey, we comprehensively explore the entire ecosystem of STBD analytics systems. We delineate the STBD analytics ecosystem and categorize the technologies used to process GIS data into five modules: STBD, computation resources, processing platform, resource management, and applications. Specifically, we subdivide STBD and its applications into geoscience-oriented and human-social activity-oriented. Within the processing platform module, we further distinguish the data management layer (DBMS-GIS), data processing layer (BigData-GIS), data analysis layer (AI-GIS), and cloud native layer (Cloud-GIS). The resource management module and each layer in the processing platform are classified into three categories: task-oriented, resource-oriented, and cloud-based. Finally, we propose research agendas for potential future developments.
"A Survey on Spatio-Temporal Big Data Analytics Ecosystem: Resource Management, Processing Platform, and Applications," Huanghuang Liang; Zheng Zhang; Chuang Hu; Yili Gong; Dazhao Cheng. IEEE Transactions on Big Data, vol. 10, no. 2, pp. 174-193, 2023-12-13. DOI: 10.1109/TBDATA.2023.3342619
Pub Date : 2023-12-13; DOI: 10.1109/TBDATA.2023.3342611
Zhi-Long Han;Ting-Zhu Huang;Xi-Le Zhao;Hao Zhang;Yun-Yang Liu
Multi-dimensional data are inevitably corrupted, which hinders subsequent applications (e.g., image segmentation and classification). Recently, owing to its powerful ability to characterize the correlation between any two modes of a tensor, fully-connected tensor network (FCTN) decomposition has received increasing attention in multi-dimensional data recovery. However, the expressive power of FCTN decomposition in the original pixel domain has yet to be fully leveraged; it cannot provide satisfactory recovery of details and textures, especially at low sampling rates or under heavy noise. In this work, we suggest a feature-based FCTN decomposition model (termed F-FCTN) for multi-dimensional data recovery, which faithfully captures the relationship between the spatial-temporal/spectral-feature modes. Compared with the original FCTN decomposition, F-FCTN more effectively recovers details and textures and is better suited to subsequent high-level applications. However, F-FCTN leads to a larger-scale feature tensor than the original tensor, which brings challenges in designing the solving algorithm. To handle the resulting large-scale optimization problem, we develop an efficient leverage score sampling-based proximal alternating minimization (S-PAM) algorithm and theoretically establish its relative error guarantee. Extensive numerical experiments on real-world data illustrate that the proposed method performs favorably against compared methods in data recovery and facilitates subsequent image classification.
"Multi-Dimensional Data Recovery via Feature-Based Fully-Connected Tensor Network Decomposition," Zhi-Long Han; Ting-Zhu Huang; Xi-Le Zhao; Hao Zhang; Yun-Yang Liu. IEEE Transactions on Big Data, vol. 10, no. 4, pp. 386-399, 2023-12-13. DOI: 10.1109/TBDATA.2023.3342611
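The S-PAM algorithm mentioned above relies on leverage score sampling. A minimal matrix-case sketch of that primitive (the paper works with tensors; this simplification and all variable names are ours): compute row leverage scores from a thin SVD, then sample rows with probability proportional to those scores, so rows that are influential for the column space are kept more often.

```python
import numpy as np

def leverage_scores(A):
    """Row leverage scores of A: squared row norms of U from the thin SVD.
    They lie in [0, 1] and sum to rank(A)."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return (U ** 2).sum(axis=1)

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))        # tall toy matrix, rank 5 a.s.
scores = leverage_scores(A)
probs = scores / scores.sum()            # sampling distribution over rows
sampled = rng.choice(A.shape[0], size=20, replace=False, p=probs)
```

Sampling by leverage scores rather than uniformly is what gives sketching-based solvers like S-PAM their relative error guarantees.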
Cross-modal hashing retrieval approaches have received extensive attention owing to their storage superiority and retrieval efficiency. To achieve better retrieval performance, hashing methods seek to embed more semantic information from multi-modal data into hash codes. Existing deep cross-modal hashing methods typically learn hash functions from the similarity of paired data to generate hash codes. However, such locally-oriented learning methods often suffer from low efficiency and incomplete acquisition of semantic information. To address these challenges, this paper presents a novel deep hashing approach, called Proxy-based Graph Convolutional Hashing (PGCH), for cross-modal retrieval. Specifically, we use global similarity to construct proxy hash codes for two different modalities. This strategy ensures that the proxy hash codes cover data points with significant distribution differences; matching data from different modalities to different proxy hash codes then captures the global similarity of multi-modal hash codes and improves the efficiency of hash code learning. Subsequently, we employ a multi-modal contrastive loss to learn the global similarity. Furthermore, by constructing a proxy hash matrix from the proxy hash codes, we apply graph convolution to efficiently narrow the gap between different modalities, leading to a substantial improvement in retrieval performance for cross-modal retrieval tasks. Comprehensive experiments on four benchmark multimedia datasets demonstrate that PGCH achieves better retrieval performance than a bundle of state-of-the-art hashing approaches.
"Proxy-Based Graph Convolutional Hashing for Cross-Modal Retrieval," Yibing Bai; Zhenqiu Shu; Jun Yu; Zhengtao Yu; Xiao-Jun Wu. IEEE Transactions on Big Data, vol. 10, no. 4, pp. 371-385, 2023-12-04. DOI: 10.1109/TBDATA.2023.3338951
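As background for why hash codes give efficient retrieval: real-valued embeddings are binarized into ±1 codes and compared by Hamming distance, which is far cheaper than dense similarity search. This is a generic sketch, not PGCH itself; the embedding values below are made-up numbers standing in for learned features.

```python
import numpy as np

def to_hash(x):
    """Binarize real-valued embeddings into +-1 hash codes."""
    return np.where(x >= 0, 1, -1)

def hamming(a, b):
    """Number of differing bits between two +-1 codes."""
    return int((a != b).sum())

# Hypothetical learned embeddings: 3 database items and 1 query.
db = to_hash(np.array([[ 0.9, -0.2,  0.4, -0.7],
                       [-0.5,  0.8, -0.3,  0.6],
                       [ 0.7, -0.1,  0.5, -0.9]]))
query = to_hash(np.array([0.8, -0.3, 0.2, -0.6]))
dists = [hamming(query, code) for code in db]
ranking = np.argsort(dists, kind="stable")  # nearest items first
```

Items 0 and 2 share the query's sign pattern, so they rank ahead of item 1; PGCH's contribution is learning embeddings whose sign patterns respect cross-modal semantics.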
Leveraging multi-party data to provide recommendations remains a challenge, particularly when the party in need of recommendation services possesses only positive samples while the other parties have only unlabeled data. To address this UDD-PU learning problem, this paper proposes VFPU, an algorithm for Vertical Federated learning with Positive and Unlabeled data. VFPU repeatedly draws random samples from the multi-party unlabeled data, treating the sampled data as negatives. It thus forms multiple training datasets with balanced positive and negative samples, and multiple testing datasets from the unsampled data. For each training dataset, VFPU iteratively trains a base estimator adapted to the vertical federated learning framework. The trained base estimator generates a prediction score for each sample in the corresponding testing dataset. Based on the sum of these scores and their frequency of occurrence in the testing datasets, we calculate the probability of being positive for each unlabeled sample. Those with the highest probabilities are regarded as reliable positive samples; they are added to the positive samples and removed from the unlabeled data. This process of sampling, training, and selecting positive samples is iterated repeatedly. Experimental results demonstrate that VFPU performs comparably to its non-federated counterparts and outperforms other federated semi-supervised learning methods.
"Multi-Party Federated Recommendation Based on Semi-Supervised Learning," Xin Liu; Jiuluan Lv; Feng Chen; Qingjie Wei; Hangxuan He; Ying Qian. IEEE Transactions on Big Data, vol. 10, no. 4, pp. 356-370, 2023-11-30. DOI: 10.1109/TBDATA.2023.3338009
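The sample-train-score loop described above resembles classical PU bagging: repeatedly treat a random unlabeled subset as negatives, train, and average out-of-bag scores. The sketch below is a toy, non-federated version under strong assumptions: a trivial nearest-centroid scorer replaces VFPU's federated base estimator, and the 2-D data points are invented.

```python
import numpy as np

def pu_bagging_scores(pos, unl, rounds=50, seed=0):
    """PU bagging: each round, sample pseudo-negatives from the unlabeled
    pool, fit a nearest-centroid scorer, and score the out-of-bag samples.
    Returns the average score (fraction of rounds judged positive)."""
    rng = np.random.default_rng(seed)
    n_u = len(unl)
    score_sum = np.zeros(n_u)
    score_cnt = np.zeros(n_u)
    for _ in range(rounds):
        idx = rng.choice(n_u, size=len(pos), replace=False)  # pseudo-negatives
        mu_p, mu_n = pos.mean(axis=0), unl[idx].mean(axis=0)
        for i in np.setdiff1d(np.arange(n_u), idx):          # out-of-bag only
            closer_to_pos = np.linalg.norm(unl[i] - mu_p) < np.linalg.norm(unl[i] - mu_n)
            score_sum[i] += float(closer_to_pos)
            score_cnt[i] += 1
    return score_sum / np.maximum(score_cnt, 1)

pos = np.array([[5.0, 5.0], [5.5, 4.5], [4.8, 5.2]])          # known positives
unl = np.array([[5.1, 5.0], [0.2, 0.1], [0.0, 0.3],
                [4.9, 4.8], [0.1, 0.0]])                       # unlabeled pool
probs = pu_bagging_scores(pos, unl)
```

Unlabeled items 0 and 3 sit near the positives and get the top scores; in VFPU they would be promoted to reliable positives and removed from the unlabeled pool before the next iteration.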
Pub Date : 2023-11-30; DOI: 10.1109/TBDATA.2023.3338012
Ghulam Mujtaba;Sunder Ali Khowaja;Muhammad Aslam Jarwar;Jaehyuk Choi;Eun-Seok Ryu
Generating video highlights in the form of animated graphics interchange format (GIF) files has significantly simplified video browsing, and animated GIFs have paved the way for applications in streaming platforms and emerging technologies. However, existing studies incur high computational complexity and do not consider user personalization. This paper proposes a lightweight method to attract users and increase video views through personalized artistic media, i.e., static thumbnails and animated GIFs. The proposed method analyzes lightweight thumbnail containers (LTC) using the computational resources of the client device to recognize personalized events in feature-length sports videos. The thumbnails are then ranked through frame rank pooling for selection. Subsequently, the method processes small video segments rather than the whole video to generate artistic media. This makes our approach more computationally efficient than existing methods that use the entire video; the proposed method thus aligns with sustainable development goals. Furthermore, the method retrieves and uses only thumbnail containers and video segments, which reduces both the required transmission bandwidth and the amount of locally stored data. Experiments reveal that the computational complexity of our method is 3.73 times lower than that of the state-of-the-art method.
"FRC-GIF: Frame Ranking-Based Personalized Artistic Media Generation Method for Resource Constrained Devices," Ghulam Mujtaba; Sunder Ali Khowaja; Muhammad Aslam Jarwar; Jaehyuk Choi; Eun-Seok Ryu. IEEE Transactions on Big Data, vol. 10, no. 4, pp. 343-355, 2023-11-30. DOI: 10.1109/TBDATA.2023.3338012. Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10336393
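One way to read "processing small video segments rather than the whole video": score frames for the user's event of interest, then pick only the best-scoring contiguous window as the GIF segment. The helper and the per-frame scores below are hypothetical stand-ins, not the paper's LTC analysis or frame rank pooling.

```python
def best_segment(scores, win):
    """Return (start, end) of the contiguous `win`-frame window with the
    highest total score, via a running sliding-window sum."""
    best_start = 0
    best_sum = cur = sum(scores[:win])
    for s in range(1, len(scores) - win + 1):
        cur += scores[s + win - 1] - scores[s - 1]  # slide window right by one
        if cur > best_sum:
            best_sum, best_start = cur, s
    return best_start, best_start + win

# Hypothetical per-frame event-relevance scores for a short clip.
scores = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1, 0.3]
start, end = best_segment(scores, win=3)  # only frames[start:end] need decoding
```

Only the selected window would then be fetched and rendered as the GIF, which is the source of the bandwidth and storage savings the abstract claims.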
Label distribution learning (LDL) is a novel approach that outputs labels with varying degrees of description. To enhance the performance of LDL algorithms, researchers have developed algorithms that mine label correlations globally, locally, or both. However, existing LDL algorithms for mining local label correlations roughly assume that samples within a cluster share the same label correlations, which may not hold for all samples. Moreover, existing LDL algorithms apply global and local label correlations to the same parameter matrix, which cannot fully exploit their respective advantages. To address these issues, a novel LDL method based on horizontal and vertical mining of label correlations (LDL-HVLC) is proposed in this paper. The method first encodes a unique local influence vector for each sample from the label distributions of its neighbor samples. This vector is then appended as additional features to assist in predicting unknown instances, and a penalty term is designed to correct wrong local influence vectors (horizontal mining). Finally, to capture both local and global label correlations, a new regularization term is constructed to constrain the global label correlations on the output results (vertical mining). Extensive experiments on real datasets demonstrate that the proposed method effectively solves the label distribution problem and outperforms current state-of-the-art methods.
"Label Distribution Learning Based on Horizontal and Vertical Mining of Label Correlations," Yaojin Lin; Yulin Li; Chenxi Wang; Lei Guo; Jinkun Chen. IEEE Transactions on Big Data, vol. 10, no. 3, pp. 275-287, 2023-11-30. DOI: 10.1109/TBDATA.2023.3338023
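A minimal reading of the "local influence vector" idea — average the label distributions of each sample's neighbors and append the result as extra features — might look like the sketch below. The data, neighbor lists, and function name are invented for illustration and omit the paper's penalty and regularization terms.

```python
import numpy as np

def local_influence_vectors(D, neighbors):
    """For each sample, average the label distributions of its neighbors.
    D: (n_samples, n_labels) rows summing to 1; neighbors: index lists."""
    return np.array([D[idx].mean(axis=0) for idx in neighbors])

# Label distributions over 3 labels for 4 samples (each row sums to 1).
D = np.array([[0.7, 0.2, 0.1],
              [0.6, 0.3, 0.1],
              [0.1, 0.2, 0.7],
              [0.2, 0.2, 0.6]])
neighbors = [[1], [0], [3], [2]]          # hypothetical nearest neighbors
V = local_influence_vectors(D, neighbors)
features = np.hstack([D, V])              # original features extended with V
```

Because each sample gets its own vector, neighboring samples with different neighborhoods receive different local correlation signals, unlike the shared per-cluster assumption the abstract criticizes.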
Pub Date : 2023-11-30DOI: 10.1109/TBDATA.2023.3338011
Xiaodong Li;Pangjing Wu;Chenxin Zou;Qing Li
Designing algorithmic trading strategies that target the volume-weighted average price (VWAP) for long-duration orders is a critical concern for brokers. Traditional rule-based strategies are explicitly predetermined and lack the adaptability needed to achieve lower transaction costs in dynamic markets. Numerous studies have attempted to minimize transaction costs through reinforcement learning. However, the improvement for long-duration order trading strategies, such as the VWAP strategy, remains limited due to intraday liquidity pattern changes and sparse reward signals. To address this issue, we propose a joint model called Macro-Meta-Micro Trader, which combines deep learning and hierarchical reinforcement learning. This model optimizes parent order allocation and child order execution in the VWAP strategy, thereby reducing transaction costs for long-duration orders. It effectively captures market patterns and executes orders across different temporal scales. Our experiments on stocks listed on the Shanghai Stock Exchange demonstrate that our approach outperforms the best-performing baselines in terms of VWAP slippage, saving up to 2.22 basis points and verifying that further splitting tranches into several subgoals can effectively reduce transaction costs.
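For readers unfamiliar with the metric quoted above, VWAP and VWAP slippage in basis points (1 bp = 0.01%) can be computed as follows. This is a minimal background sketch with our own function names, not code from the paper:

```python
def vwap(prices, volumes):
    """Volume-weighted average price over a sequence of fills."""
    total_vol = sum(volumes)
    return sum(p * v for p, v in zip(prices, volumes)) / total_vol

def slippage_bps(exec_prices, exec_volumes, market_vwap):
    """Signed slippage of an execution versus the market VWAP, in basis
    points. For a buy order, positive means the order paid above VWAP."""
    achieved = vwap(exec_prices, exec_volumes)
    return (achieved - market_vwap) / market_vwap * 1e4

# a hypothetical buy order filled via three child orders
print(vwap([10.0, 10.2, 10.1], [100, 50, 150]))                    # ~10.0833
print(slippage_bps([10.0, 10.2, 10.1], [100, 50, 150], 10.05))     # ~33.2 bps
```

The paper's reported saving of up to 2.22 basis points is measured on this kind of slippage figure: a smaller (or more negative, for buys) slippage means the execution tracked the market VWAP more cheaply.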
{"title":"Hierarchical Deep Reinforcement Learning for VWAP Strategy Optimization","authors":"Xiaodong Li;Pangjing Wu;Chenxin Zou;Qing Li","doi":"10.1109/TBDATA.2023.3338011","DOIUrl":"https://doi.org/10.1109/TBDATA.2023.3338011","url":null,"abstract":"Designing algorithmic trading strategies that target the volume-weighted average price (VWAP) for long-duration orders is a critical concern for brokers. Traditional rule-based strategies are explicitly predetermined and lack the adaptability needed to achieve lower transaction costs in dynamic markets. Numerous studies have attempted to minimize transaction costs through reinforcement learning. However, the improvement for long-duration order trading strategies, such as the VWAP strategy, remains limited due to intraday liquidity pattern changes and sparse reward signals. To address this issue, we propose a joint model called Macro-Meta-Micro Trader, which combines deep learning and hierarchical reinforcement learning. This model optimizes parent order allocation and child order execution in the VWAP strategy, thereby reducing transaction costs for long-duration orders. It effectively captures market patterns and executes orders across different temporal scales. 
Our experiments on stocks listed on the Shanghai Stock Exchange demonstrate that our approach outperforms the best-performing baselines in terms of VWAP slippage, saving up to 2.22 basis points and verifying that further splitting tranches into several subgoals can effectively reduce transaction costs.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"10 3","pages":"288-300"},"PeriodicalIF":7.2,"publicationDate":"2023-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140924693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-11-30DOI: 10.1109/TBDATA.2023.3338019
Lu Guo;Limin Wang;Qilong Li;Kuo Li
How to train learners over unbalanced data with asymmetric costs has been recognized as one of the most significant challenges in data mining. The Bayesian network classifier (BNC) provides a powerful probabilistic tool for encoding the probabilistic dependencies among random variables in a directed acyclic graph (DAG), whereas unbalanced data will result in an unbalanced network topology. This leads to a biased estimate of the conditional or joint probability distribution, and ultimately a reduction in classification accuracy. To address this issue, we propose to redefine the information-theoretic metrics to uniformly represent the balanced dependencies between attributes or between attribute values. A heuristic search strategy and a thresholding operation are then introduced to learn refined DAGs from labeled and unlabeled data, respectively. The experimental results on 32 benchmark datasets reveal that the proposed highly scalable algorithm is competitive with or superior to a number of state-of-the-art single and ensemble learners.
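One way to read "balanced" information-theoretic metrics is to estimate attribute dependence per class and weight each class equally, so rare classes contribute as much as frequent ones. The sketch below illustrates that idea for mutual information; it is our own assumption about the flavor of metric involved, not the paper's exact definition:

```python
import numpy as np
from collections import Counter

def balanced_mutual_information(x, y, c):
    """Mutual information I(X;Y) estimated within each class and averaged
    with equal class weights (instead of weighting by class frequency).
    x, y: discrete attribute arrays; c: class-label array."""
    mis = []
    for cls in np.unique(c):
        xs, ys = x[c == cls], y[c == cls]
        n = len(xs)
        pxy = Counter(zip(xs, ys))           # joint counts within the class
        px, py = Counter(xs), Counter(ys)    # marginal counts within the class
        mi = sum((nxy / n) * np.log((nxy / n) / ((px[a] / n) * (py[b] / n)))
                 for (a, b), nxy in pxy.items())
        mis.append(mi)
    return float(np.mean(mis))               # equal weight per class

x = np.array([0, 0, 1, 1, 0, 1])
c = np.array([0, 0, 0, 1, 1, 1])
print(balanced_mutual_information(x, x, c))  # maximal: x fully determines itself
```

Under this scheme a dependency that only appears in a minority class is not drowned out when the metric is averaged, which matches the abstract's goal of representing balanced dependencies between attributes.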
{"title":"Learning Balanced Bayesian Classifiers From Labeled and Unlabeled Data","authors":"Lu Guo;Limin Wang;Qilong Li;Kuo Li","doi":"10.1109/TBDATA.2023.3338019","DOIUrl":"https://doi.org/10.1109/TBDATA.2023.3338019","url":null,"abstract":"How to train learners over unbalanced data with asymmetric costs has been recognized as one of the most significant challenges in data mining. The Bayesian network classifier (BNC) provides a powerful probabilistic tool for encoding the probabilistic dependencies among random variables in a directed acyclic graph (DAG), whereas unbalanced data will result in an unbalanced network topology. This leads to a biased estimate of the conditional or joint probability distribution, and ultimately a reduction in classification accuracy. To address this issue, we propose to redefine the information-theoretic metrics to uniformly represent the balanced dependencies between attributes or between attribute values. A heuristic search strategy and a thresholding operation are then introduced to learn refined DAGs from labeled and unlabeled data, respectively. The experimental results on 32 benchmark datasets reveal that the proposed highly scalable algorithm is competitive with or superior to a number of state-of-the-art single and ensemble learners.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"10 4","pages":"330-342"},"PeriodicalIF":7.5,"publicationDate":"2023-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141602449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}