Label distribution learning (LDL) is a novel paradigm that outputs labels with varying degrees of description. To enhance the performance of LDL algorithms, researchers have developed algorithms that mine label correlations globally, locally, or both globally and locally. However, existing LDL algorithms for mining local label correlations roughly assume that samples within a cluster share the same label correlations, which may not hold for all samples. Moreover, existing LDL algorithms apply global and local label correlations to the same parameter matrix, which cannot fully exploit their respective advantages. To address these issues, this paper proposes a novel LDL method based on horizontal and vertical mining of label correlations (LDL-HVLC). The method first encodes a unique local influence vector for each sample from the label distributions of its neighboring samples. This vector is then appended as additional features to assist in predicting unknown instances, and a penalty term is designed to correct wrong local influence vectors (horizontal mining). Finally, to capture both local and global label correlations, a new regularization term is constructed to constrain the global label correlations on the output results (vertical mining). Extensive experiments on real datasets demonstrate that the proposed method effectively solves the label distribution problem and outperforms current state-of-the-art methods.
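The neighbor-based encoding step described above can be sketched as follows. This is a hypothetical reading of the abstract, not the authors' implementation: the function name `local_influence_vectors`, the choice of k-nearest neighbors under Euclidean distance, the plain neighbor average, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def local_influence_vectors(X, D, k=5):
    """For each sample, average the label distributions of its k nearest
    neighbors -- a simple stand-in for the paper's 'local influence vector'."""
    n = X.shape[0]
    # pairwise Euclidean distances between samples
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbor
    V = np.empty_like(D)
    for i in range(n):
        nbrs = np.argsort(dists[i])[:k]
        V[i] = D[nbrs].mean(axis=0)
    return V

rng = np.random.default_rng(0)
X = rng.random((20, 4))            # feature matrix
D = rng.random((20, 3))
D /= D.sum(axis=1, keepdims=True)  # label distributions sum to 1
V = local_influence_vectors(X, D, k=5)
X_aug = np.hstack([X, V])          # extend features with influence vectors
```

Because each influence vector is a mean of valid label distributions, it is itself a valid distribution, so appending it to the features keeps the augmented input well scaled.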
Title: Label Distribution Learning Based on Horizontal and Vertical Mining of Label Correlations
Authors: Yaojin Lin; Yulin Li; Chenxi Wang; Lei Guo; Jinkun Chen
IEEE Transactions on Big Data, vol. 10, no. 3, pp. 275-287. DOI: 10.1109/TBDATA.2023.3338023. Published 2023-11-30.
Pub Date: 2023-11-30 | DOI: 10.1109/TBDATA.2023.3338011
Xiaodong Li;Pangjing Wu;Chenxin Zou;Qing Li
Designing algorithmic trading strategies that target the volume-weighted average price (VWAP) for long-duration orders is a critical concern for brokers. Traditional rule-based strategies are explicitly predetermined and lack the adaptability to achieve lower transaction costs in dynamic markets. Numerous studies have attempted to minimize transaction costs through reinforcement learning. However, improvement for long-duration order trading strategies, such as the VWAP strategy, remains limited due to intraday liquidity pattern changes and sparse reward signals. To address this issue, we propose a joint model called the Macro-Meta-Micro Trader, which combines deep learning and hierarchical reinforcement learning. The model optimizes parent order allocation and child order execution in the VWAP strategy, thereby reducing transaction costs for long-duration orders, and it effectively captures market patterns and executes orders across different temporal scales. Our experiments on stocks listed on the Shanghai Stock Exchange demonstrate that our approach outperforms the strongest baselines in terms of VWAP slippage, saving up to 2.22 basis points and verifying that further splitting tranches into several subgoals can effectively reduce transaction costs.
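The evaluation metric above, VWAP slippage in basis points, is standard and can be computed directly; the helper names and example trade data below are illustrative, not from the paper.

```python
def vwap(prices, volumes):
    """Volume-weighted average price over a sequence of trades."""
    total = sum(volumes)
    return sum(p * v for p, v in zip(prices, volumes)) / total

def slippage_bps(execution_vwap, market_vwap):
    """Execution shortfall versus the market VWAP, in basis points.
    For a buy order, positive means the strategy paid above the market VWAP."""
    return (execution_vwap - market_vwap) / market_vwap * 1e4

# Hypothetical intraday trades: the market traded 1000 shares at these prices.
market = vwap([10.0, 10.2, 10.1], [300, 500, 200])   # 10.12
cost = slippage_bps(10.15, market)                   # strategy filled at 10.15
```

A saving of 2.22 basis points, as reported above, corresponds to an execution price 0.0222% closer to the market VWAP.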
Title: Hierarchical Deep Reinforcement Learning for VWAP Strategy Optimization
IEEE Transactions on Big Data, vol. 10, no. 3, pp. 288-300.
Pub Date: 2023-11-30 | DOI: 10.1109/TBDATA.2023.3338019
Lu Guo;Limin Wang;Qilong Li;Kuo Li
How to train learners over unbalanced data with asymmetric costs is recognized as one of the most significant challenges in data mining. The Bayesian network classifier (BNC) provides a powerful probabilistic tool for encoding the probabilistic dependencies among random variables in a directed acyclic graph (DAG), but unbalanced data results in an unbalanced network topology. This leads to a biased estimate of the conditional or joint probability distribution and, ultimately, a reduction in classification accuracy. To address this issue, we propose to redefine the information-theoretic metrics to uniformly represent the balanced dependencies between attributes and between attribute values. Then a heuristic search strategy and a thresholding operation are introduced to learn refined DAGs from labeled and unlabeled data, respectively. Experimental results on 32 benchmark datasets reveal that the proposed highly scalable algorithm is competitive with or superior to a number of state-of-the-art single and ensemble learners.
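The information-theoretic metrics mentioned above build on attribute dependency measures such as mutual information. The sketch below shows plain empirical mutual information between two discrete attributes; it is a baseline illustration only, not the paper's redefined, balance-corrected metric.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits between two
    equal-length sequences of discrete attribute values."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        # each term compares the joint to the independence baseline
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi
```

Perfectly dependent binary attributes give 1 bit; independent ones give 0. A BNC structure learner would rank candidate arcs by such scores before applying the paper's heuristic search and thresholding.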
Title: Learning Balanced Bayesian Classifiers From Labeled and Unlabeled Data
IEEE Transactions on Big Data, vol. 10, no. 4, pp. 330-342.
Pub Date: 2023-11-20 | DOI: 10.1109/TBDATA.2023.3334648
Li Li;Chuanqi Tao;Hongjing Guo;Jingxuan Zhang;Xiaobing Sun
Deep learning has been applied to many applications across different domains. However, the distribution shift between test data and training data is a major factor degrading the quality of deep neural networks (DNNs). To address this issue, existing research mainly focuses on enhancing DNN models by retraining them on labeled test data. However, labeling test data is costly, which seriously reduces the efficiency of DNN testing. To solve this problem, test selection strategically selects a small set of tests to label. Unfortunately, existing test selection methods seldom account for data distribution shift. This paper therefore proposes an approach for test selection named Feature Distribution Analysis-Based Test Selection (FATS). FATS analyzes the distributions of test data and training data and then adopts learning to rank (a kind of supervised machine learning for ranking tasks) to intelligently combine the analysis results for test selection. We conduct an empirical study on popular datasets and DNN models and compare FATS with seven test selection methods. Experimental results show that FATS effectively alleviates the impact of distribution shifts and outperforms the compared methods with an average accuracy improvement of 19.6% ∼
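The distribution analysis idea above can be sketched with a crude proxy: rank unlabeled test samples by how far their features fall from the training distribution and label the most shifted ones first. The function `select_tests`, the per-feature z-score heuristic, and the synthetic data are all hypothetical; FATS itself combines several such analyses with a learned ranking model.

```python
import numpy as np

def select_tests(train_feats, test_feats, budget):
    """Rank test samples by mean absolute z-score relative to the
    training feature distribution; return the top `budget` indices."""
    mu = train_feats.mean(axis=0)
    sigma = train_feats.std(axis=0) + 1e-8  # avoid division by zero
    shift = np.abs((test_feats - mu) / sigma).mean(axis=1)
    return np.argsort(shift)[::-1][:budget]  # most shifted first

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(200, 5))
test = rng.normal(0.0, 1.0, size=(10, 5))
test[3] += 8.0                        # one strongly shifted test sample
picked = select_tests(train, test, budget=2)
```

Under this heuristic, the shifted sample is selected first, since its features sit roughly eight training standard deviations from the mean while in-distribution samples sit under one.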