Authors: Badr Hirchoua, B. Ouhbi, B. Frikh
Venue: 2020 5th International Conference on Cloud Computing and Artificial Intelligence: Technologies and Applications (CloudTech)
Published: 2020-11-24
DOI: 10.1109/CloudTech49835.2020.9365878 (https://doi.org/10.1109/CloudTech49835.2020.9365878)
Rules Based Policy for Stock Trading: A New Deep Reinforcement Learning Method
Automated trading can be framed as an online decision-making problem in which agents aim to buy at a low price and sell at a higher one. In financial theory, trading in financial markets exhibits noisy, random behavior and involves highly imperfect information. Developing a profitable strategy in dynamic and complex stock market environments is therefore very difficult. This paper introduces a new deep reinforcement learning (DRL) method for automatic stock trading, based on an encouragement window policy. Motivated by the advantage function, the proposed approach trains a DRL agent to handle the dynamics of the trading environment and generate high profits. On the one hand, the advantage function estimates the relative value of the action selected in the current state; it combines the discounted sum of rewards with a baseline estimate. On the other hand, the encouragement window is based only on the most recent rewards, providing a dense, synthesized experience instead of a noisy signal. This process progressively improves the quality of actions by balancing action selection against state uncertainty. The self-learned rules drive the agent's policy toward productive actions that perform well across the environment. Experimental results on four real-world stocks demonstrate the efficiency of the proposed system: it delivers strong performance, executes more creative trades with a small number of transactions, and outperforms several baselines.
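To make the two signals in the abstract concrete, here is a minimal Python sketch under stated assumptions. The advantage follows the standard textbook form (discounted return minus a baseline estimate). The abstract does not specify how the encouragement window is computed, so the `window_signal` function below, which averages the most recent rewards, is purely a hypothetical illustration and not the authors' method. All function names, the discount factor, and the window size are assumptions.

```python
from collections import deque


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step t."""
    returns = []
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(rewards, baselines, gamma=0.99):
    """Standard advantage estimate: discounted return minus baseline."""
    returns = discounted_returns(rewards, gamma)
    return [g - b for g, b in zip(returns, baselines)]


def window_signal(rewards, window=3):
    """HYPOTHETICAL stand-in for the encouragement window.

    Assumption: the signal at step t is the mean of the last `window`
    rewards seen so far. The paper's actual rule is not given in the
    abstract.
    """
    recent = deque(maxlen=window)  # keeps only the most recent rewards
    signal = []
    for r in rewards:
        recent.append(r)
        signal.append(sum(recent) / len(recent))
    return signal


if __name__ == "__main__":
    rewards = [1.0, 0.0, 2.0]
    baselines = [1.0, 1.0, 1.0]  # e.g. a critic's value estimates (assumed)
    print(advantages(rewards, baselines, gamma=0.5))
    print(window_signal([1.0, 0.0, 2.0, 4.0], window=2))
```

The contrast this sketch is meant to show: the advantage looks forward over the whole remaining reward sequence, while the window signal looks back over only a few recent rewards, which is one plausible reading of "a dense synthesized experience instead of a noisy signal".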