Exploring the Use of Data at Multiple Granularity Levels in Machine Learning-Based Stock Trading
Jacopo Fior, Luca Cagliero
2020 International Conference on Data Mining Workshops (ICDMW), 2020-11-01
DOI: https://doi.org/10.1109/ICDMW51313.2020.00053
Citations: 1
Abstract
In the last decade, the Artificial Intelligence and Data Science communities have paid increasing attention to the problem of forecasting stock market movements. The abundance of stock-related data, including price series, news articles, financial reports, and social content, has fostered the use of Machine Learning (ML) techniques to drive quantitative stock trading. In this field, a large body of work has been devoted to identifying the most predictive features and selecting the best-performing algorithms. However, since algorithm performance is heavily affected by the granularity of the analyzed time series as well as by the amount of historical data used to train the ML models, identifying the most appropriate time granularity and ML pipeline can be challenging. This paper studies the relationship between the granularity of time series data and ML performance. It also compares the performance of established ML pipelines in order to evaluate the pros and cons of periodically retraining the ML models. Furthermore, it takes a step toward integrating ML into real trading systems by studying how best to configure the most established trading system characteristics. The results provide preliminary empirical evidence on how to profitably trade U.S. NASDAQ-100 stocks and leave room for further investigation.