Denoised Labels for Financial Time Series Data via Self-Supervised Learning
Yanqing Ma, Carmine Ventre, M. Polukarov
Proceedings of the Third ACM International Conference on AI in Finance
DOI: 10.1145/3533271.3561687 (https://doi.org/10.1145/3533271.3561687)
Publication date: 2021-12-19
Citations: 4
Abstract
The introduction of electronic trading platforms effectively changed the organisation of traditional systemic trading from quote-driven markets into order-driven markets. Its convenience led to an exponentially increasing amount of financial data, which is, however, hard to use for predicting future prices because of the low signal-to-noise ratio and the non-stationarity of financial time series. Simpler classification tasks, where the goal is to predict the direction of future price movements via supervised learning algorithms, need sufficiently reliable labels to generalise well. Labelling financial data is, however, less well defined than in other domains: did the price go up because of noise or because of a signal? Existing labelling methods offer limited countermeasures against noise and do little to improve learning algorithms. This work takes inspiration from image classification in trading [6] and the success of self-supervised learning in computer vision (e.g., [16]). We investigate applying these techniques to financial time series to reduce the noise exposure and hence generate correct labels. We treat label generation as the pretext task of a self-supervised learning approach and compare the naive (and noisy) labels commonly used in the literature with the labels generated by a denoising autoencoder for the same downstream classification task. Our results demonstrate that these denoised labels improve the performance of the downstream learning algorithm, for both small and large datasets, while preserving the market trends. These findings suggest that, with our proposed techniques, self-supervised learning constitutes a powerful framework for generating “better” financial labels that are useful for studying the underlying patterns of the market.
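To make the contrast between naive and denoised labels concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a naive label is the sign of the next-step price change and that a denoised label is the sign of the last return in a window reconstructed by a small denoising autoencoder. The one-hidden-layer architecture, the window width, the noise level, and all function names (`naive_labels`, `make_windows`, `train_denoising_autoencoder`, `denoised_labels`) are hypothetical choices made for illustration, and the price series is synthetic.

```python
import numpy as np


def naive_labels(prices):
    """Naive direction labels: 1 if the next price is higher, else 0."""
    prices = np.asarray(prices, dtype=float)
    return (np.diff(prices) > 0).astype(int)


def make_windows(series, width):
    """Stack overlapping windows of length `width` as rows of a 2-D array."""
    series = np.asarray(series, dtype=float)
    return np.stack([series[i:i + width] for i in range(len(series) - width + 1)])


def train_denoising_autoencoder(windows, hidden=3, noise_std=0.5,
                                epochs=300, lr=0.05, seed=0):
    """Fit a one-hidden-layer denoising autoencoder by full-batch gradient descent.

    Architecture and hyperparameters are assumptions chosen for illustration.
    Returns a function that maps windows to their reconstructions.
    """
    rng = np.random.default_rng(seed)
    n, d = windows.shape
    w1 = rng.normal(0.0, 0.1, (d, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.1, (hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        # Corrupt the input, but ask the network to reproduce the clean window.
        noisy = windows + rng.normal(0.0, noise_std, windows.shape)
        h = np.tanh(noisy @ w1 + b1)
        out = h @ w2 + b2
        err = (out - windows) / n              # gradient of the mean squared error / 2
        dh = (err @ w2.T) * (1.0 - h ** 2)     # backpropagate through tanh
        w2 -= lr * (h.T @ err)
        b2 -= lr * err.sum(axis=0)
        w1 -= lr * (noisy.T @ dh)
        b1 -= lr * dh.sum(axis=0)

    def reconstruct(x):
        return np.tanh(x @ w1 + b1) @ w2 + b2

    return reconstruct


def denoised_labels(prices, width=8, **train_kwargs):
    """Label each window by the sign of its reconstructed final return."""
    prices = np.asarray(prices, dtype=float)   # prices must be strictly positive
    returns = np.diff(np.log(prices))
    scaled = returns / (returns.std() + 1e-12)  # rescale only, so signs are kept
    windows = make_windows(scaled, width)
    reconstruct = train_denoising_autoencoder(windows, **train_kwargs)
    return (reconstruct(windows)[:, -1] > 0).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic prices: a slow upward drift plus noise (not real market data).
    prices = 100.0 * np.exp(np.cumsum(rng.normal(0.0005, 0.01, 500)))
    width = 8
    naive = naive_labels(prices)[width - 1:]   # align with the windowed labels
    denoised = denoised_labels(prices, width=width)
    print("share of 'up' labels, naive:   ", naive.mean())
    print("share of 'up' labels, denoised:", denoised.mean())
    print("agreement between the two:     ", (naive == denoised).mean())
```

In a pipeline of the kind the abstract describes, both label sets would then be used to train the same downstream classifier so that their effect on its performance can be compared; that step is omitted here.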