The Effect of Data Types' on the Performance of Machine Learning Algorithms for Financial Prediction

Hulusi Mehmet Tanrikulu, Hakan Pabuccu
arXiv - QuantFin - Computational Finance. Published 2024-04-30. DOI: arxiv-2404.19324 (https://doi.org/arxiv-2404.19324)

Abstract

Forecasting cryptocurrency prices is an important financial problem because it offers investors potential financial benefits. A small improvement in forecasting performance can lead to increased profitability; therefore, obtaining a realistic forecast is very important for investors. Successful forecasting provides traders with effective buy-or-hold strategies, allowing them to make more profit. The most important thing in this process is to produce accurate forecasts suitable for real-life applications. Bitcoin, frequently mentioned recently due to its volatility and chaotic behavior, has begun to attract great attention and has become an investment tool, especially during and after the COVID-19 pandemic. This study provides a comprehensive methodology that includes constructing continuous and trend data, using one-year and seven-year periods of data as inputs, and applying machine learning (ML) algorithms to forecast Bitcoin price movement. A binarization procedure was applied to the continuous data to construct the trend data, which represent the trend of each input feature. Following the related literature, the input features are technical indicators, Google Trends, and the number of tweets. Random Forest (RF), K-Nearest Neighbor (KNN), Extreme Gradient Boosting (XGBoost, XGB), Support Vector Machine (SVM), Naive Bayes (NB), Artificial Neural Networks (ANN), and Long Short-Term Memory (LSTM) networks were applied to the selected features for prediction. This work investigates two main research questions: (i) How does the sample size affect the prediction performance of ML algorithms? (ii) How does the data type affect the prediction performance of ML algorithms? Accuracy and area under the ROC curve (AUC) values were used to compare model performance. A t-test was performed to test the statistical significance of the prediction results.
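The binarization step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact rule, so the one used here is an assumption: a feature is coded 1 when its value rose relative to the previous observation and 0 otherwise. The function names, feature names, and toy values are hypothetical and serve only to show how continuous inputs might be turned into trend inputs, and how accuracy could then be computed for a set of predictions.

```python
from typing import Dict, List, Sequence


def binarize_trend(values: Sequence[float]) -> List[int]:
    """Map a continuous series to a 0/1 trend series.

    Assumed rule (not taken from the paper): 1 if the value rose
    relative to the previous observation, otherwise 0. The first
    observation has no predecessor and is dropped.
    """
    return [1 if curr > prev else 0 for prev, curr in zip(values, values[1:])]


def to_trend_data(features: Dict[str, Sequence[float]]) -> Dict[str, List[int]]:
    """Apply the binarization to every input feature independently."""
    return {name: binarize_trend(series) for name, series in features.items()}


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy values, for illustration only.
continuous = {
    "close": [100.0, 102.0, 101.0, 105.0, 105.0],
    "tweet_count": [50, 40, 45, 60, 70],
}
trend = to_trend_data(continuous)
```

Under this assumed rule, an unchanged value is coded 0, and each trend series is one element shorter than its continuous source, so the continuous inputs would need to be aligned (for example by dropping their first row) before the two data types are compared on the same target.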