Modeling of Machine Learning-Based Extreme Value Theory in Stock Investment Risk Prediction: A Systematic Literature Review.

Big Data, pp. 161-180 | IF 2.6 | Zone 4 (Computer Science) | JCR Q2, Computer Science, Interdisciplinary Applications | Pub Date: 2025-06-01 | Epub Date: 2024-01-17 | DOI: 10.1089/big.2023.0004

Melina Melina, Sukono, Herlina Napitupulu, Norizan Mohamed

Abstract

The stock market is heavily influenced by global sentiment, which is full of uncertainty and is characterized by extreme values and by both linear and nonlinear variables. High-frequency data generally refer to data collected at short intervals, such as days, hours, minutes, or even seconds. Stock prices fluctuate rapidly, sometimes to extremes, as the variables that drive them change. Research on stock-market investment risk estimation is therefore important when it can identify extreme values, capture nonlinearity, remain reliable in multivariate cases, and make use of high-frequency data. The extreme value theory (EVT) approach can detect extreme values; it is reliable in univariate cases but very complicated in multivariate cases. The purpose of this research was to collect, characterize, and analyze the investment risk estimation literature in order to identify research gaps. The literature was selected by applying the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines and was sourced from the Sciencedirect.com and Scopus databases. The search returned 1107 articles at the identification stage, which were reduced to 236 at the eligibility stage and to 90 in the final set of included studies. The bibliometric networks were visualized using the VOSviewer software, and the main keyword used as the search criterion was "VaR." The visualization showed that EVT, Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models, and historical simulation are the models most often used to estimate investment risk, whereas machine learning (ML)-based investment risk estimation models are rarely applied. No study has yet combined EVT and ML to estimate investment risk. The results showed that hybrid models produced more accurate Value-at-Risk (VaR) estimates under uncertainty and nonlinear conditions. Generally, models use only daily return data as input.
Based on these research gaps, a hybrid framework for estimating risk measures is proposed that combines EVT and ML and uses multivariable, high-frequency data to identify extreme values in the data distribution. The goal is to produce risk estimates that are accurate and flexible in the face of extreme changes and shocks in the stock market. Mathematics Subject Classification: 60G25; 62M20; 6245; 62P05; 91G70.
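The abstract names historical simulation and EVT as common VaR estimators without showing how they differ. Below is a minimal sketch in standard-library Python, not the authors' hybrid EVT-ML framework, assuming i.i.d. daily losses (no GARCH filtering), a 90% threshold picked only for illustration, and a method-of-moments Generalized Pareto fit swapped in for the maximum-likelihood fit usually used in practice. The function names `historical_var`, `gpd_moment_fit`, `pot_var`, and `simulate_t_losses` are hypothetical, and the Student-t data are simulated, not taken from the review.

```python
import math
import random


def historical_var(losses, q=0.99):
    """Historical-simulation VaR: empirical q-quantile (nearest-rank) of the losses."""
    ordered = sorted(losses)
    k = math.ceil(q * len(ordered)) - 1
    return ordered[k]


def gpd_moment_fit(excesses):
    """Method-of-moments Generalized Pareto fit (assumption: shape xi < 1/2).

    Uses mean = sigma / (1 - xi) and mean^2 / var = 1 - 2 * xi; returns (xi, sigma).
    """
    n = len(excesses)
    mean = sum(excesses) / n
    var = sum((x - mean) ** 2 for x in excesses) / (n - 1)
    ratio = mean * mean / var
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * mean * (ratio + 1.0)
    return xi, sigma


def pot_var(losses, q=0.99, threshold_q=0.90):
    """EVT peaks-over-threshold VaR from a GPD fitted to excesses over a threshold u.

    Tail quantile: u + (sigma / xi) * (((n / n_u) * (1 - q)) ** (-xi) - 1).
    """
    n = len(losses)
    u = historical_var(losses, threshold_q)        # hypothetical threshold choice
    excesses = [x - u for x in losses if x > u]     # the "peaks" above u
    n_u = len(excesses)
    xi, sigma = gpd_moment_fit(excesses)
    tail = (n / n_u) * (1.0 - q)
    if abs(xi) < 1e-9:                               # exponential-tail limit xi -> 0
        return u - sigma * math.log(tail)
    return u + (sigma / xi) * (tail ** (-xi) - 1.0)


def simulate_t_losses(n, df, rng):
    """Hypothetical heavy-tailed losses: Student-t built from normal and chi-square draws."""
    losses = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(df / 2.0, 2.0)      # chi-square with df degrees of freedom
        losses.append(z / math.sqrt(chi2 / df))
    return losses


rng = random.Random(42)
losses = simulate_t_losses(5000, 5, rng)
print("Historical-simulation VaR(99%):", round(historical_var(losses, 0.99), 3))
print("EVT POT VaR(99%):", round(pot_var(losses, 0.99), 3))
```

Historical simulation can only report losses already in the sample, while the POT estimator extrapolates from the fitted tail, which is why EVT is favored for quantiles beyond the observed data. Replacing the moment fit with maximum likelihood (for example `scipy.stats.genpareto.fit`) and applying the tail estimator to GARCH-standardized residuals would be the usual refinements; neither is specified in the abstract.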

Source journal

Big Data COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS-COMPUTER SCIENCE, THEORY & METHODS

CiteScore: 9.10

Self-citation rate: 2.20%

Journal description: Big Data is the leading peer-reviewed journal covering the challenges and opportunities in collecting, analyzing, and disseminating vast amounts of data. The Journal addresses questions surrounding this powerful and growing field of data science and facilitates the efforts of researchers, business managers, analysts, developers, data scientists, physicists, statisticians, infrastructure developers, academics, and policymakers to improve operations, profitability, and communications within their businesses and institutions. Spanning a broad array of disciplines focusing on novel big data technologies, policies, and innovations, the Journal brings together the community to address current challenges and support effective efforts to organize, store, disseminate, protect, and manipulate data and, most importantly, to find the most effective strategies for making this vast amount of information benefit society, industry, academia, and government. Big Data coverage includes: big data industry standards; new technologies being developed specifically for big data; data acquisition, cleaning, distribution, and best practices; data protection, privacy, and policy; business interests from research to product; the changing role of business intelligence; visualization and design principles of big data infrastructures; physical interfaces and robotics; social networking advantages for Facebook, Twitter, Amazon, Google, etc.; and opportunities around big data and how companies can harness it to their advantage.