HRFT: Mining High-Frequency Risk Factor Collections End-to-End via Transformer

Wenyan Xu, Rundong Wang, Chen Li, Yonghong Hu, Zhonghua Lu
DOI: arxiv-2408.01271
Journal: arXiv - CS - Computational Engineering, Finance, and Science
Publication date: 2024-08-02

Abstract

In quantitative trading, it is common to look for patterns in the short-term volatile trends of the market. These patterns, known as high-frequency (HF) risk factors, serve as key indicators of future stock price volatility. Traditionally, such risk factors were generated by financial models that rely heavily on manually added domain-specific knowledge rather than on extensive market data. Inspired by symbolic regression (SR), which infers mathematical laws from data, we treat the extraction of formulaic risk factors from high-frequency trading (HFT) market data as an SR task. In this paper, we challenge the manual construction of risk factors and propose an end-to-end methodology, Intraday Risk Factor Transformer (IRFT), that directly predicts complete formulaic factors, including constants. We use a hybrid symbolic-numeric vocabulary in which symbolic tokens represent operators and stock features, and numeric tokens represent constants. We train a Transformer model on the HFT dataset to generate complete formulaic HF risk factors without relying on a predefined skeleton of operators. It determines the general shape of the stock volatility law up to a choice of constants. We then refine the predicted constants (a, b) using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm to mitigate non-linear issues. Compared with the 10 approaches in SRBench, a living benchmark for SR, IRFT gains a 30% excess investment return on the HS300 and SP500 datasets, with inference times orders of magnitude faster than theirs in HF risk factor mining tasks.
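The hybrid symbolic-numeric vocabulary can be illustrated with a minimal sketch. The abstract does not specify the token scheme, so everything concrete here is an assumption: the operator and feature names are hypothetical, formulas are written in prefix notation, and each constant is split into three numeric tokens (sign, mantissa, base-10 exponent), a scheme commonly used in Transformer-based symbolic regression but not confirmed as IRFT's own.

```python
import math

# Hypothetical symbol sets: the abstract does not list the real operators or features.
OPERATORS = {"add", "sub", "mul", "div", "log", "abs"}
FEATURES = {"open", "high", "low", "close", "volume"}


def encode_constant(value, digits=4):
    """Split a float into sign, mantissa and exponent tokens (assumed scheme)."""
    if value == 0:
        return ["+", "N" + "0" * digits, "E0"]
    sign = "+" if value > 0 else "-"
    exponent = math.floor(math.log10(abs(value)))
    mantissa = round(abs(value) / 10 ** exponent * 10 ** (digits - 1))
    if mantissa == 10 ** digits:  # rounding overflow, e.g. 9.9996 -> 10.00
        mantissa //= 10
        exponent += 1
    return [sign, f"N{mantissa:0{digits}d}", f"E{exponent}"]


def decode_constant(sign, mantissa, exponent):
    """Rebuild the float from its three numeric tokens."""
    digits = len(mantissa) - 1
    magnitude = int(mantissa[1:]) / 10 ** (digits - 1) * 10.0 ** int(exponent[1:])
    return magnitude if sign == "+" else -magnitude


def encode_formula(prefix):
    """Flatten a prefix-notation formula into symbolic and numeric tokens."""
    tokens = []
    for item in prefix:
        if isinstance(item, (int, float)):
            tokens.extend(encode_constant(float(item)))
        elif item in OPERATORS or item in FEATURES:
            tokens.append(item)
        else:
            raise ValueError(f"unknown symbol: {item!r}")
    return tokens


def decode_formula(tokens):
    """Inverse of encode_formula: merge numeric token triples back into floats."""
    prefix, i = [], 0
    while i < len(tokens):
        if tokens[i] in ("+", "-"):  # a sign token opens a constant
            prefix.append(decode_constant(*tokens[i:i + 3]))
            i += 3
        else:
            prefix.append(tokens[i])
            i += 1
    return prefix


# Example factor: 0.5 * log(high / low), written in prefix notation.
formula = ["mul", 0.5, "log", "div", "high", "low"]
tokens = encode_formula(formula)
print(tokens)
print(decode_formula(tokens))
```

In a pipeline like the one the abstract describes, constants decoded this way would only be starting values, since the predicted constants are subsequently refined with BFGS.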