Wenyan Xu, Rundong Wang, Chen Li, Yonghong Hu, Zhonghua Lu
arXiv - CS - Computational Engineering, Finance, and Science, published 2024-08-02. DOI: arxiv-2408.01271 (https://doi.org/arxiv-2408.01271)
HRFT: Mining High-Frequency Risk Factor Collections End-to-End via Transformer
In quantitative trading, it is common to find patterns in short-term volatile
trends of the market. These patterns are known as high-frequency (HF) risk
factors and serve as key indicators of future stock price volatility.
Traditionally, these risk factors were generated by financial models that rely
heavily on manually added domain-specific knowledge rather than on extensive
market data. Inspired by symbolic regression (SR), which infers mathematical
laws from data, we treat the extraction of formulaic risk factors from
high-frequency trading (HFT) market data as an SR task. In this paper, we
challenge the manual construction of risk factors and propose an end-to-end
methodology, Intraday Risk Factor Transformer (IRFT), to directly predict
complete formulaic factors, including constants. We use a hybrid
symbolic-numeric vocabulary where symbolic tokens represent operators/stock
features and numeric tokens represent constants. We train a Transformer model
on the HFT dataset to generate complete formulaic HF risk factors without
relying on a predefined skeleton of operators. The model determines the general shape
of the stock volatility law up to a choice of constants. We then refine the
predicted constants (a, b) using the Broyden-Fletcher-Goldfarb-Shanno (BFGS)
algorithm to mitigate non-linear issues. Compared to the 10 approaches in SRBench,
a living benchmark for SR, IRFT gains a 30% excess investment return on the
HS300 and SP500 datasets, with inference times orders of magnitude faster than
theirs in HF risk factor mining tasks.
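
The two mechanical steps named in the abstract, encoding constants as numeric tokens and refining predicted constants with BFGS, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the sign/mantissa/exponent token scheme, the hypothetical factor skeleton a * exp(b * x), the synthetic data, and the helper names `encode_constant`, `decode_constant`, `factor`, and `refine_constants`. Only `scipy.optimize.minimize` with `method="BFGS"` is a real library call.

```python
import numpy as np
from scipy.optimize import minimize


def encode_constant(value, digits=4):
    """Split a float into sign, mantissa and exponent tokens (assumed scheme)."""
    sign = "+" if value >= 0 else "-"
    mantissa, exponent = f"{abs(value):.{digits - 1}e}".split("e")
    shifted_exponent = int(exponent) - (digits - 1)
    return [sign, "N" + mantissa.replace(".", ""), "E" + str(shifted_exponent)]


def decode_constant(tokens):
    """Inverse of encode_constant, up to the rounding of the mantissa."""
    sign, mantissa, exponent = tokens
    value = int(mantissa[1:]) * 10.0 ** int(exponent[1:])
    return value if sign == "+" else -value


def factor(x, a, b):
    """Hypothetical factor skeleton a * exp(b * x); not from the paper."""
    return a * np.exp(b * x)


def refine_constants(x, y, a0, b0):
    """Polish (a, b) by minimizing the mean squared error with BFGS."""
    def loss(params):
        a, b = params
        return np.mean((factor(x, a, b) - y) ** 2)

    result = minimize(loss, x0=np.array([a0, b0]), method="BFGS")
    return result.x


# Synthetic data standing in for a stock feature and a volatility target.
x = np.linspace(0.0, 1.0, 200)
y = factor(x, 2.0, -1.5)

# Pretend a model emitted this prefix-notation sequence: symbolic tokens
# for operators and the feature, numeric tokens for the two constants.
predicted_tokens = ["mul", *encode_constant(1.8),
                    "exp", "mul", *encode_constant(-1.2), "x"]

# Decode the rough constants, then refine them against the data.
a0 = decode_constant(predicted_tokens[1:4])
b0 = decode_constant(predicted_tokens[6:9])
a_hat, b_hat = refine_constants(x, y, a0, b0)
print(f"initial: a={a0:.3f}, b={b0:.3f}; refined: a={a_hat:.3f}, b={b_hat:.3f}")
```

In this sketch the decoded constants serve only as the starting point for the optimizer, which is one plausible reading of "refine the predicted constants"; the paper's actual loss, skeletons, and token vocabulary may differ.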