We explore the performance of various artificial neural network (ANN) architectures, including a multilayer perceptron (MLP), a Kolmogorov-Arnold network (KAN), LSTM-GRU hybrid recurrent neural network (RNN) models, and a time-delay neural network (TDNN), for pricing European call options. In this study, we attempt to leverage the ability of supervised learning methods, such as ANNs, KANs, and gradient-boosted decision trees, to approximate complex multivariate functions in order to calibrate option prices based on past market data. The motivation for using ANNs and KANs is the Universal Approximation Theorem and the Kolmogorov-Arnold Representation Theorem, respectively. Specifically, we use S&P 500 (SPX) and NASDAQ 100 (NDX) index options traded during 2015-2023 with times to maturity ranging from 15 days to over 4 years (OptionMetrics IvyDB US dataset). As a benchmark, we use the performance of the Black-Scholes (BS) PDE model (Black and Scholes, 1973) in pricing the same options against real data. This model relies on strong assumptions, and it has been observed and discussed in the literature that real data does not match its predictions. Because of these limitations, supervised learning methods are widely used as an alternative for calibrating option prices. In our experiments, the BS model underperforms all of the other models. The best TDNN model outperforms the best MLP model on all error metrics, and the KAN model outperforms both the TDNN and MLP models. We implement a simple self-attention mechanism to enhance the RNN models, significantly improving their performance. The best-performing model overall is the LSTM-GRU hybrid RNN model with attention. We analyze the performance of all models by ticker, moneyness category, and over/under/correctly-priced percentage.
{"title":"MLP, XGBoost, KAN, TDNN, and LSTM-GRU Hybrid RNN with Attention for SPX and NDX European Call Option Pricing","authors":"Boris Ter-Avanesov, Homayoon Beigi","doi":"arxiv-2409.06724","DOIUrl":"https://doi.org/arxiv-2409.06724","journal":{"name":"arXiv - QuantFin - Computational Finance"},"publicationDate":"2024-08-26"}
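The Black-Scholes benchmark mentioned in the abstract above can be illustrated with the standard closed-form price of a European call. The following is a minimal sketch, not the authors' implementation: the function names, the continuous dividend yield parameter, and the input conventions (annualized rate and volatility, maturity in years) are assumptions made for illustration.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_price(spot: float, strike: float, rate: float, vol: float,
                  maturity: float, div_yield: float = 0.0) -> float:
    """Black-Scholes price of a European call.

    Hypothetical helper for illustration: rate, vol and div_yield are
    annualized and continuously compounded; maturity is in years.
    """
    if spot <= 0 or strike <= 0 or vol <= 0 or maturity <= 0:
        raise ValueError("spot, strike, vol and maturity must be positive")
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike)
          + (rate - div_yield + 0.5 * vol * vol) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    discounted_spot = spot * math.exp(-div_yield * maturity)
    discounted_strike = strike * math.exp(-rate * maturity)
    return discounted_spot * norm_cdf(d1) - discounted_strike * norm_cdf(d2)
```

A benchmark comparison of the kind described would then evaluate `bs_call_price` on each option's observed inputs and compare the result with the traded price; how volatility and rates are chosen for that comparison is not stated in the abstract.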
Hongcheng Ding, Xuanze Zhao, Zixiao Jiang, Shamsul Nahar Abdullah, Deshinta Arrova Dewi
Accurate forecasting of the EUR/USD exchange rate is crucial for investors, businesses, and policymakers. This paper proposes a novel framework, IUS, that integrates unstructured textual data from news and analysis with structured data on exchange rates and financial indicators to enhance exchange rate prediction. The IUS framework employs large language models for sentiment polarity scoring and exchange rate movement classification of texts. These textual features are combined with quantitative features and input into a Causality-Driven Feature Generator. An Optuna-optimized Bi-LSTM model is then used to forecast the EUR/USD exchange rate. Experiments demonstrate that the proposed method outperforms benchmark models, reducing MAE by 10.69% and RMSE by 9.56% compared to the best performing baseline. Results also show the benefits of data fusion, with the combination of unstructured and structured data yielding higher accuracy than structured data alone. Furthermore, feature selection using the top 12 important quantitative features combined with the textual features proves most effective. The proposed IUS framework and Optuna-Bi-LSTM model provide a powerful new approach for exchange rate forecasting through multi-source data integration.
{"title":"EUR-USD Exchange Rate Forecasting Based on Information Fusion with Large Language Models and Deep Learning Methods","authors":"Hongcheng Ding, Xuanze Zhao, Zixiao Jiang, Shamsul Nahar Abdullah, Deshinta Arrova Dewi","doi":"arxiv-2408.13214","DOIUrl":"https://doi.org/arxiv-2408.13214","journal":{"name":"arXiv - QuantFin - Computational Finance"},"publicationDate":"2024-08-23"}
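The reported improvements in the abstract above are relative reductions in MAE and RMSE against a baseline. A minimal sketch of how such figures are typically computed is given below; the function names are hypothetical and no data from the paper is used.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def relative_reduction(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - model_error) / baseline_error
```

For example, a model whose error is 1.5 against a baseline error of 2.0 has a relative reduction of 25%; the 10.69% and 9.56% figures in the abstract are of this form.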
Large language models (LLMs) have advanced financial applications, yet they often lack sufficient financial knowledge and struggle with tasks involving multi-modal inputs like tables and time series data. To address these limitations, we introduce Open-FinLLMs, a series of financial LLMs. We begin with FinLLaMA, pre-trained on a 52 billion token financial corpus, incorporating text, tables, and time-series data to embed comprehensive financial knowledge. FinLLaMA is then instruction fine-tuned with 573K financial instructions, resulting in FinLLaMA-instruct, which enhances task performance. Finally, we present FinLLaVA, a multimodal LLM trained with 1.43M image-text instructions to handle complex financial data types. Extensive evaluations demonstrate FinLLaMA's superior performance over LLaMA3-8B, LLaMA3.1-8B, and BloombergGPT in both zero-shot and few-shot settings across 19 and 4 datasets, respectively. FinLLaMA-instruct outperforms GPT-4 and other financial LLMs on 15 datasets. FinLLaVA excels in understanding tables and charts across 4 multimodal tasks. Additionally, FinLLaMA achieves impressive Sharpe Ratios in trading simulations, highlighting its robust financial application capabilities. We will continually maintain and improve our models and benchmarks to support ongoing innovation in academia and industry.
{"title":"Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications","authors":"Qianqian Xie, Dong Li, Mengxi Xiao, Zihao Jiang, Ruoyu Xiang, Xiao Zhang, Zhengyu Chen, Yueru He, Weiguang Han, Yuzhe Yang, Shunian Chen, Yifei Zhang, Lihang Shen, Daniel Kim, Zhiwei Liu, Zheheng Luo, Yangyang Yu, Yupeng Cao, Zhiyang Deng, Zhiyuan Yao, Haohang Li, Duanyu Feng, Yongfu Dai, VijayaSai Somasundaram, Peng Lu, Yilun Zhao, Yitao Long, Guojun Xiong, Kaleb Smith, Honghai Yu, Yanzhao Lai, Min Peng, Jianyun Nie, Jordan W. Suchow, Xiao-Yang Liu, Benyou Wang, Alejandro Lopez-Lira, Jimin Huang, Sophia Ananiadou","doi":"arxiv-2408.11878","DOIUrl":"https://doi.org/arxiv-2408.11878","journal":{"name":"arXiv - QuantFin - Computational Finance"},"publicationDate":"2024-08-20"}
In this paper, we present Deep-MacroFin, a comprehensive framework designed to solve partial differential equations, with a particular focus on models in continuous time economics. This framework leverages deep learning methodologies, including conventional Multi-Layer Perceptrons and the newly developed Kolmogorov-Arnold Networks. It is optimized using economic information encapsulated by Hamilton-Jacobi-Bellman equations and coupled algebraic equations. The application of neural networks holds the promise of accurately resolving high-dimensional problems with fewer computational demands and limitations compared to standard numerical methods. This versatile framework can be readily adapted for elementary differential equations, and systems of differential equations, even in cases where the solutions may exhibit discontinuities. Importantly, it offers a more straightforward and user-friendly implementation than existing libraries.
{"title":"Deep-MacroFin: Informed Equilibrium Neural Network for Continuous Time Economic Models","authors":"Yuntao Wu, Jiayuan Guo, Goutham Gopalakrishna, Zisis Poulos","doi":"arxiv-2408.10368","DOIUrl":"https://doi.org/arxiv-2408.10368","journal":{"name":"arXiv - QuantFin - Computational Finance"},"publicationDate":"2024-08-19"}
Daniel Cunha Oliveira, Yutong Lu, Xi Lin, Mihai Cucuringu, Andre Fujita
We introduce a novel framework for financial time series forecasting that leverages causality-inspired models to balance the trade-off between invariance to distributional changes and minimization of prediction errors. To the best of our knowledge, this is the first study to conduct a comprehensive comparative analysis of state-of-the-art causal discovery algorithms, benchmarked against non-causal feature selection techniques, for forecasting asset returns. Empirical evaluations demonstrate the efficacy of our approach in yielding stable and accurate predictions, outperforming baseline models, particularly in tumultuous market conditions.
{"title":"Causality-Inspired Models for Financial Time Series Forecasting","authors":"Daniel Cunha Oliveira, Yutong Lu, Xi Lin, Mihai Cucuringu, Andre Fujita","doi":"arxiv-2408.09960","DOIUrl":"https://doi.org/arxiv-2408.09960","journal":{"name":"arXiv - QuantFin - Computational Finance"},"publicationDate":"2024-08-19"}
In the venture capital (VC) industry, predicting the success of startups is challenging due to limited financial data and the need for subjective revenue forecasts. Previous methods based on time series analysis or deep learning often fall short because they fail to incorporate crucial inter-company relationships such as competition and collaboration. To address these issues, we propose a novel approach using a GraphRAG-augmented time series model. With GraphRAG, time series predictive methods are enhanced by integrating these vital relationships into the analysis framework, allowing for a more dynamic understanding of the startup ecosystem in venture capital. Our experimental results demonstrate that our model significantly outperforms previous models in startup success prediction. To the best of our knowledge, our work is the first applied work using GraphRAG.
{"title":"Enhancing Startup Success Predictions in Venture Capital: A GraphRAG Augmented Multivariate Time Series Method","authors":"Gao Zitian, Xiao Yihao","doi":"arxiv-2408.09420","DOIUrl":"https://doi.org/arxiv-2408.09420","journal":{"name":"arXiv - QuantFin - Computational Finance"},"publicationDate":"2024-08-18"}
Sina Montazeri, Haseebullah Jumakhan, Sonia Abrasiabian, Amir Mirzaeinia
Building on our prior explorations of convolutional neural networks (CNNs) for financial data processing, this paper introduces two significant enhancements to refine our CNN model's predictive performance and robustness for financial tabular data. Firstly, we integrate a normalization layer at the input stage to ensure consistent feature scaling, addressing the issue of disparate feature magnitudes that can skew the learning process. This modification is hypothesized to aid in stabilizing the training dynamics and improving the model's generalization across diverse financial datasets. Secondly, we employ a Gradient Reduction Architecture, where earlier layers are wider and subsequent layers are progressively narrower. This enhancement is designed to enable the model to capture more complex and subtle patterns within the data, a crucial factor in accurately predicting financial outcomes. These advancements directly respond to the limitations identified in previous studies, where simpler models struggled with the complexity and variability inherent in financial applications. Initial tests confirm that these changes improve accuracy and model stability, suggesting that deeper and more nuanced network architectures can significantly benefit financial predictive tasks. This paper details the implementation of these enhancements and evaluates their impact on the model's performance in a controlled experimental setting.
{"title":"Gradient Reduction Convolutional Neural Network Policy for Financial Deep Reinforcement Learning","authors":"Sina Montazeri, Haseebullah Jumakhan, Sonia Abrasiabian, Amir Mirzaeinia","doi":"arxiv-2408.11859","DOIUrl":"https://doi.org/arxiv-2408.11859","journal":{"name":"arXiv - QuantFin - Computational Finance"},"publicationDate":"2024-08-16"}
This thesis explores the historical progression and theoretical constructs of financial mathematics, with an in-depth exploration of stochastic calculus as showcased in the Binomial Asset Pricing Model and the Continuous-Time Models. A comprehensive survey of stochastic calculus principles applied to option pricing is offered, highlighting insights from Peter Carr and Lorenzo Torricelli's "Convex Duality in Continuous Option Pricing Models". This manuscript adopts techniques such as Monte-Carlo simulation and machine learning algorithms to examine the propositions of Carr and Torricelli, drawing comparisons between the Logistic and Bachelier models. Additionally, it suggests directions for potential future research on option pricing methods.
{"title":"Stochastic Calculus for Option Pricing with Convex Duality, Logistic Model, and Numerical Examination","authors":"Zheng Cao","doi":"arxiv-2408.05672","DOIUrl":"https://doi.org/arxiv-2408.05672","journal":{"name":"arXiv - QuantFin - Computational Finance"},"publicationDate":"2024-08-11"}
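To make the Monte-Carlo examination mentioned in the abstract above concrete, the sketch below prices a European call under the Bachelier (arithmetic Brownian motion) model by simulation and compares it with the standard Bachelier closed form. It is an illustration under simplifying assumptions (zero interest rate, absolute rather than relative volatility, hypothetical function names and parameter values) and is not taken from the thesis; the Logistic model is not covered here.

```python
import math
import random


def bachelier_call_closed_form(spot, strike, vol, maturity):
    """Bachelier call price with zero rates; vol is an absolute (price-unit) volatility."""
    scale = vol * math.sqrt(maturity)
    d = (spot - strike) / scale
    pdf = math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))
    return (spot - strike) * cdf + scale * pdf


def bachelier_call_monte_carlo(spot, strike, vol, maturity,
                               n_paths=200_000, seed=0):
    """Monte-Carlo estimate of the same price from simulated terminal values."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    scale = vol * math.sqrt(maturity)
    payoff_sum = 0.0
    for _ in range(n_paths):
        terminal = spot + scale * rng.gauss(0.0, 1.0)  # S_T = S_0 + vol * W_T
        payoff_sum += max(terminal - strike, 0.0)
    return payoff_sum / n_paths
```

With enough paths the two values should agree to within the Monte-Carlo standard error, which shrinks in proportion to one over the square root of `n_paths`.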
Explainable machine learning methods have seen substantial development. Despite their success, existing approaches focus on general frameworks that use no prior domain expertise. High-stakes financial sectors have extensive domain knowledge of the features. Hence, explanations of models are expected to be consistent with domain knowledge to ensure conceptual soundness. In this work, we study the group structures of features that form naturally in financial datasets. Our study shows the importance of considering group structures that conform to regulations. When group structures are present, direct applications of explainable machine learning methods, such as Shapley values and Integrated Gradients, may not provide consistent explanations; in contrast, group versions of the Shapley value can provide consistent explanations. We include detailed examples that highlight the practical perspective of our framework.
{"title":"Why Groups Matter: Necessity of Group Structures in Attributions","authors":"Dangxing Chen, Jingfeng Chen, Weicheng Ye","doi":"arxiv-2408.05701","DOIUrl":"https://doi.org/arxiv-2408.05701","journal":{"name":"arXiv - QuantFin - Computational Finance"},"publicationDate":"2024-08-11"}
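The contrast between feature-level and group-level Shapley values described in the abstract above can be seen on a toy cooperative game. The sketch below is an assumption-laden illustration, not the paper's framework: the value function, the feature names `a`, `b`, `c`, and the grouping are invented, and "consistency" here only means that summing individual attributions within a group need not equal the attribution obtained when the group is treated as a single player.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    totals = {p: Fraction(0) for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for player in order:
            extended = coalition | {player}
            totals[player] += Fraction(value(extended)) - Fraction(value(coalition))
            coalition = extended
    return {p: total / len(orderings) for p, total in totals.items()}


def group_shapley_values(groups, value):
    """Shapley values where each named group of features acts as one player."""
    def group_value(coalition):
        features = frozenset().union(*(groups[name] for name in coalition))
        return value(features)
    return shapley_values(list(groups), group_value)


def toy_value(features):
    """Hypothetical model payoff: 1 if 'c' is present together with 'a' or 'b'."""
    return 1 if "c" in features and ({"a", "b"} & features) else 0


individual = shapley_values(["a", "b", "c"], toy_value)
grouped = group_shapley_values({"G_ab": {"a", "b"}, "G_c": {"c"}}, toy_value)
```

In this toy game the individual attributions of `a` and `b` are 1/6 each, so they sum to 1/3, whereas the group `G_ab` receives 1/2 when it is treated as one player; both attributions still distribute the full payoff of 1.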
In the rapidly evolving domain of large-scale retail data systems, envisioning and simulating future consumer transactions has become a crucial area of interest. It offers significant potential to strengthen demand forecasting and fine-tune inventory management. This paper presents an innovative application of Generative Adversarial Networks (GANs) to generate synthetic retail transaction data, specifically focusing on a novel system architecture that combines consumer behavior modeling with stock-keeping unit (SKU) availability constraints to address real-world assortment optimization challenges. We diverge from conventional methodologies by integrating SKU data into our GAN architecture and using more sophisticated embedding methods (e.g., hyper-graphs). This design choice enables our system not only to generate simulated consumer purchase behaviors but also to reflect the dynamic interplay between consumer behavior and SKU availability, an aspect often overlooked in legacy retail simulation models, partly because of data scarcity. Our GAN model generates transactions under stock constraints, providing an experimental system with practical implications for real-world retail operation and strategy. Preliminary results demonstrate enhanced realism in simulated transactions, measured by comparing generated items with real ones using methods employed in earlier related studies. This underscores the potential for more accurate predictive modeling.
{"title":"Consumer Transactions Simulation through Generative Adversarial Networks","authors":"Sergiy Tkachuk, Szymon Łukasik, Anna Wróblewska","doi":"arxiv-2408.03655","DOIUrl":"https://doi.org/arxiv-2408.03655","journal":{"name":"arXiv - QuantFin - Computational Finance"},"publicationDate":"2024-08-07"}