The emergence of Large Language Models (LLMs) has unlocked unprecedented potential for comprehending and generating human-like text, fueling advances in the finance domain, where such models can shape investment strategies and market predictions. Nevertheless, challenges remain, stemming from the need for extensive labeled data and the imperative of data privacy. The generation of high-quality synthetic data emerges as a promising avenue to circumvent these issues. In this paper, we introduce a novel methodology, named “Reinforcement Prompting”, to address these challenges. Our strategy employs a policy network as a Selector to generate prompts, and an LLM as an Executor to produce financial synthetic data. This synthetic data generation process preserves data privacy and mitigates the dependency on real-world labeled datasets. We validate the effectiveness of our approach through experimental evaluations. Our results indicate that models trained on synthetic data generated via our approach exhibit competitive performance compared to those trained on actual financial data, thereby bridging the performance gap. This research provides a novel solution to the challenges of data privacy and labeled-data scarcity in financial sentiment analysis, offering considerable advancement in the field of financial machine learning.
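The Selector–Executor loop described above can be illustrated with a deliberately simplified sketch: here the policy-network Selector is replaced by an epsilon-greedy bandit over a fixed set of prompt templates, and `reward_fn` is a hypothetical stand-in for the downstream quality signal (e.g. validation accuracy of a sentiment model trained on the generated data). This is not the paper's method — only an illustration of the select–generate–reward cycle.

```python
import random

def epsilon_greedy_selector(prompts, reward_fn, steps=500, eps=0.1, seed=0):
    """Bandit-style Selector sketch: repeatedly pick a prompt template,
    observe a reward for the synthetic data it yields, and update value
    estimates. A simplified stand-in for a learned policy network."""
    rng = random.Random(seed)
    values = [0.0] * len(prompts)   # estimated reward per prompt template
    counts = [0] * len(prompts)     # how often each template was chosen
    for _ in range(steps):
        if rng.random() < eps:      # explore a random template
            i = rng.randrange(len(prompts))
        else:                       # exploit the best template so far
            i = max(range(len(prompts)), key=lambda j: values[j])
        r = reward_fn(i)            # hypothetical downstream quality signal
        counts[i] += 1
        values[i] += (r - values[i]) / counts[i]   # incremental mean
    return values, counts
```

Once the selector concentrates on high-reward templates, the Executor (the LLM) would be called only with those prompts to produce the synthetic corpus.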
This study compares the predictive ability of various machine learning models for credit card default repayment within different prediction frameworks, using data from a commercial bank in China. First, using different tree models, we explore the impact of different factors on post-default repayment. Second, a split-sample time series prediction is carried out with two neural network algorithms, BPNN and ELM. The outcomes indicate that ELM yields significantly better prediction performance than the BPNN model. Third, the predictive performances of ten machine learning models are compared using full-sample data. The findings demonstrate that the XGBoost and ELM models have superior predictive performance in full-sample analyses. Fourth, this study employs the EMD data decomposition technique to examine the predictive ability of the XGBoost and ELM models on data at various frequencies. The results indicate that predictive efficacy may differ depending on the frequency and the repayment period after default. The findings are valuable for commercial banks in developing a framework and selecting a methodology to address the challenge of predicting credit card default payments.
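One structural reason an ELM can train much faster than a backprop-trained network such as BPNN is that its hidden layer is random and fixed, so only the output weights are fitted, in closed form by least squares. A minimal NumPy sketch (layer size and the tanh activation are illustrative choices, not the study's configuration):

```python
import numpy as np

def elm_fit(X, y, hidden=50, seed=0):
    """Extreme Learning Machine: random fixed hidden layer, output
    weights obtained in closed form via least squares (no backprop)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], hidden))   # random input weights
    b = rng.normal(size=hidden)                 # random biases
    H = np.tanh(X @ W + b)                      # hidden-layer features
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # only layer trained
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

Because the only "training" step is one least-squares solve, fitting cost is dominated by a single matrix factorization, in contrast to the iterative gradient updates a BPNN requires.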
To classify financial chart patterns through machine learning, we introduce and apply a novel classification algorithm to time series data of different financial assets using SAX (Symbolic Aggregate approXimation), a transformation algorithm. We first apply a linear regression model to the features of a dataset to reduce the number of parameters needed, convert real-valued data to strings of characters through Piecewise Aggregate Approximation (PAA), and label each level, in increasing order, with Latin alphabet characters. The new algorithm, called CPC-SAX (Chart Pattern Classification), then compares vectors describing the ASCII value changes along the string and classifies them using already labelled SAX-transformed data. The results show satisfactory accuracy scores on data of different time windows and asset types. We also obtain information on when said patterns appear. By properly classifying chart patterns as they appear, we gain a better indication of the future price trend, allowing investors and traders to make better-informed decisions.
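The PAA-then-letters pipeline and the ASCII-change vectors the classifier compares can be sketched as follows. This is a simplified illustration: standard SAX uses Gaussian breakpoints on z-normalized data, whereas this sketch bins segment means into equal-width levels, and the alphabet size is an arbitrary choice.

```python
import numpy as np

def paa(series, segments):
    """Piecewise Aggregate Approximation: mean of each equal-length segment."""
    series = np.asarray(series, dtype=float)
    return np.array([chunk.mean() for chunk in np.array_split(series, segments)])

def sax(series, segments, alphabet="abcd"):
    """Map PAA segment means to letters (simplified: equal-width bins
    instead of the standard Gaussian breakpoints)."""
    approx = paa(series, segments)
    bins = np.linspace(approx.min(), approx.max(), len(alphabet) + 1)[1:-1]
    return "".join(alphabet[i] for i in np.digitize(approx, bins))

def ascii_changes(sax_string):
    """Vector of ASCII value changes along the string, as compared by CPC-SAX."""
    codes = [ord(c) for c in sax_string]
    return [b - a for a, b in zip(codes, codes[1:])]
```

Two subsequences with similar shapes yield similar change vectors, so a labelled library of SAX strings can classify a new pattern by comparing these vectors.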
In this work, we examine the consequences of trading a large position in vanilla European options within a multi-period binomial model framework for the underlying asset price, S. Given the significant size of the transaction, we expect both the derivative's price and the underlying asset's price to be affected by market impacts. Consequently, derivative valuation should incorporate these effects. To address this, we not only utilize a multi-period binomial model to represent the price process S but also incorporate trading impacts in a multiplicative manner.
Moreover, we conduct our analysis in discrete time to better capture the influence of price impacts. Our findings suggest, for instance, that the strike price should be determined by both the trade's magnitude and parameterized market impacts. We present explicit formulas for European option prices under market impacts and offer numerical examples to elucidate our findings. Upon request, we can provide code implemented in the statistical package R.
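A minimal sketch of the pricing building block, in Python rather than the authors' R: a standard n-period binomial valuation of a European call, with a single hypothetical multiplicative `impact` factor on the terminal asset price standing in for the paper's parameterized market impacts (the actual impact parameterization is not reproduced here).

```python
import math

def binomial_call(S0, K, r, u, d, n, impact=1.0):
    """European call in an n-period binomial tree.
    `impact` is a hypothetical multiplicative factor on the terminal
    asset price, a stand-in for parameterized market impact."""
    q = (math.exp(r) - d) / (u - d)          # risk-neutral up probability
    assert 0 < q < 1, "no-arbitrage condition d < e^r < u violated"
    price = 0.0
    for k in range(n + 1):                   # k up-moves, n-k down-moves
        ST = S0 * (u ** k) * (d ** (n - k)) * impact
        prob = math.comb(n, k) * q ** k * (1 - q) ** (n - k)
        price += prob * max(ST - K, 0.0)
    return math.exp(-r * n) * price          # discount expected payoff
```

For example, in a one-period tree with S0 = K = 100, r = 0, u = 1.1, d = 0.9, the risk-neutral probability is q = 0.5 and the call is worth 0.5 × 10 = 5; a multiplicative impact above one raises the price, consistent with valuation depending on the impact parameters.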
Time requirements of data collection account for a significant portion of the total time required to provide financial advice. This research applies data collection software to the financial planning process, seeking to identify benefits that may help reduce rising barriers to accessing financial advice. An experimental two-phase study gathers qualitative input on problematic themes before quantitative input records the impacts of data collection software use. The research seeks to evidence the beneficial impacts that software use may have on data collection requirements by comparing traditional and software methodologies in Australian professional practice. Respondents were asked to complete data collection inputs using both traditional and digital methods, with metrics recorded throughout the process. Input from 112 consumers and 71 practising advisers was recorded. Results suggest the use of software may decrease the time taken to complete the task and often yields higher levels of data accuracy, whereas traditional methods were associated with extended completion times and lower data accuracy. Results aim to evolve traditional methods of practice within the financial sector. The research provides original contributions to financial planning literature by examining the potential impact data collection methodologies may have on reducing barriers to accessing financial services in Australia.
This paper describes an approach to simultaneously identify clusters and estimate cluster-specific regression parameters from the given data. Such an approach can be useful in learning the relationship between input and output when the regression parameters for estimating output are different in different regions of the input space. Variational Inference (VI), a machine learning approach to obtain posterior probability densities using optimization techniques, is used to identify clusters of explanatory variables and regression parameters for each cluster. From these results, one can obtain both the expected value and the full distribution of predicted output. Other advantages of the proposed approach include the elegant theoretical solution and clear interpretability of results. The proposed approach is well-suited for financial forecasting where markets have different regimes (or clusters) with different patterns and correlations of market changes in each regime. In financial applications, knowledge about such clusters can provide useful insights about portfolio performance and identify the relative importance of variables in different market regimes. An example of predicting the one-day change in the S&P index is used to illustrate the approach and compare its performance with standard regression without clusters. Due to the broad applicability of the problem, its elegant theoretical solution, and the computational efficiency of the proposed algorithm, the approach may be useful in a number of areas extending beyond the financial domain.
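The core idea of cluster-specific regression parameters can be sketched with an EM-fitted mixture of linear regressions — a simpler, point-estimate stand-in for the paper's variational inference, which additionally yields full posterior distributions. All settings below (number of clusters, iteration count, ridge term) are illustrative.

```python
import numpy as np

def em_mixture_regression(X, y, k=2, iters=50, seed=0):
    """EM for a mixture of linear regressions: each cluster ('regime')
    has its own regression weights. Stand-in for the paper's VI approach."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(size=(k, d))       # per-cluster regression weights
    mix = np.full(k, 1.0 / k)         # mixing proportions
    sigma2 = np.ones(k)               # per-cluster noise variance
    for _ in range(iters):
        # E-step: responsibility of each cluster for each observation
        resid = y[:, None] - X @ W.T                          # (n, k)
        log_lik = -0.5 * (resid ** 2 / sigma2 + np.log(2 * np.pi * sigma2))
        log_r = np.log(mix) + log_lik
        log_r -= log_r.max(axis=1, keepdims=True)             # stabilize
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted least squares per cluster
        for j in range(k):
            Xw = X * r[:, [j]]
            W[j] = np.linalg.solve(Xw.T @ X + 1e-8 * np.eye(d), Xw.T @ y)
            rj = max(r[:, j].sum(), 1e-8)
            sigma2[j] = max((r[:, j] * (y - X @ W[j]) ** 2).sum() / rj, 1e-8)
        mix = r.mean(axis=0)
    return W, mix, r
```

On data generated from two regimes with opposite slopes, the fitted `W` recovers one slope per cluster, and the responsibilities `r` assign each observation to its regime — the kind of regime information the paper uses for portfolio insights.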
Inspired by recent advances in the deep learning literature, this article introduces a novel hybrid anomaly detection framework specifically designed for limit order book (LOB) data. A modified Transformer autoencoder architecture is proposed to learn rich temporal LOB subsequence representations, which eases the separability of normal and fraudulent time series. A dissimilarity function is then learned in the representation space to characterize normal LOB behavior, enabling the detection of any anomalous subsequences out-of-sample. We also develop a complete trade-based manipulation simulation methodology able to generate a variety of scenarios derived from actual trade-based fraud cases. The complete framework is tested on LOB data of five NASDAQ stocks in which we randomly insert synthetic quote stuffing, layering, and pump-and-dump manipulations. We show that the proposed asset-independent approach achieves new state-of-the-art fraud detection performance, without requiring any prior knowledge of manipulation patterns.
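The detection logic — learn a representation from normal subsequences only, then flag out-of-sample subsequences whose dissimilarity from normal behavior exceeds a threshold — can be sketched with a deliberately simple stand-in: PCA reconstruction error in place of the paper's Transformer autoencoder and learned dissimilarity function. Component count and quantile are illustrative.

```python
import numpy as np

def fit_normal_profile(normal, n_components=2, quantile=0.99):
    """Fit on normal subsequences only; anomalies are never seen in training.
    PCA reconstruction error is a simple stand-in for a learned autoencoder."""
    mu = normal.mean(axis=0)
    _, _, Vt = np.linalg.svd(normal - mu, full_matrices=False)
    V = Vt[:n_components]                       # principal subspace of normal data
    thresh = np.quantile(reconstruction_error(normal, mu, V), quantile)
    return mu, V, thresh

def reconstruction_error(X, mu, V):
    """Distance from the normal subspace: large for anomalous subsequences."""
    recon = (X - mu) @ V.T @ V + mu
    return np.linalg.norm(X - recon, axis=1)

def is_anomalous(X, mu, V, thresh):
    return reconstruction_error(X, mu, V) > thresh
```

The key design point carried over from the paper is that the detector is fitted without any manipulation examples, so it requires no prior knowledge of manipulation patterns; anything far from learned normal behavior is flagged.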
The research explores relationship dynamics between process and profit in Australian professional practice. We analyse data collected from a sample of 134 financial planning firms located in Southeast Queensland. The research introduces a complete financial planning process framework designed to measure the impact that process may have on firm profit. Quantitative profit data were recorded using Dovetail software to capture results and evidence regression between groups. The research found that firms’ processes are positively associated with profit, and that both process and profit contribute to reducing the influence of firm agency problems. The research suggests that process could be leveraged as an asset to develop commercial advantages. The research may help identify new measures of standard practice, improve the perception of Australian financial firms, and help reduce barriers to accessing financial services.
Testing theories and explaining phenomena in empirical finance often requires estimating causal effects from observational data. In this note, we argue that some of the standard practices to address endogeneity concerns in regression-based estimation approaches can, when not correctly implemented and their results not appropriately interpreted, generate additional, often overlooked, problems. We identify three main systemic issues in empirical finance, provide theoretical and numerical examples to illustrate and support our arguments, and propose solutions to overcome these limitations. Overall, we suggest that these issues are caused by a systematic underestimation of the importance of robust ex-ante identification, and interpretation, of causal structures in empirical studies in finance.