In this article, the author considers the problem of efficiently trading a large position in the marketplace when the stock price dynamics follow a regime-switching process. A poorly executed large order can lead to large losses. Because executing a large position may take several trading days, it is reasonable to assume that the market microstructure may change while the order is being executed. To address this possibility, the author assumes that the stock price follows a regime-switching model. The article focuses on trading algorithms that track market benchmarks such as the volume-weighted average price (VWAP) and the minimum execution shortfall. The author proposes trading algorithms that break the order into small pieces and execute them over a predetermined period of time so as to minimize the overall execution shortfall or exceed the overall market VWAP. The underlying problem is formulated as a discrete-time stochastic optimal control problem with resource constraints. The value function and optimal trading strategies are derived in closed form. Numerical simulations with market data are reported to illustrate the relevance of the approach.
{"title":"Optimal Trading Algorithms under Regime Switching","authors":"M. Pemy","doi":"10.3905/jfds.2022.1.092","DOIUrl":"https://doi.org/10.3905/jfds.2022.1.092","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"23 1","pages":"0"},"publicationDate":"2022-04-08","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"116552738"}
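The two benchmarks named in the abstract above can be made concrete with a small calculation. The sketch below is only an illustration under stated assumptions, not the author's algorithm: the `Fill` type, the sample prices and volumes, and the sign convention for shortfall (cost of a buy order relative to an assumed arrival price) are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Fill:
    """One executed slice of the parent order (hypothetical type for this sketch)."""
    price: float   # execution price of the child order
    shares: float  # number of shares filled in the child order


def vwap(prices, volumes):
    """Volume-weighted average price: sum(p_i * v_i) / sum(v_i)."""
    total_volume = sum(volumes)
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    return sum(p * v for p, v in zip(prices, volumes)) / total_volume


def average_fill_price(fills):
    """Share-weighted average price achieved by the child orders."""
    return vwap([f.price for f in fills], [f.shares for f in fills])


def execution_shortfall(fills, arrival_price):
    """Cost of a buy order relative to the arrival price, in currency units.

    Positive means the order paid more than the arrival price (assumed convention).
    """
    return sum((f.price - arrival_price) * f.shares for f in fills)


# Hypothetical data: a 1,000-share buy order split into three slices.
fills = [Fill(100.0, 500), Fill(101.0, 300), Fill(102.0, 200)]
market_prices = [100.0, 101.0, 102.0]
market_volumes = [10_000, 10_000, 10_000]

print(average_fill_price(fills))            # 100.7
print(vwap(market_prices, market_volumes))  # 101.0
print(execution_shortfall(fills, 100.0))    # 700.0
```

In this made-up case the buy order's average price (100.7) is below the market VWAP (101.0), so it beats the VWAP benchmark even though it incurs a positive shortfall against the arrival price.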
George Bonne, A. Lo, Abilash Prabhakaran, K. W. Siah, Manish Singh, Xinxin Wang, Peter J Zangari, Howard Zhang
In this article, the authors develop a data-driven peer grouping system using artificial intelligence (AI) tools to capture market perception and, in turn, group companies into clusters at various levels of granularity. In addition, they develop a continuous measure of similarity between companies; they use this measure to group companies into clusters and construct hedged portfolios. In the peer groupings, companies grouped in the same clusters had strong homogeneous risk and return profiles, whereas different clusters of companies had diverse, varying risk exposures. The authors extensively evaluated the clusters and found that companies grouped by their method had higher out-of-sample return correlation but lower stability and interpretability than companies grouped by a standard industry classification system. The authors also develop an interactive visualization system for identifying AI-based clusters and similar companies.
{"title":"An Artificial Intelligence-Based Industry Peer Grouping System","authors":"George Bonne, A. Lo, Abilash Prabhakaran, K. W. Siah, Manish Singh, Xinxin Wang, Peter J Zangari, Howard Zhang","doi":"10.3905/jfds.2022.1.090","DOIUrl":"https://doi.org/10.3905/jfds.2022.1.090","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"8 1","pages":"0"},"publicationDate":"2022-04-02","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"126376634"}
Model interpretability is important in the banking industry for three reasons: certain US regulations require creditors to provide consumers with the reasons for taking adverse action (reason codes) on their credit applications; model users want to understand the reasoning behind model predictions; and interpretability helps identify bias and reinforces stakeholders’ trust in the model. In this article, the authors compare the interpretability of an XGBoost model with that of a logistic model in predicting the probability of default for a credit card customer. They conclude that (1) the reason codes of an XGBoost model and a comparable logistic model are similar, (2) reason codes generated by XGBoost are more trustworthy from the customer’s perspective, and (3) the nonlinearity of XGBoost is unlikely to have a significant impact on reason codes.
{"title":"Interpretability of Machine Learning versus Statistical Credit Risk Models","authors":"Anand K. Ramteke, Pavan Wadhwa, Monica Yan","doi":"10.3905/jfds.2022.1.089","DOIUrl":"https://doi.org/10.3905/jfds.2022.1.089","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"475 1","pages":"0"},"publicationDate":"2022-03-26","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"133436900"}
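The abstract above does not say how reason codes are derived, so the following is only a sketch of one common convention for a logistic model: rank features by how far they push an applicant's log-odds of default above a baseline applicant. The feature names, coefficients, and baseline values are invented for illustration. A tree ensemble such as XGBoost would need a per-feature attribution method in place of the linear contribution used here.

```python
import math


def predict_default_probability(coefs, intercept, x):
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(b_j * x_j))))."""
    z = intercept + sum(coefs[name] * x[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))


def reason_codes(coefs, baseline, x, top_k=2):
    """Return the top_k features that raise the log-odds of default the most.

    The contribution of feature j is b_j * (x_j - baseline_j); only positive
    (adverse) contributions are eligible to become reason codes.
    """
    contributions = {
        name: coefs[name] * (x[name] - baseline[name]) for name in coefs
    }
    adverse = [(name, c) for name, c in contributions.items() if c > 0]
    adverse.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in adverse[:top_k]]


# Hypothetical coefficients and applicant data, for illustration only.
coefs = {"utilization": 2.0, "late_payments": 0.8, "account_age_years": -0.1}
baseline = {"utilization": 0.3, "late_payments": 0.5, "account_age_years": 8.0}
applicant = {"utilization": 0.9, "late_payments": 3.0, "account_age_years": 2.0}

probability = predict_default_probability(coefs, -3.0, applicant)
codes = reason_codes(coefs, baseline, applicant)
print(codes)  # ['late_payments', 'utilization']
```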
Recent natural language processing (NLP) breakthroughs have proven effective for many language-directed tasks, such as completing sentences and answering search queries. This technology has been successfully implemented by tech firms such as Google. An important element consists of language embeddings linked to pretraining systems. This article describes NLP concepts and their application to portfolio models via a modern version of sentiment analysis. The authors demonstrate the advantages of combining information from Twitter with NLP when constructing a portfolio of stocks, especially during unusual events such as the COVID-19 pandemic.
{"title":"Improving Portfolio Performance via Natural Language Processing Methods","authors":"DiJia Su, J. Mulvey, H. Poor","doi":"10.3905/jfds.2022.1.088","DOIUrl":"https://doi.org/10.3905/jfds.2022.1.088","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"25 1","pages":"0"},"publicationDate":"2022-03-22","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"116545684"}
Carry, value, and momentum are the trinity of systematic investing. Because they are signals, it is important to know what they signify and how to interpret them. What is the cost of delay? How does their effectiveness change as a function of the holding period? The authors illustrate how these signals can differ in terms of their informational content and persistence. They also show how classification trees can be used to combine these signals to get the most meaning out of them.
{"title":"Harvesting Multi-Asset Carry, Value, and Momentum: Work Smarter, Not Harder","authors":"Brian Jacobsen, Matthias Scheiber","doi":"10.3905/jfds.2022.1.087","DOIUrl":"https://doi.org/10.3905/jfds.2022.1.087","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"46 1","pages":"0"},"publicationDate":"2022-03-16","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"125212043"}
Pub Date: 2022-01-31 DOI: 10.3905/jfds.2022.4.1.076
Boyu Wu, Kevin J. DiCiurcio, Beatrice Yeo, Qian Wang
The stock–bond correlation is a cornerstone of every asset allocation decision, but estimating it reliably can be challenging because co-movements can fluctuate significantly with economic conditions. Using supervised machine learning techniques, this article presents a new approach for identifying key determinants of the correlation between US equity and bond returns, ultimately finding that inflation, alongside real yields, equity volatility, economic growth, and inflation uncertainty, predicts changes in correlation dynamics over time. Relative to the existing literature, the authors’ approach allows for the systematic detection of the main drivers of stock–bond correlation and uncovers the time variation in the importance of each determinant across economic regimes. In an out-of-sample portfolio evaluation, the authors show that the five-factor gradient boosting regression approach outperforms all other existing factor-based models in estimating both the trend and level of correlation, thereby offering a robust alternative for forecasting time-varying equity and bond co-movements that can be further applied to asset allocation decisions and risk management.
{"title":"Forecasting US Equity and Bond Correlation—A Machine Learning Approach","authors":"Boyu Wu, Kevin J. DiCiurcio, Beatrice Yeo, Qian Wang","doi":"10.3905/jfds.2022.4.1.076","DOIUrl":"https://doi.org/10.3905/jfds.2022.4.1.076","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"380 ","pages":"0"},"publicationDate":"2022-01-31","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"133352097"}
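The quantity being forecast in the abstract above is a time-varying correlation. As a rough illustration of how such a target series might be built before any model is fitted, the sketch below computes a rolling Pearson correlation between two return series. The window length and the sample returns are assumptions for this example, and the abstract does not state how the authors measure realized correlation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def rolling_correlation(xs, ys, window):
    """Correlation over each trailing window; one value per complete window."""
    return [
        pearson(xs[end - window:end], ys[end - window:end])
        for end in range(window, len(xs) + 1)
    ]


# Hypothetical periodic returns, for illustration only.
equity_returns = [0.01, -0.02, 0.03, 0.01, -0.01]
bond_returns = [-0.005, 0.01, -0.015, 0.005, -0.005]

correlations = rolling_correlation(equity_returns, bond_returns, window=3)
print(len(correlations))  # 3
```

In the first window the made-up bond returns are exactly -0.5 times the equity returns, so the first value is -1 up to floating-point rounding. Later windows move away from that extreme, which is the kind of variation a forecasting model would try to explain.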
In this article, the author addresses Eugene Fama’s skepticism regarding the predictability of stock market bubbles. To do so, he applies two ensemble learning methods, the random cut forest and random forest algorithms, to build a model that predicts large near-term drawdowns based on patterns in stock price behavior. The model includes three predictive variables. The first factor is an anomaly score produced by random cut forest, an algorithm specifically designed to detect outliers in streaming data. The second and third factors are the standard deviation of price returns and the return convexity over specified time horizons, with return convexity defined as the difference between one-year price returns and six-month price returns. The author’s predictions are based on random forest regressions. He applies the model to a large universe of equity sectors and factors. Blocked time-series cross-validation is used to evaluate the predictive efficacy of the model. The author shows that across the sectors and factors considered, the model presented produces predictive scores that are strongly positive. Although bubble prediction is surely a multidimensional endeavor that requires input from a variety of tools and sources, the author demonstrates that a framework built upon ensemble methods can be informationally additive to the detection of bubblelike behavior across a wide array of stocks.
{"title":"Forests for Fama","authors":"Joseph Simonian","doi":"10.3905/jfds.2021.1.086","DOIUrl":"https://doi.org/10.3905/jfds.2021.1.086","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"67 1","pages":"0"},"publicationDate":"2021-12-22","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"125637209"}
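Two of the three predictive variables in the abstract above are defined precisely enough to sketch: the standard deviation of price returns, and return convexity, defined as the one-year price return minus the six-month price return. The code below is an illustration only. The default lookbacks of 252 and 126 trading days are an assumption about how "one year" and "six months" map onto daily data, and the price series is invented. The anomaly score from random cut forest is not reproduced here.

```python
import statistics


def price_return(prices, lookback):
    """Simple return over the last `lookback` periods."""
    if lookback >= len(prices):
        raise ValueError("not enough price history for this lookback")
    return prices[-1] / prices[-1 - lookback] - 1.0


def return_convexity(prices, long_lookback=252, short_lookback=126):
    """Long-horizon return minus short-horizon return.

    The defaults assume daily prices with about 252 trading days per year.
    """
    return price_return(prices, long_lookback) - price_return(prices, short_lookback)


def return_volatility(prices, lookback):
    """Sample standard deviation of the last `lookback` one-period returns."""
    if lookback + 1 > len(prices):
        raise ValueError("not enough price history for this lookback")
    window = prices[-(lookback + 1):]
    returns = [later / earlier - 1.0 for earlier, later in zip(window, window[1:])]
    return statistics.stdev(returns)


# Hypothetical series growing 10% per period; short lookbacks keep it readable.
prices = [100.0, 110.0, 121.0, 133.1, 146.41]

convexity = return_convexity(prices, long_lookback=4, short_lookback=2)
volatility = return_volatility(prices, lookback=4)
print(round(convexity, 4))  # 0.2541
```

Here the four-period return is 46.41% and the two-period return is 21%, giving a convexity of 0.2541. Because every period grows by the same 10%, the return volatility is zero up to rounding.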
Arik Ben Dor, Jingling Guan, Adam Kelleher, Adam M. Lauretig, Ryan Preclaw, Xiaming Zeng
The emergence of environmental, social, and governance (ESG) investing resulted in a flurry of studies examining the effects of incorporating ESG considerations on portfolio performance. Limited attention, however, was given to analyzing corporate activities related to ESG and sustainability. The authors employ a novel dataset of over 200 million job postings by US firms since 2014 and use natural language processing to identify ESG-related openings and assess companies’ planned ESG activities. Using the job posting data allows one to learn about and monitor planned sustainability-related corporate activities based on firms’ actions, rather than relying solely on their announcements (i.e., what firms do as opposed to what firms say they do). The authors find that ESG job posting data can serve as a leading indicator of future changes in firms’ ESG ratings. Firms with higher abnormal ESG hiring posting intensity were more likely to experience subsequent rating improvements and enjoyed better stock performance 2–3 years following the posting date.
{"title":"ESG and Alternative Data: Capturing Corporates’ Sustainability-Related Activities with Job Postings","authors":"Arik Ben Dor, Jingling Guan, Adam Kelleher, Adam M. Lauretig, Ryan Preclaw, Xiaming Zeng","doi":"10.3905/jfds.2021.1.082","DOIUrl":"https://doi.org/10.3905/jfds.2021.1.082","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"4 1","pages":"0"},"publicationDate":"2021-12-10","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"131059521"}
Babak Mahdavi-Damghani, Robert Fraser, James Howell, Jon Sveinbjorn Halldorsson
The authors discuss, as a hypothesis, the historical events that have led to the rise of cryptocurrencies as a legitimate new asset class. They also discuss issues around cryptocurrency fundamentals as a means to explain why cryptocurrencies lack the kind of sectors that exist for other asset classes such as equities or commodities. To address this issue, they propose a new methodology based on a hybrid of k-means and hierarchical clustering, with alternative data gathered from web-scraping. The authors then reintroduce two mathematical models, namely risk parity and momentum. Finally, they test their geopolitical hypothesis through a long-only strategy using risk parity and test their abstract sectorization through a long–short strategy.
{"title":"Cryptocurrency Sectorization through Clustering and Web-Scraping: Application to Systematic Trading","authors":"Babak Mahdavi-Damghani, Robert Fraser, James Howell, Jon Sveinbjorn Halldorsson","doi":"10.3905/jfds.2021.1.080","DOIUrl":"https://doi.org/10.3905/jfds.2021.1.080","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"8 1","pages":"0"},"publicationDate":"2021-12-09","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"121251695"}
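The abstract above names risk parity as the basis of the long-only strategy without giving details. The sketch below shows the simplest variant, inverse-volatility weighting, which matches equal risk contribution only when the assets are uncorrelated. It is an assumption made for illustration, not necessarily the formulation the authors use, and the asset names and returns are made up.

```python
import statistics


def inverse_volatility_weights(returns_by_asset):
    """Weight each asset in proportion to 1 / volatility; weights sum to 1.

    This is a simplified stand-in for risk parity that ignores correlations.
    """
    inverse_vols = {}
    for asset, returns in returns_by_asset.items():
        vol = statistics.stdev(returns)
        if vol <= 0:
            raise ValueError(f"volatility must be positive for {asset}")
        inverse_vols[asset] = 1.0 / vol
    total = sum(inverse_vols.values())
    return {asset: value / total for asset, value in inverse_vols.items()}


# Hypothetical return histories: coin_b is exactly twice as volatile as coin_a.
returns_by_asset = {
    "coin_a": [0.01, -0.01, 0.01, -0.01],
    "coin_b": [0.02, -0.02, 0.02, -0.02],
}

weights = inverse_volatility_weights(returns_by_asset)
print({asset: round(w, 4) for asset, w in weights.items()})
# {'coin_a': 0.6667, 'coin_b': 0.3333}
```

The less volatile asset receives twice the weight, so each position carries roughly the same stand-alone risk. A full risk parity allocation would also account for the covariance between assets.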