Pascal Blanqué, M. Slimane, Amina Cherief, Théo Le Guenedal, Takaya Sekine, Lauren Stagnol
In this article, the authors show that variables from the Global Database of Events, Language, and Tone (GDELT) convey significant information that can improve on a purely macroeconomic approach when modeling the US equity market. Based on these metrics, the authors construct time series that represent and measure how competing narratives evolve in the current market environment. Specifically, they appraise the strength of four economic narratives (the Roaring 20s, back to the 70s, secular stagnation, and monetary narratives), and they also add topical societal narratives related to environmental and social aspects as well as a geopolitical risk narrative. The authors formalize an information content framework and show that including quantitative signals that translate into qualitative stories adds value when modeling the stock market’s movement. Indeed, in addition to having higher explanatory power from their underlying variables, narratives can improve the diversification of standard macroeconomic models. As such, the authors’ results advocate close monitoring of narratives in financial markets.
{"title":"The Benefit of Narratives for Prediction of the S&P 500 Index","authors":"Pascal Blanqué, M. Slimane, Amina Cherief, Théo Le Guenedal, Takaya Sekine, Lauren Stagnol","doi":"10.3905/jfds.2022.1.107","DOIUrl":"https://doi.org/10.3905/jfds.2022.1.107","url":null,"abstract":"In this article, the authors show that variables from the Global Database of Events, Language, and Tone convey significant information that can improve on a purely macroeconomic approach when modeling the US equity market. Based on these metrics, the authors construct time series that represent and measure how some narratives that appear to be battling each other are changing in the current market environment. Specifically, the authors appraise the strength of the roaring 20s, back to the 70s, secular stagnation, and monetary economic narratives, but they also add topical societal narratives related to environmental or social aspects and a geopolitical risk narrative. The authors formalize an information content framework and show that including quantitative signals that translate into qualitative stories brings added value when determining the stock market’s movement. Indeed, in addition to having higher explanatory power from their underlying variables, narratives can improve the diversification of standard macroeconomic models. As such, the authors’ results advocate a close monitoring of narratives in financial markets.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"224 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130709564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
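The information-content comparison described above can be illustrated with a minimal sketch: fit a macro-only regression of equity returns and a macro-plus-narrative regression, then compare their in-sample fit. All data and variable names below are synthetic stand-ins, not the authors' GDELT series or model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic monthly data: two macro factors plus one narrative index
# (illustrative stand-ins for the GDELT-based narrative series).
n = 240
macro = rng.normal(size=(n, 2))          # e.g., growth and inflation surprises
narrative = rng.normal(size=n)           # e.g., a "secular stagnation" intensity score
returns = (0.3 * macro[:, 0] - 0.2 * macro[:, 1]
           + 0.1 * narrative + rng.normal(scale=0.5, size=n))

def r_squared(X, y):
    """In-sample R^2 of an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_macro = r_squared(macro, returns)
r2_full = r_squared(np.column_stack([macro, narrative]), returns)
print(f"macro-only R^2: {r2_macro:.3f}, macro+narrative R^2: {r2_full:.3f}")
```

In-sample, the augmented model can never fit worse; the article's point is that the narrative variables also add out-of-sample explanatory power, which this toy comparison does not test.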
In this article, the authors study the utility of deep-learning approaches in statistical arbitrage under the generalized pairs-trading paradigm. Stock returns are regressed on a set of risk factors derived using principal component analysis, and the long short-term memory (LSTM) structure is employed to forecast directions of idiosyncratic residuals. Daily market-neutral trades are constructed based on the predicted signals. The authors compare their results with the influential relative value (RV) model by Avellaneda and Lee (2010) on the universe of S&P 500 Index (S&P 500) stocks. Model evaluations are performed on two distinct periods (2001–2007 and 2015–2021) to alleviate the survivorship bias resulting from the S&P 500 composition changes over time and to study the robustness of these two models in two distinct eras. Their findings suggest that the LSTM model consistently and significantly outperforms the RV model across the two periods when transaction costs are accounted for. However, in the transaction cost–free world, the outperformance is modest even though it is still consistent.
{"title":"Deep Learning Meets Statistical Arbitrage: An Application of Long Short-Term Memory Networks to Algorithmic Trading","authors":"Yijun Zhao, Sheng Xu, Jacek Ossowski","doi":"10.3905/jfds.2022.1.103","DOIUrl":"https://doi.org/10.3905/jfds.2022.1.103","url":null,"abstract":"In this article, the authors study the utility of deep-learning approaches in statistical arbitrage under the generalized pairs-trading paradigm. Stock returns are regressed on a set of risk factors derived using principal component analysis, and the long short-term memory (LSTM) structure is employed to forecast directions of idiosyncratic residuals. Daily market-neutral trades are constructed based on the predicted signals. The authors compare their results with the influential relative value (RV) model by Avellaneda and Lee (2010) on the universe of S&P 500 Index (S&P 500) stocks. Model evaluations are performed on two distinct periods (2001–2007 and 2015–2021) to alleviate the survivorship bias resulting from the S&P 500 composition changes over time and to study the robustness of these two models in two distinct eras. Their findings suggest that the LSTM model consistently and significantly outperforms the RV model across the two periods when transaction costs are accounted for. However, in the transaction cost–free world, the outperformance is modest even though it is still consistent.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124026759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
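The factor-residual pipeline described above (PCA risk factors, regression of stock returns on those factors, signals on the idiosyncratic residuals) can be sketched as follows. The LSTM forecaster is replaced here by a simple mean-reversion threshold rule on recent residual drift, and all data are synthetic; this is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic daily returns for 50 stocks over 500 days (a stand-in for the
# S&P 500 universe used in the article).
T, N, k = 500, 50, 5
returns = rng.normal(scale=0.01, size=(T, N))

# 1) Derive k risk factors via PCA on the demeaned return panel.
demeaned = returns - returns.mean(axis=0)
U, S, Vt = np.linalg.svd(demeaned, full_matrices=False)
factors = U[:, :k] * S[:k]               # T x k factor return series

# 2) Regress each stock's returns on the factors; keep idiosyncratic residuals.
X = np.column_stack([np.ones(T), factors])
beta, *_ = np.linalg.lstsq(X, returns, rcond=None)
residuals = returns - X @ beta           # T x N

# 3) Stand-in for the LSTM direction forecast: fade residuals that have
#    drifted far from zero over the last 20 days.
recent = residuals[-20:]
score = recent.sum(axis=0) / recent.std(axis=0)
positions = np.where(score > 1.5, -1, np.where(score < -1.5, 1, 0))
print("long:", (positions == 1).sum(), "short:", (positions == -1).sum())
```

Because the residuals are orthogonal to the estimated factors by construction, positions built on them are (in-sample) factor-neutral, which is the sense in which the daily trades are market neutral.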
With a total outstanding balance of more than $8 trillion as of this writing, agency mortgage-backed securities (MBS) represent the second largest segment of the US bond market and the second most liquid fixed-income market after US Treasuries. Institutional investors have long participated in this market to take advantage of its attractive spread over US Treasuries, low credit risk, low transaction costs, and the ability to transact large quantities with ease. MBS are composed of individual mortgages extended to US homeowners. Because a homeowner can refinance at any point, prepayment analysis and investing in the MBS sector are complex. Traditional prepayment models have captured many of the relationships between prepayments and related factors, such as the level of interest rates and the value of the embedded prepayment option, yet the manual nature of variable construction and the sheer amount of available data make it difficult to capture the dynamics of extremely complex systems. The long history and large amount of data available in MBS make the sector a prime candidate for machine learning (ML) algorithms that can better explain the complex relationships between various macro- and microeconomic factors and MBS prepayments. The authors propose a systematic investment strategy that combines an ML-based mortgage prepayment model with a coupon allocation optimization model to create an optimal portfolio that captures alpha versus a benchmark.
{"title":"Machine Learning–Based Systematic Investing in Agency Mortgage-Backed Securities","authors":"Nikhil Arvind Jagannathan, Qiulei (Leo) Bao","doi":"10.3905/jfds.2022.1.102","DOIUrl":"https://doi.org/10.3905/jfds.2022.1.102","url":null,"abstract":"With a total outstanding balance of more than $8 trillion as of this writing, agency mortgage-backed securities (MBS) represent the second largest segment of the US bond market and the second most liquid fixed-income market after US Treasuries. Institutional investors have long participated in this market to take advantage of its attractive spread over US Treasuries, low credit risk, low transaction cost, and the ability to transact large quantities with ease. MBS are made of individual mortgages extended to US homeowners. The ability for a homeowner to refinance at any point introduces complexity in prepayment analysis and investing in the MBS sector. Traditional prepayment modeling has been able to capture many of the relationships between prepayments and related factors such as the level of interest rates and the value of the embedded prepayment option, yet the manual nature of variable construction and sheer amount of available data make it difficult to capture the dynamics of extremely complex systems. The long history and large amount of data available in MBS make it a prime candidate to leverage machine learning (ML) algorithms to better explain complex relationships between various macro- and microeconomic factors and MBS prepayments. The authors propose a systematic investment strategy using an ML-based mortgage prepayment model approach combined with a coupon allocation optimization model to create an optimal portfolio to capture alpha vs. a benchmark.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123531857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
One of the most exciting recent developments in financial research is the availability of new administrative, private sector, and micro-level datasets that did not exist a few years ago. The unstructured nature of many of these observations, along with the complexity of the phenomena they measure, means that many of these datasets are beyond the grasp of econometric analysis. Machine learning (ML) techniques offer the numerical power and functional flexibility needed to identify complex patterns in a high-dimensional space. ML is often perceived as a black box, however, in contrast to the transparency of econometric approaches. In this article, the author demonstrates that each analytical step of the econometric process has a homologous step in ML analyses. By clearly stating this correspondence, the author’s goal is to facilitate and reconcile the adoption of ML techniques among econometricians.
{"title":"Machine Learning for Econometricians: The Readme Manual","authors":"Marcos Lopez de Prado","doi":"10.3905/jfds.2022.1.101","DOIUrl":"https://doi.org/10.3905/jfds.2022.1.101","url":null,"abstract":"One of the most exciting recent developments in financial research is the availability of new administrative, private sector, and micro-level datasets that did not exist a few years ago. The unstructured nature of many of these observations, along with the complexity of the phenomena they measure, means that many of these datasets are beyond the grasp of econometric analysis. Machine learning (ML) techniques offer the numerical power and functional flexibility needed to identify complex patterns in a high-dimensional space. ML is often perceived as a black box, however, in contrast to the transparency of econometric approaches. In this article, the author demonstrates that each analytical step of the econometric process has a homologous step in ML analyses. By clearly stating this correspondence, the author’s goal is to facilitate and reconcile the adoption of ML techniques among econometricians.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123779303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This article investigates the application of machine learning in behavioral economics and behavioral finance to profile the studies conducted in this field. To accomplish this task, 90 scientific studies published between 2000 and June 1, 2020, were systematically extracted. Using text analysis techniques and related statistical methods, the abstracts of the extracted studies were reviewed and analyzed. First, it was found that attention to this field has grown in recent years at an accelerating pace. Second, specialized journals have shown more interest in these studies than in the past, publishing more relevant work. Third, the results revealed that machine learning has been applied to investor sentiment, decision making, consumer behavior, trading strategies, game theory, and other areas of behavioral economics and behavioral finance. The techniques applied include support vector machines, regression, neural networks, and random forests, among others. Despite the growing attention devoted to this field by researchers and specialized journals, many research gaps remain. Accordingly, considerable distance remains before the strengths of machine learning, such as prediction and classification, are fully exploited in behavioral economics and behavioral finance. Finally, the article concludes by suggesting implications for the future of this field based on the findings.
{"title":"Machine Learning in Behavioral Finance: A Systematic Literature Review","authors":"S. N. Hojaji, M. Yahyazadehfar, B. Abedin","doi":"10.3905/jfds.2022.1.100","DOIUrl":"https://doi.org/10.3905/jfds.2022.1.100","url":null,"abstract":"This article endeavors to investigate the application of machine learning in behavioral economics and behavioral finance to represent a profile of studies conducted in this field. To accomplish this task, 90 scientific studies were systematically extracted between 2000 and June 1, 2020. Utilizing the text analysis techniques and related statistical methods, the abstracts of the extracted studies were reviewed and analyzed. First, it was found that attention to this field has developed in recent years with an accelerating trend. Second, it was demonstrated that specialized journals have also bestowed more curiosity in these studies than in the past by publishing more relevant studies. Third, results revealed that machine learning has been applied in areas such as investor sentiment, decision making, consumer behavior, trading strategies, game theory, and other areas in the field of behavioral economics and behavioral finance. In this regard, the application of machine learning has included techniques such as support vector machine, regression, neural networks, random forest, and so on. Despite the expanding consideration adjusted to this field by researchers and specialized journals, there are still many research gaps in this field. Accordingly, there is a relatively significant distance until fully unleashing the superior powers of machine learning, like prediction and classification in behavioral economics and behavioral finance. Finally, this research completed its mission by suggesting implications for the future of this field based on the acquired outcomes.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125533570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Meta-labeling is a machine learning (ML) layer that sits on top of a base primary strategy to help size positions, filter out false-positive signals, and improve metrics such as the Sharpe ratio and maximum drawdown. This article consolidates the knowledge of several publications into a single work, providing practitioners with a clear framework to support the application of meta-labeling to investment strategies. The relationships between binary classification metrics and strategy performance are explained, alongside answers to many frequently asked questions regarding the technique. The author also deconstructs meta-labeling into three components, using a controlled experiment to show how each component helps to improve strategy metrics and what types of features should be considered in the model specification phase.
{"title":"Meta-Labeling: Theory and Framework","authors":"J. Joubert","doi":"10.3905/jfds.2022.1.098","DOIUrl":"https://doi.org/10.3905/jfds.2022.1.098","url":null,"abstract":"Meta-labeling is a machine learning (ML) layer that sits on top of a base primary strategy to help size positions, filter out false-positive signals, and improve metrics such as the Sharpe ratio and maximum drawdown. This article consolidates the knowledge of several publications into a single work, providing practitioners with a clear framework to support the application of meta-labeling to investment strategies. The relationships between binary classification metrics and strategy performance are explained, alongside answers to many frequently asked questions regarding the technique. The author also deconstructs meta-labeling into three components, using a controlled experiment to show how each component helps to improve strategy metrics and what types of features should be considered in the model specification phase.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115247953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
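A minimal sketch of the meta-labeling layer described above: a primary model emits direction calls, and a secondary classifier estimates the probability that each call is correct and sizes the position accordingly. The data, the volatility feature, and the hand-rolled logistic meta-model below are illustrative assumptions, not the author's specification.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy setup: a primary strategy emits long/short signals; a meta-model then
# predicts whether each signal will be profitable and sizes the position.
n = 1000
vol = rng.uniform(0.1, 0.4, n)                 # regime feature (assumed input)
primary_side = rng.choice([-1, 1], n)          # primary model's direction call
# Synthetic ground truth: signals are more reliable in low-volatility regimes.
correct = (rng.uniform(size=n) < 1.0 - vol).astype(float)

# Meta-model: logistic regression P(signal correct | features), fit by plain
# gradient descent to keep the sketch dependency-free.
z = (vol - vol.mean()) / vol.std()             # standardized feature
X = np.column_stack([np.ones(n), z])
w = np.zeros(2)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - correct) / n

conf = 1.0 / (1.0 + np.exp(-X @ w))
# Position size: trade the primary side only when confident, scaled by edge.
size = np.clip(2.0 * conf - 1.0, 0.0, None)
position = primary_side * size
print("avg confidence:", round(conf.mean(), 3),
      "share traded:", round((size > 0).mean(), 3))
```

The key design point is that the meta-model never picks the trade direction; it only filters and sizes the primary model's calls, which is what lets it improve false-positive rates without touching the underlying strategy.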
In this article, the authors study the performance of the Black–Litterman model (BLM) and compare it to the traditional mean–variance theory (MVT) of Markowitz (1952) and Sharpe (1964). They begin with the standard Bayesian learning on which the BLM is based (which the existing literature does not follow). They then perform a series of tests of the BLM using machine learning tools and view specifications consistent with the existing literature. Their empirical evidence, which uses 30 years of monthly data from January 1991 to December 2020, suggests that the BLM is highly sensitive to the specification of the views. Given that the views are arbitrary (even though, in their article, they are rule based), it is quite a challenge to use the BLM in an actual situation. Considerable caution must be exercised in specifying each view and its corresponding required return. This validates the earlier finding that the specification of views in the BLM is critical and that there is no consistent way to specify a winning portfolio.
{"title":"On the Black–Litterman Model: Learning to Do Better","authors":"Ren‐Raw Chen, S. Yeh, Xiaohu Zhang","doi":"10.3905/jfds.2022.1.096","DOIUrl":"https://doi.org/10.3905/jfds.2022.1.096","url":null,"abstract":"In this article, the authors study the performance of the Black–Litterman model (BLM) and compare it to the traditional mean–variance theory (MVT) of Markowitz (1952) and Sharpe (1964). They begin with the standard Bayesian learning on which the BLM is based (but the existing literature does not follow). Then, they perform a series of tests of the BLM using machine learning tools and view specifications consistent with the existing literature. Their empirical evidence (which uses 30 years of monthly data from January 1991 till December 2020) suggests that the BLM is highly sensitive to the specification of the view. Given that the view is arbitrary (even though in our article, they are rule based), it is quite a challenge to use the BLM in an actual situation. A great amount of caution must be exercised in specifying the view and its corresponding required return. This validates the previous result that BLM specification of views is very important and there is no consistent manner how one can specify a winning portfolio.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132327383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
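The posterior at the heart of the BLM can be written down compactly. The sketch below implements the standard Black–Litterman master formula with one relative view; all of the numbers (covariance, market weights, tau, delta, and the view itself) are illustrative, not taken from the article.

```python
import numpy as np

# Minimal Black-Litterman posterior sketch for three assets.
Sigma = np.array([[0.040, 0.012, 0.010],
                  [0.012, 0.090, 0.020],
                  [0.010, 0.020, 0.060]])      # prior return covariance
w_mkt = np.array([0.5, 0.3, 0.2])              # market-cap weights
delta, tau = 2.5, 0.05                         # risk aversion, prior scaling

# Reverse-optimized equilibrium excess returns (the BLM prior mean).
pi = delta * Sigma @ w_mkt

# One relative view: asset 1 outperforms asset 2 by 2%.
P = np.array([[1.0, -1.0, 0.0]])
q = np.array([0.02])
Omega = tau * P @ Sigma @ P.T                  # common choice of view uncertainty

# Posterior mean:
#   mu = [(tau*Sigma)^-1 + P' Omega^-1 P]^-1 [(tau*Sigma)^-1 pi + P' Omega^-1 q]
A = np.linalg.inv(tau * Sigma) + P.T @ np.linalg.inv(Omega) @ P
b = np.linalg.inv(tau * Sigma) @ pi + P.T @ np.linalg.inv(Omega) @ q
mu_bl = np.linalg.solve(A, b)
print("equilibrium:", pi.round(4), "posterior:", mu_bl.round(4))
```

The sensitivity the authors document is visible even here: the posterior spread between assets 1 and 2 is pulled from its equilibrium value toward the 2% view, and how far it moves depends entirely on the chosen Omega and q.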
Pub Date: 2022-04-30
DOI: 10.3905/jfds.2022.4.2.139
F. Tan, D. Mehta
For countries such as the United States that lack a universal health care system, future health care costs create significant uncertainty that a retirement investment strategy must be built to manage. One of the most important factors determining health care costs is the individual’s health status; hence, categorizing individuals into meaningful health risk types is an essential task. The conventional approach is to use individuals’ self-rated health state categorization. In this work, the authors provide an objective, data-driven machine learning (ML)–based approach to categorizing health state risk using the most widely used US household survey of older Americans, the Health and Retirement Study (HRS). The authors propose employing the K-modes clustering method to algorithmically cluster an exhaustive list of categorical health-related variables in the HRS. The resulting clusters are shown to provide an objective, interpretable, and practical health state risk categorization. The authors then compare and contrast the ML-based and self-rated health state categorizations and discuss the implications of the differences. They also illustrate the difficulty of predicting out-of-pocket costs from self-rated health status and show how ML-based categorizations can generate more-accurate health care cost estimates for personalized retirement planning. The results in this article open different avenues of research, including behavioral science analysis of health and retirement.
{"title":"Health State Risk Categorization: A Machine Learning Clustering Approach Using Health and Retirement Study Data","authors":"F. Tan, D. Mehta","doi":"10.3905/jfds.2022.4.2.139","DOIUrl":"https://doi.org/10.3905/jfds.2022.4.2.139","url":null,"abstract":"For countries such as the United States, which lacks a universal health care system, future health care costs can create significant uncertainty that a retirement investment strategy must be built to manage. One of the most important factors determining health care costs is the individual’s health status. Hence, categorizing individuals into meaningful health risk types is an essential task. The conventional approach is to use individuals’ self-rated health state categorization. In this work, the authors provide an objective and data-driven machine learning (ML)–based approach to categorize heath state risk by using the most widely used US household surveys on older Americans, the Health and Retirement Study (HRS). The authors propose an approach of employing the K-modes clustering method to algorithmically cluster on an exhaustive list of categorical health-related variables in the HRS. The resulting clusters are shown to provide an objective, interpretable, and practical health state risk categorization. The authors then compare and contrast the ML-based and self-rated health state categorizations and discuss the implications of the differences. They also illustrate the difficulty in predicting out-of-pocket costs based on self-rated health status and how ML-based categorizations can generate more-accurate health care cost estimates for personalized retirement planning. The results in this article open different avenues of research, including behavioral science analysis for health and retirement study.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125915027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
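The clustering step described above can be sketched with a tiny K-modes implementation (matching dissimilarity to cluster modes, per-column mode updates). The synthetic 0/1/2 codes below stand in for the HRS categorical variables; this is an illustrative sketch, and production work would use a dedicated library such as `kmodes`.

```python
import numpy as np

rng = np.random.default_rng(3)

def kmodes(X, k, n_iter=20, rng=rng):
    """Tiny K-modes sketch for categorical data: assign rows to the nearest
    mode under Hamming (matching) dissimilarity, then refit per-column modes."""
    n, _ = X.shape
    centers = X[rng.choice(n, k, replace=False)].copy()
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Assign each row to the center it mismatches least.
        dist = (X[:, None, :] != centers[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Update each center to the per-column mode of its members.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = [np.bincount(col).argmax() for col in members.T]
    return labels, centers

# Synthetic categorical health indicators (0/1/2 severity codes) with two
# planted groups, standing in for the HRS variables used in the article.
healthy = rng.choice(3, size=(100, 6), p=[0.80, 0.15, 0.05])
frail = rng.choice(3, size=(100, 6), p=[0.10, 0.30, 0.60])
X = np.vstack([healthy, frail])
labels, centers = kmodes(X, k=2)
print("cluster sizes:", np.bincount(labels, minlength=2))
```

K-modes is used here instead of K-means because the survey variables are categorical codes, for which Euclidean distance and arithmetic means are not meaningful.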
Pub Date: 2022-04-30
DOI: 10.3905/jfds.2022.4.2.001
F. Fabozzi
Cathy Scott, General Manager and Publisher

The lead article in this issue is by the co-editor of this journal, Marcos López de Prado, “Machine Learning for Econometricians: The Readme Manual.” As he notes, econometric tools are typically applied in investment research despite the fact that they are poorly suited for uncovering statistical patterns in financial data. This is because of the unstructured nature of financial datasets, as well as the complex relationships involved in financial markets. Researchers and analysts working for asset managers overlook these limitations because they take the view that econometric approaches are more appropriate than machine learning methods. One of their objections to using machine learning is that its tools are not transparent (i.e., it is a black box approach to problem solving). López de Prado demonstrates why it is not the case that machine learning is a black box. For each analytical step of the econometric process, he identifies a corresponding step in machine learning analysis. By clearly stating this correspondence, López de Prado has facilitated and reconciled the adoption of machine learning techniques among econometricians, offering a bridge from classical statistics to machine learning. The process of meta-labeling, introduced by López de Prado, is used as the machine learning layer of an investment strategy that can determine the size of positions, filter out false-positive signals from backtests, and improve performance metrics. In “Meta-Labeling: Theory and Framework,” Jacques Francois Joubert provides an overview of meta-labeling’s theoretical framework, including its architecture and applications. The author then describes the methodology for three controlled experiments designed to break meta-labeling down into three components: information advantage, modeling for false positives, and position sizing. The three experiments validated that meta-labeling not only improves classification metrics but also significantly improves the performance of various types of primary investment strategies. Because of this attribute of meta-labeling, the article provides a good case study of how machine learning can be applied in financial markets. Studies have shown that security prices are driven by information beyond the financial information reported by companies in their filings with the Securities and Exchange Commission. This information includes news and investor-based sentiment. In “FinEAS: Financial Embedding Analysis of Sentiment,” a new language representation model for sentiment analysis of financial text, called financial embedding analysis of sentiment (FinEAS), is introduced by Asier Gutiérrez-Fandiño, Petter N. Kolm, Miquel Noguer i Alonso, and Jordi Armengol-Estapé. Their approach is based on transformer language models explicitly developed for sentence-level analysis and builds on Sentence-BERT, a sentence-level extension of vanilla BERT. The authors argue that the new approach generates sentence embeddings of higher quality that can significantly improve sentence- and document-level tasks such as financial sentiment analysis. Using a large-scale financial news dataset from RavenPack, the authors demonstrate that the new model outperforms several state-of-the-art models for financial sentiment analysis. The authors have made the model code publicly available. Deep reinforcement learning (DRL) has attracted great interest from practitioners. However, its application has been limited by practitioners’ need
{"title":"Managing Editor’s Letter","authors":"F. Fabozzi","doi":"10.3905/jfds.2022.4.2.001","DOIUrl":"https://doi.org/10.3905/jfds.2022.4.2.001","url":null,"abstract":"Cathy Scott General Manager and Publisher The lead article in this issue is by the co-editor of this journal, Marcos López de Prado, “Machine Learning for Econometricians: The Readme Manual.” As he notes, econometric tools are typically applied in investment research despite the fact that they are poorly suited for uncovering statistical patterns in financial data. This is because of the unstructured nature of financial datasets, as well as the complex relationships involved in financial markets. Researchers and analysts working for asset managers overlook these limitations as they take the view that econometric approaches are more appropriate than machine learning methods. One of their objections to using machine learning is that their tools are not transparent (i.e., it is a black box approach to problem solving). López de Prado demonstrates why it is not the case that machine learning is a black box. For each analytical step of the econometric process, he identifies a corresponding step in machine learning analysis. By clearly stating this correspondence, López de Prado has facilitated and reconciled the adoption of machine techniques among econometricians, offering a bridge from classical statistics to machine learning. The process of meta-labeling, introduced by López de Prado, is used as the machine learning layer of an investment strategy that can determine the size of positions, filter out false-positive signals from backtests, and improve performance metrics. In “Meta-Labeling: Theory and Framework,” Jacques Francois Joubert provides an overview of meta-labeling’s theoretical framework (including its architecture and applications). Then the author describes the methodology for three controlled experiments designed to break meta-labeling down into three components: information advantage, modeling for false positives, and position sizing. The three experiments validated that meta-labeling not only improves classification metrics but also significantly improves the performance of various types of primary investment strategies. Because of this attribute of meta-labeling, this article provides a good case study of how machine learning can be applied in financial markets. Studies have shown that security prices are driven by information beyond the financial information reported by companies in their filings with the Securities and Exchange Commission. This information includes news and investor-based sentiment. In “FinEAS: Financial Embedding Analysis of Sentiment,” a new language representation model for sentiment analysis of financial text called “financial embedding analysis of sentiment” (FinEAS) is introduced by Asier Gutiérrez-Fandiño, Petter N. Kolm, Miquel Noguer i Alonso, and Jordi Armengol-Estapé. Their approach is based on transformer language models that are explicitly developed for sentence-level analysis which builds on Sentence-BERT, a sentence-level extension of vanilla BERT. The authors argue that the new approach generates se","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126420722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}