Unsupervised generation of tradable topic indices through textual analysis
Marcel Lee, Alan Spark
Journal of Finance and Data Science, Volume 11, Article 100149
Published: 2025-01-29
DOI: 10.1016/j.jfds.2025.100149
URL: https://www.sciencedirect.com/science/article/pii/S2405918825000017
Citation count: 0
Abstract
Stock returns are driven by many risk factors. Thematic stock indices try to represent these factors but are limited because risk factors are not directly observable. This paper introduces a method to uncover hidden risk factors through text analysis. It applies the dynamic variant of the Latent Dirichlet Allocation (LDA) model to annual and quarterly reports to find a topic distribution for each stock. This distribution is then interpreted as the risk factor partition and transformed into a standard normal basis, which corresponds to pure risk factors. The resulting weights indicate the proportions in which the equities must be combined to form tradable topic indices. The need for human intervention is minimized by determining the optimal parameters automatically.
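The step from per-stock topic distributions to index weights can be illustrated with a small numerical sketch. The abstract does not spell out the transformation, so the code below makes an assumption: it uses the Moore-Penrose pseudo-inverse to find weights under which each index has unit exposure to one topic and zero exposure to the others. The matrix `theta` and the function `pure_topic_weights` are hypothetical and the values are invented for illustration; they are not taken from the paper.

```python
import numpy as np


def pure_topic_weights(theta: np.ndarray) -> np.ndarray:
    """Return index weights W (n_topics x n_stocks) with W @ theta ~= I.

    ASSUMPTION: the pseudo-inverse stands in for the paper's
    transformation, which the abstract does not specify. The identity
    W @ theta = I holds only if theta has full column rank.
    """
    return np.linalg.pinv(theta)


# Hypothetical topic distributions: 4 stocks (rows) x 3 topics (columns).
# Each row sums to 1, as a topic model such as LDA would produce.
theta = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.4, 0.4, 0.2],
])

weights = pure_topic_weights(theta)

# Topic exposure of each index: row k should load only on topic k.
exposure = weights @ theta

print(np.round(weights, 3))
print(np.round(exposure, 3))
```

Because every row of `theta` sums to 1, each row of `weights` also sums to 1 in this setup, so each row can be read as a set of portfolio proportions. Some weights may be negative, which would correspond to short positions; whether the paper permits them is not stated in the abstract.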