Patrick Schulze, Simon Wiegrebe, Paul W. Thurner, Christian Heumann, Matthias Aßenmacher
Journal: AStA - Advances in Statistical Analysis (Impact Factor 1.4; JCR Q2, Statistics & Probability)
Published: 2023-11-03
DOI: 10.1007/s10182-023-00485-9
Link: https://link.springer.com/article/10.1007/s10182-023-00485-9
Citations: 0
A Bayesian approach to modeling topic-metadata relationships
Abstract
The objective of advanced topic modeling is not only to explore latent topical structures, but also to estimate relationships between the discovered topics and theoretically relevant metadata. Methods used to estimate such relationships must take into account that the topical structure is not directly observed but is itself estimated in an unsupervised fashion, usually by common topic models. A frequently used procedure for this purpose is the method of composition, a Monte Carlo sampling technique that repeatedly regresses sampled topic proportions on metadata covariates using linear regression. In this paper, we propose two modifications of this approach. First, we substantially refine the existing implementation of the method of composition from the R package stm by replacing linear regression with the more appropriate Beta regression. Second, we provide a fundamental enhancement of the entire estimation framework by replacing the current blend of frequentist and Bayesian methods with a fully Bayesian approach. This allows for a more appropriate quantification of uncertainty. We illustrate our improved methodology by investigating relationships between Twitter posts by German parliamentarians and different metadata covariates related to their electoral districts, using the structural topic model to estimate topic proportions.
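To make the method of composition concrete, the following is a minimal sketch of the linear-regression variant that the abstract takes as its starting point. It is not the stm implementation and not the authors' proposed method: the function name `method_of_composition`, the diagonal logistic-normal approximate posterior (`mu`, `sigma`), and the synthetic data are all assumptions made for illustration. Each iteration samples topic proportions, regresses them on the covariates by ordinary least squares, and draws coefficients from their approximate sampling distribution; the pooled draws summarize uncertainty from both steps.

```python
import numpy as np

def method_of_composition(mu, sigma, X, topic, n_draws=200, seed=0):
    """Pool coefficient draws over sampled topic proportions (illustrative sketch).

    mu:    (D, K-1) means of each document's approximate posterior (hypothetical input)
    sigma: (D, K-1) standard deviations of that posterior (diagonal by assumption)
    X:     (D, P) design matrix of metadata covariates, including an intercept
    topic: index of the topic whose proportions are regressed on X
    """
    rng = np.random.default_rng(seed)
    D, P = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    draws = np.empty((n_draws, P))
    for s in range(n_draws):
        # 1. Sample unnormalized topic scores and map them to the simplex.
        eta = rng.normal(mu, sigma)
        eta = np.hstack([eta, np.zeros((D, 1))])  # last topic is the reference
        theta = np.exp(eta - eta.max(axis=1, keepdims=True))
        theta /= theta.sum(axis=1, keepdims=True)
        y = theta[:, topic]
        # 2. Regress the sampled proportions on the covariates (OLS).
        beta_hat = XtX_inv @ X.T @ y
        resid = y - X @ beta_hat
        s2 = resid @ resid / (D - P)
        # 3. Draw coefficients from their approximate sampling distribution.
        draws[s] = rng.multivariate_normal(beta_hat, s2 * XtX_inv)
    return draws

# Synthetic example: one covariate x raises the score of topic 0.
data_rng = np.random.default_rng(1)
D, K = 300, 4
x = data_rng.uniform(-1, 1, size=D)
X = np.column_stack([np.ones(D), x])
mu = np.column_stack([1.5 * x, np.zeros(D), np.zeros(D)])
sigma = np.full((D, K - 1), 0.3)

draws = method_of_composition(mu, sigma, X, topic=0)
print("pooled mean (intercept, slope):", draws.mean(axis=0))
print("2.5% / 97.5% quantiles:", np.percentile(draws, [2.5, 97.5], axis=0))
```

Because topic proportions are confined to the interval (0, 1), the linear fit in step 2 can yield predictions outside that range. This is one reason a Beta regression may be the more suitable choice there, and a fully Bayesian treatment would additionally replace the frequentist coefficient draw in step 3.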
About the journal:
AStA - Advances in Statistical Analysis, a journal of the German Statistical Society, is published quarterly and presents original contributions on statistical methods and applications and review articles.