Supervised latent Dirichlet allocation with covariates: A Bayesian structural and measurement model of text and covariates.

IF 7.6 1区 心理学 Q1 PSYCHOLOGY, MULTIDISCIPLINARY Psychological methods Pub Date : 2023-10-01 Epub Date: 2023-01-05 DOI:10.1037/met0000541
Kenneth Tyler Wilcox, Ross Jacobucci, Zhiyong Zhang, Brooke A Ammerman
{"title":"Supervised latent Dirichlet allocation with covariates: A Bayesian structural and measurement model of text and covariates.","authors":"Kenneth Tyler Wilcox,&nbsp;Ross Jacobucci,&nbsp;Zhiyong Zhang,&nbsp;Brooke A Ammerman","doi":"10.1037/met0000541","DOIUrl":null,"url":null,"abstract":"<p><p>Text is a burgeoning data source for psychological researchers, but little methodological research has focused on adapting popular modeling approaches for text to the context of psychological research. One popular measurement model for text, topic modeling, uses a latent mixture model to represent topics underlying a body of documents. Recently, psychologists have studied relationships between these topics and other psychological measures by using estimates of the topics as regression predictors along with other manifest variables. While similar two-stage approaches involving estimated latent variables are known to yield biased estimates and incorrect standard errors, two-stage topic modeling approaches have received limited statistical study and, as we show, are subject to the same problems. To address these problems, we proposed a novel statistical model-supervised latent Dirichlet allocation with covariates (SLDAX)-that jointly incorporates a latent variable measurement model of text and a structural regression model to allow the latent topics and other manifest variables to serve as predictors of an outcome. Using a simulation study with data characteristics consistent with psychological text data, we found that SLDAX estimates were generally more accurate and more efficient. To illustrate the application of SLDAX and a two-stage approach, we provide an empirical clinical application to compare the application of both the two-stage and SLDAX approaches. Finally, we implemented the SLDAX model in an open-source R package to facilitate its use and further study. (PsycInfo Database Record (c) 2023 APA, all rights reserved).</p>","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":" ","pages":"1178-1206"},"PeriodicalIF":7.6000,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Psychological methods","FirstCategoryId":"102","ListUrlMain":"https://doi.org/10.1037/met0000541","RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2023/1/5 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"PSYCHOLOGY, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0

Abstract

Text is a burgeoning data source for psychological researchers, but little methodological research has focused on adapting popular modeling approaches for text to the context of psychological research. One popular measurement model for text, topic modeling, uses a latent mixture model to represent topics underlying a body of documents. Recently, psychologists have studied relationships between these topics and other psychological measures by using estimates of the topics as regression predictors along with other manifest variables. While similar two-stage approaches involving estimated latent variables are known to yield biased estimates and incorrect standard errors, two-stage topic modeling approaches have received limited statistical study and, as we show, are subject to the same problems. To address these problems, we proposed a novel statistical model-supervised latent Dirichlet allocation with covariates (SLDAX)-that jointly incorporates a latent variable measurement model of text and a structural regression model to allow the latent topics and other manifest variables to serve as predictors of an outcome. Using a simulation study with data characteristics consistent with psychological text data, we found that SLDAX estimates were generally more accurate and more efficient. To illustrate the application of SLDAX and a two-stage approach, we provide an empirical clinical application to compare the application of both the two-stage and SLDAX approaches. Finally, we implemented the SLDAX model in an open-source R package to facilitate its use and further study. (PsycInfo Database Record (c) 2023 APA, all rights reserved).

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
具有协变量的监督潜在狄利克雷分配:文本和协变量的贝叶斯结构和测量模型。
文本是心理学研究人员新兴的数据来源,但很少有方法论研究关注将流行的文本建模方法应用于心理学研究。一个流行的文本测量模型,主题建模,使用潜在的混合模型来表示文档主体下面的主题。最近,心理学家研究了这些主题和其他心理测量之间的关系,方法是将主题的估计值与其他明显变量一起用作回归预测因子。虽然已知涉及估计潜在变量的类似两阶段方法会产生有偏差的估计和不正确的标准误差,但两阶段主题建模方法受到的统计研究有限,正如我们所表明的,也会遇到同样的问题。为了解决这些问题,我们提出了一种新的统计模型,监督具有协变量的潜在狄利克雷分配(SLDAX),该模型结合了文本的潜在变量测量模型和结构回归模型,以允许潜在主题和其他明显变量作为结果的预测因子。使用一项数据特征与心理文本数据一致的模拟研究,我们发现SLDAX估计通常更准确、更有效。为了说明SLDAX和两阶段方法的应用,我们提供了一个经验临床应用来比较两阶段方法和SLDAX方法的应用。最后,我们在一个开源的R包中实现了SLDAX模型,以方便其使用和进一步研究。(PsycInfo数据库记录(c)2023 APA,保留所有权利)。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Psychological methods
Psychological methods PSYCHOLOGY, MULTIDISCIPLINARY-
CiteScore
13.10
自引率
7.10%
发文量
159
期刊介绍: Psychological Methods is devoted to the development and dissemination of methods for collecting, analyzing, understanding, and interpreting psychological data. Its purpose is the dissemination of innovations in research design, measurement, methodology, and quantitative and qualitative analysis to the psychological community; its further purpose is to promote effective communication about related substantive and methodological issues. The audience is expected to be diverse and to include those who develop new procedures, those who are responsible for undergraduate and graduate training in design, measurement, and statistics, as well as those who employ those procedures in research.
期刊最新文献
Simulation studies for methodological research in psychology: A standardized template for planning, preregistration, and reporting. How to conduct an integrative mixed methods meta-analysis: A tutorial for the systematic review of quantitative and qualitative evidence. Updated guidelines on selecting an intraclass correlation coefficient for interrater reliability, with applications to incomplete observational designs. Data-driven covariate selection for confounding adjustment by focusing on the stability of the effect estimator. Estimating and investigating multiple constructs multiple indicators social relations models with and without roles within the traditional structural equation modeling framework: A tutorial.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1