Topics, Concepts, and Measurement: A Crowdsourced Procedure for Validating Topics as Measures

IF 4.7 | Zone 2 (Sociology) | Q1 (Political Science) | Political Analysis | Pub Date: 2021-09-27 | DOI: 10.1017/pan.2021.33
Luwei Ying, J. Montgomery, Brandon M Stewart
Political Analysis, volume 30, pages 570-589. https://doi.org/10.1017/pan.2021.33
Citations: 22

Abstract

Topic models, as developed in computer science, are effective tools for exploring and summarizing large document collections. When applied in social science research, however, they are commonly used for measurement, a task that requires careful validation to ensure that the model outputs actually capture the desired concept of interest. In this paper, we review current practices for topic validation in the field and show that extensive model validation is increasingly rare, or at least not systematically reported in papers and appendices. To supplement current practices, we refine an existing crowd-sourcing method by Chang and coauthors for validating topic quality and go on to create new procedures for validating conceptual labels provided by the researcher. We illustrate our method with an analysis of Facebook posts by U.S. Senators and provide software and guidance for researchers wishing to validate their own topic models. While tailored, case-specific validation exercises will always be best, we aim to improve standard practices by providing a general-purpose tool to validate topics as measures.
Source journal: Political Analysis (Political Science)
CiteScore: 8.80
Self-citation rate: 3.70%
Articles published: 30
About the journal: Political Analysis chronicles these exciting developments by publishing the most sophisticated scholarship in the field. It is the place to learn new methods, to find some of the best empirical scholarship, and to publish your best research.