Cross-Domain Topic Classification for Political Texts

IF 4.7 · Zone 2, Sociology · Q1 Political Science · Political Analysis · Pub Date: 2021-10-21 · DOI: 10.1017/pan.2021.37
Moritz Osnabrügge, Elliott Ash, M. Morelli
Citations: 17

Abstract

We introduce and assess the use of supervised learning in cross-domain topic classification. In this approach, an algorithm learns to classify topics in a labeled source corpus and then extrapolates topics in an unlabeled target corpus from another domain. The ability to reuse existing training data makes this method significantly more efficient than within-domain supervised learning, which requires labeling new training documents in the target domain. It also has three advantages over unsupervised topic models: the method can be targeted more specifically to a research question, and the resulting topics are easier both to validate and to interpret. We demonstrate the method using the case of labeled party platforms (source corpus) and unlabeled parliamentary speeches (target corpus). In addition to the standard within-domain error metrics, we further validate the cross-domain performance by labeling a subset of target-corpus documents. We find that the classifier accurately assigns topics in the parliamentary speeches, although accuracy varies substantially by topic. We also propose tools for diagnosing cross-domain classification. To illustrate the usefulness of the method, we present two case studies on how electoral rules and the gender of parliamentarians influence the choice of speech topics.
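The workflow described in the abstract (train on a labeled source corpus, predict on an unlabeled target corpus, then check accuracy per topic on a small hand-labeled target subset) can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the abstract does not name a classifier, so a small multinomial Naive Bayes model stands in for it. The toy sentences, the topic labels, and the function names (`train_naive_bayes`, `predict`, `accuracy_by_topic`) are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Collect per-topic word counts and document counts from the source corpus."""
    word_counts = defaultdict(Counter)  # topic -> word -> count
    doc_counts = Counter(labels)        # topic -> number of documents
    vocab = set()
    for text, topic in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[topic].update(tokens)
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the topic with the highest log-posterior (add-one smoothing)."""
    word_counts, doc_counts, vocab = model
    n_docs = sum(doc_counts.values())
    best_topic, best_score = None, -math.inf
    for topic in sorted(doc_counts):
        total = sum(word_counts[topic].values())
        score = math.log(doc_counts[topic] / n_docs)
        for tok in tokenize(text):
            if tok in vocab:  # skip words never seen in the source corpus
                score += math.log((word_counts[topic][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic


def accuracy_by_topic(model, docs, gold):
    """Share of correct predictions per gold topic on a labeled target subset."""
    hits, totals = Counter(), Counter()
    for text, topic in zip(docs, gold):
        totals[topic] += 1
        hits[topic] += int(predict(model, text) == topic)
    return {t: hits[t] / totals[t] for t in totals}


# Toy stand-in for the labeled source corpus (party platforms).
source_docs = [
    "we will cut taxes and reduce the budget deficit",
    "lower taxes support growth and jobs",
    "we will invest in hospitals and nurses",
    "better health care and more doctors for patients",
]
source_labels = ["economy", "economy", "health", "health"]

# Toy stand-in for the target corpus (parliamentary speeches); the gold
# labels play the role of the hand-labeled validation subset.
target_docs = [
    "the minister must explain why taxes keep rising",
    "patients wait too long to see doctors",
]
target_gold = ["economy", "health"]

model = train_naive_bayes(source_docs, source_labels)
print([predict(model, d) for d in target_docs])
print(accuracy_by_topic(model, target_docs, target_gold))
```

Reporting accuracy per topic rather than as a single number mirrors the abstract's point that cross-domain accuracy can vary substantially by topic. On real corpora, the validation subset would need to be large enough for each topic's estimate to be meaningful.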
Source journal: Political Analysis (Political Science)
CiteScore: 8.80
Self-citation rate: 3.70%
Articles published: 30
About the journal: Political Analysis chronicles these exciting developments by publishing the most sophisticated scholarship in the field. It is the place to learn new methods, to find some of the best empirical scholarship, and to publish your best research.