Cross-Domain Topic Classification for Political Texts

IF 4.7 | Zone 2, Sociology | Q1 POLITICAL SCIENCE | Political Analysis | Pub Date: 2021-10-21 | DOI: 10.1017/pan.2021.37

Moritz Osnabrügge, Elliott Ash, M. Morelli

Political Analysis 31 (1): 59-80.

Citations: 17

Abstract

We introduce and assess the use of supervised learning in cross-domain topic classification. In this approach, an algorithm learns to classify topics in a labeled source corpus and then extrapolates topics in an unlabeled target corpus from another domain. The ability to use existing training data makes this method significantly more efficient than within-domain supervised learning. It also has three advantages over unsupervised topic models: the method can be targeted more specifically to a research question, and the resulting topics are easier to validate and easier to interpret. We demonstrate the method using the case of labeled party platforms (source corpus) and unlabeled parliamentary speeches (target corpus). In addition to the standard within-domain error metrics, we further validate the cross-domain performance by labeling a subset of target-corpus documents. We find that the classifier accurately assigns topics in the parliamentary speeches, although accuracy varies substantially by topic. We also propose tools for diagnosing cross-domain classification. To illustrate the usefulness of the method, we present two case studies on how electoral rules and the gender of parliamentarians influence the choice of speech topics.
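The workflow described in the abstract (train on a labeled source corpus, predict on an unlabeled target corpus, then check accuracy per topic on a hand-labeled target subset) can be sketched as follows. This is only an illustration under stated assumptions: the toy multinomial naive Bayes classifier is a stand-in and is not claimed to be the authors' model, and the documents, topic labels, and function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately crude tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Fit a multinomial naive Bayes model on the labeled SOURCE corpus."""
    word_counts = defaultdict(Counter)  # topic -> word frequencies
    label_counts = Counter(labels)      # topic -> number of documents
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, doc):
    """Assign the most probable topic to one TARGET document."""
    word_counts, label_counts, vocab = model
    n_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        total = sum(word_counts[label].values())
        score = math.log(label_counts[label] / n_docs)  # log prior
        for tok in tokenize(doc):
            if tok in vocab:  # words unseen in the source corpus are skipped
                # Laplace (add-one) smoothed log likelihood
                score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def per_topic_accuracy(gold, predicted):
    """Share of correctly classified documents for each gold topic."""
    correct, total = Counter(), Counter()
    for g, p in zip(gold, predicted):
        total[g] += 1
        correct[g] += int(g == p)
    return {topic: correct[topic] / total[topic] for topic in total}


# Hypothetical labeled source corpus (stand-in for party platforms).
source_docs = [
    "we will cut taxes and reduce the budget deficit",
    "lower taxes support economic growth and jobs",
    "we will fund hospitals and expand health care",
    "more doctors and nurses for public health",
]
source_labels = ["economy", "economy", "health", "health"]

# Hypothetical target corpus (stand-in for parliamentary speeches), with
# hand-assigned labels that are used ONLY to validate cross-domain performance.
target_docs = [
    "the minister must explain why taxes rose again",
    "hospitals in my constituency lack nurses",
    "this budget ignores jobs",
]
target_gold = ["economy", "health", "economy"]

model = train_naive_bayes(source_docs, source_labels)
predictions = [predict(model, doc) for doc in target_docs]
accuracy_by_topic = per_topic_accuracy(target_gold, predictions)
print(predictions)
print(accuracy_by_topic)
```

Reporting accuracy per topic rather than as a single number mirrors the abstract's observation that cross-domain accuracy varies substantially by topic; on real data the per-topic values would be expected to differ, unlike in this toy example.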

Source journal

Political Analysis (POLITICAL SCIENCE)

CiteScore: 8.80

Self-citation rate: 3.70%

About the journal: Political Analysis chronicles these exciting developments by publishing the most sophisticated scholarship in the field. It is the place to learn new methods, to find some of the best empirical scholarship, and to publish your best research.