The Congressional Classification Challenge: Domain Specificity and Partisan Intensity

Hao Yan, Sanmay Das, Allen Lavoie, Sirui Li, Betsy Sinclair
Proceedings of the 2019 ACM Conference on Economics and Computation, 2019-06-17
DOI: 10.1145/3328526.3329582 (https://doi.org/10.1145/3328526.3329582)
Citations: 2

Abstract

In this paper, we study the effectiveness and generalizability of techniques for classifying partisanship and ideology from text in the context of US politics. In particular, we are interested in how well measures of partisanship transfer across domains as well as the potential to rely upon measures of partisan intensity as a proxy for political ideology. We construct novel datasets of English texts from (1) the Congressional Record, (2) prominent conservative and liberal media websites, and (3) conservative and liberal wikis, and apply text classification algorithms to evaluate domain specificity via a domain adaptation technique. Surprisingly, we find that the cross-domain learning performance, benchmarking the ability to generalize from one of these datasets to another, is in general poor, even though the algorithms perform very well in within-dataset cross-validation tests. While party affiliation of legislators is not predictable based on models learned from other sources, we do find some ability to predict the leanings of the media and crowdsourced websites based on models learned from the Congressional Record. This predictivity is different across topics, and itself a priori predictable based on within-topic cross-validation results. Temporally, phrases tend to move from politicians to the media, helping to explain this predictivity. Finally, when we compare legislators themselves across different media (the Congressional Record and press releases), we find that while party affiliation is highly predictable, within-party ideology is completely unpredictable. Legislators are communicating different messages through different channels while clearly signaling party identity systematically across all channels. Choice of language is a clearly strategic act, among both legislators and the media, and we must therefore proceed with extreme caution in extrapolating from language to partisanship or ideology across domains.
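
The evaluation protocol the abstract describes, comparing within-dataset cross-validation against cross-domain transfer, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not name the classifier or the features, so a small multinomial Naive Bayes over whitespace tokens stands in for them, and the two tiny corpora, the labels "A" and "B", and all function names are hypothetical placeholders rather than the authors' data or method.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model; smoothing is applied at predict time."""
    word_counts = {}               # label -> Counter of token counts
    doc_counts = Counter(labels)   # label -> number of training documents
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}


def predict(model, text):
    """Return the label with the highest add-one-smoothed log posterior."""
    vocab_size = len(model["vocab"])
    total_docs = sum(model["doc_counts"].values())
    best_label, best_score = None, -math.inf
    for label in sorted(model["doc_counts"]):
        counts = model["word_counts"][label]
        denom = sum(counts.values()) + vocab_size
        score = math.log(model["doc_counts"][label] / total_docs)
        for tok in tokenize(text):
            if tok in model["vocab"]:  # tokens unseen in training are ignored
                score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    hits = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return hits / len(docs)


def leave_one_out_accuracy(docs, labels):
    """Within-domain estimate: train on all documents but one, test on that one."""
    hits = 0
    for i in range(len(docs)):
        model = train_nb(docs[:i] + docs[i + 1:], labels[:i] + labels[i + 1:])
        hits += predict(model, docs[i]) == labels[i]
    return hits / len(docs)


# Hypothetical "source" domain: each class has its own consistent vocabulary.
source_docs = [
    "alpha beta gamma", "alpha gamma delta", "beta gamma delta",
    "sigma tau omega", "sigma omega phi", "tau omega phi",
]
source_labels = ["A", "A", "A", "B", "B", "B"]

# Hypothetical "target" domain: same labels, but cue words are reused differently.
target_docs = [
    "alpha gamma kappa", "omega kappa lambda",
    "sigma mu nu", "gamma mu nu",
]
target_labels = ["A", "A", "B", "B"]

within_source = leave_one_out_accuracy(source_docs, source_labels)
model = train_nb(source_docs, source_labels)
cross_domain = accuracy(model, target_docs, target_labels)

print("within-source leave-one-out accuracy:", within_source)
print("source -> target accuracy:", cross_domain)
```

The toy corpora are hand-built so that the within-source score is high while transfer to the target is at chance level. That gap is there by construction, to show the shape of the comparison, and says nothing about the magnitudes reported in the paper.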