Angela Lai, Megan A. Brown, James Bisbee, Joshua A. Tucker, Jonathan Nagler, Richard Bonneau
We present a method for estimating the ideology of political YouTube videos. The subfield of estimating ideology as a latent variable has often focused on traditional actors such as legislators, while more recent work has used social media data to estimate the ideology of ordinary users, political elites, and media sources. We build on this work to estimate the ideology of a political YouTube video. First, we start with a matrix of political Reddit posts linking to YouTube videos and apply correspondence analysis to place those videos in an ideological space. Second, we train a language model with those estimated ideologies as training labels, enabling us to estimate the ideologies of videos not posted on Reddit. These predicted ideologies are then validated against human labels. We demonstrate the utility of this method by applying it to the watch histories of survey respondents to evaluate the prevalence of echo chambers on YouTube in addition to the association between video ideology and viewer engagement. Our approach gives video-level scores based only on supplied text metadata, is scalable, and can be easily adjusted to account for changes in the ideological landscape.
Estimating the Ideology of Political YouTube Videos. Political Analysis. doi:10.1017/pan.2023.42. Published online 2024-02-13.
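The first step described above, correspondence analysis on a matrix of Reddit posts linking to YouTube videos, can be sketched as follows. This is a generic textbook CA, not the authors' exact pipeline, and the subreddit-by-video link counts are invented for illustration:

```python
import numpy as np

# Toy link-count matrix: rows are (hypothetical) political subreddits,
# columns are YouTube videos. Values are made up: the first two rows link
# mostly to video 0, the last two mostly to videos 1 and 2.
N = np.array([[8., 1., 0.],
              [6., 2., 1.],
              [1., 7., 9.],
              [0., 5., 8.]])

P = N / N.sum()                        # correspondence matrix
r, c = P.sum(axis=1), P.sum(axis=0)    # row and column masses
# Standardized residuals, then SVD: the standard CA decomposition.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, d, Vt = np.linalg.svd(S, full_matrices=False)
# First-dimension column principal coordinates place each video on a
# latent axis; with political subreddits as rows, that axis is read
# as ideology.
video_scores = Vt[0] * d[0] / np.sqrt(c)
```

The sign of the axis is arbitrary (an SVD artifact), so only relative placement is meaningful: videos 1 and 2 land on the same side, opposite video 0.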
Valence is a crucial concept in studying spatial voting and party competition. The widely adopted approach is to rely on intercepts of vote choice models and to infer, based on their size and direction, how valence affects party strategies in empirical settings. The approach suffers from fundamental statistical flaws. This contribution provides the statistical fundamentals to advance the empirical modeling of valence. It proposes an appropriate modeling approach to interpret intercepts as valences and alternate specifications to parameterize the effects of valence.
Vote Choices and Valence: Intercepts and Alternate Specifications. Ingrid Mauerer and Gerhard Tutz. Political Analysis. doi:10.1017/pan.2023.43. Published online 2024-01-18.
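The setup the abstract refers to can be made concrete with a toy spatial conditional logit in which party intercepts are read as valence; party names, positions, and coefficients below are invented:

```python
import math

# Spatial utility with party intercepts interpreted as "valence":
#   U_ij = alpha_j - beta * (x_i - p_j)^2
alphas = {"A": 0.5, "B": 0.0}        # intercepts read as valence advantages
positions = {"A": -1.0, "B": 1.0}    # party positions on a left-right scale
beta = 0.25                          # weight on squared ideological distance

def choice_probs(x_i):
    """Conditional-logit choice probabilities for a voter at ideal point x_i."""
    u = {j: alphas[j] - beta * (x_i - positions[j]) ** 2 for j in alphas}
    z = sum(math.exp(v) for v in u.values())
    return {j: math.exp(v) / z for j, v in u.items()}

# A centrist voter (x = 0) is equidistant from both parties, so the
# intercept difference alone determines the log odds of choosing A over B.
p = choice_probs(0.0)
```

For the centrist voter the log odds equal alpha_A - alpha_B = 0.5, which is exactly the reading of intercepts as valence that the abstract says requires careful statistical handling.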
A recent paper by Häffner et al. (2023, Political Analysis 31, 481–499) introduces an interpretable deep learning approach for domain-specific dictionary creation, where it is claimed that the dictionary-based approach outperforms finetuned language models in predictive accuracy while retaining interpretability. We show that the dictionary-based approach’s reported superiority over large language models, BERT specifically, is due to the fact that most of the parameters in the language models are excluded from finetuning. In this letter, we first discuss the architecture of BERT models, then explain the limitations of finetuning only the top classification layer, and lastly we report results where finetuned language models outperform the newly proposed dictionary-based approach by 27% in terms of $R^2$ and 46% in terms of mean squared error once we allow these parameters to learn during finetuning. Researchers interested in large language models, text classification, and text regression should find our results useful. Our code and data are publicly available.
On Finetuning Large Language Models. Yu Wang. Political Analysis. doi:10.1017/pan.2023.36. Published online 2023-11-28.
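To see why freezing everything except the top classification layer matters, a back-of-envelope parameter count for a BERT-base-sized encoder is instructive. The dimensions below are the commonly cited BERT-base values; the count ignores biases, layer norms, and position embeddings, so it is approximate:

```python
# Approximate parameter arithmetic for a BERT-base-sized encoder
# (hidden size 768, 12 layers, FFN size 3072, vocab ~30k).
hidden, layers, vocab, ffn = 768, 12, 30522, 3072

embeddings = vocab * hidden                         # token embedding table
per_layer = 4 * hidden * hidden + 2 * hidden * ffn  # attention (Q,K,V,O) + FFN
encoder = layers * per_layer
head = hidden * 1                                   # linear regression head on [CLS]

total = embeddings + encoder + head
head_share = head / total   # the only trainable share if the body is frozen
```

Finetuning only the head therefore updates well under 0.01% of the parameters, which is the limitation the letter identifies in the comparison it reanalyzes.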
Roberto Cerina, C. Barrie, Neil Ketchley, Aaron Y. Zelin
Who joins extremist movements? Answering this question is beset by methodological challenges as survey techniques are infeasible and selective samples provide no counterfactual. Recruits can be assigned to contextual units, but this is vulnerable to problems of ecological inference. In this article, we elaborate a technique that combines survey and ecological approaches. The Bayesian hierarchical case–control design that we propose allows us to identify individual-level and contextual factors patterning the incidence of recruitment to extremism, while accounting for spatial autocorrelation, rare events, and contamination. We empirically validate our approach by matching a sample of Islamic State (ISIS) fighters from nine MENA countries with representative population surveys enumerated shortly before recruits joined the movement. High-status individuals in their early twenties with college education were more likely to join ISIS. There is more mixed evidence for relative deprivation. The accompanying extremeR package provides functionality for applied researchers to implement our approach.
Explaining Recruitment to Extremism: A Bayesian Hierarchical Case–Control Approach. Political Analysis. doi:10.1017/pan.2023.35. Published online 2023-11-16.
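The core case-control logic, separate from the paper's full Bayesian hierarchical machinery, is that cases (recruits) are vastly oversampled relative to their true population prevalence, so an intercept estimated on the sample must be corrected. A minimal sketch of the classic prior-correction idea, with invented numbers:

```python
import math

tau = 0.001      # assumed true population prevalence of recruitment
ybar = 0.5       # share of cases in the balanced case-control sample
b0_sample = 0.0  # intercept estimated on that sample (toy value)

# Prior correction: subtract the log odds of the sampling distortion so
# that predicted probabilities are on the population scale again.
b0_corrected = b0_sample - math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))

def prob(b0):
    """Baseline probability implied by an intercept in a logit model."""
    return 1 / (1 + math.exp(-b0))
```

After correction, the implied baseline probability matches the assumed prevalence tau rather than the sample's 50% case share.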
Political scientists commonly use Grambsch and Therneau’s (1994, Biometrika 81, 515–526) ubiquitous Schoenfeld-based test to diagnose proportional hazard violations in Cox duration models. However, some statistical packages have changed how they implement the test’s calculation. The traditional implementation makes a simplifying assumption about the test’s variance–covariance matrix, while the newer implementation does not. Recent work suggests the test’s performance differs, depending on its implementation. I use Monte Carlo simulations to more thoroughly investigate whether the test’s implementation affects its performance. Surprisingly, I find the newer implementation performs very poorly with correlated covariates, with a false positive rate far above 5%. By contrast, the traditional implementation has no such issues in the same situations. This shocking finding raises new, complex questions for researchers moving forward. It appears to suggest, for now, researchers should favor the traditional implementation in situations where its simplifying assumption is likely met, but researchers must also be mindful that this implementation’s false positive rate can be high in misspecified models.
Implementation Matters: Evaluating the Proportional Hazard Test’s Performance. Shawna K. Metzger. Political Analysis. doi:10.1017/pan.2023.34. Published online 2023-11-07.
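The intuition behind the Grambsch–Therneau diagnostic is a trend test: regress scaled Schoenfeld residuals on (a function of) time and check for a nonzero slope, which signals a proportional-hazards violation. The implementations the abstract compares differ in how they estimate the variance of that statistic, not in the trend itself. A schematic sketch with invented residuals:

```python
# Invented scaled Schoenfeld residuals, one per event time, chosen to trend
# upward -- the pattern the diagnostic is designed to flag. Real residuals
# come from a fitted Cox model.
times     = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
residuals = [-0.9, -0.5, -0.2, 0.3, 0.6, 1.1]

n = len(times)
t_bar = sum(times) / n
r_bar = sum(residuals) / n
# OLS slope of residuals on time; a clearly nonzero slope suggests the
# covariate's effect varies with time (a PH violation).
slope = sum((t - t_bar) * (r - r_bar) for t, r in zip(times, residuals)) \
        / sum((t - t_bar) ** 2 for t in times)
```

Converting this slope into a test statistic requires a variance estimate, and that is exactly where the "traditional" and "newer" implementations in the paper diverge.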
Andreu Girbau, Tetsuro Kobayashi, Benjamin Renoust, Yusuke Matsui, Shin’ichi Satoh
Analyzing the appearances of political figures in large-scale news archives is increasingly important with the growing availability of such archives and developments in computer vision. We present a deep-learning-based method combining face detection, tracking, and classification, notable in that it requires no re-training when targeting new individuals. Users can feed only a few images of target individuals to reliably detect, track, and classify them. Extensive validation on prominent political figures in two news archives spanning 10 to 20 years, one containing three U.S. cable news channels and the other two major Japanese news programs, consistently shows the high performance and flexibility of the proposed method. The code is made readily available to the public.
Face Detection, Tracking, and Classification from Large-Scale News Archives for Analysis of Key Political Figures. Political Analysis. doi:10.1017/pan.2023.33. Published online 2023-11-06.
This article introduces to political science a framework to analyze the content of visual material through unsupervised and semi-supervised methods. It details the implementation of a tool from the computer vision field, the Bag of Visual Words (BoVW), for the definition and extraction of “tokens” that allow researchers to build an Image-Visual Word Matrix which emulates the Document-Term matrix in text analysis. This reduction technique is the basis for several tools familiar to social scientists, such as topic models, that permit exploratory and semi-supervised analysis of images. The framework has gains in transparency, interpretability, and inclusion of domain knowledge with respect to other deep learning techniques. I illustrate the scope of the BoVW by conducting a novel visual structural topic model which focuses substantively on the identification of visual frames from the pictures of the migrant caravan from Central America.
A Framework for the Unsupervised and Semi-Supervised Analysis of Visual Frames. Michelle Torres. Political Analysis. doi:10.1017/pan.2023.32. Published online 2023-10-23.
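The BoVW reduction can be sketched in miniature: each image yields local descriptors, each descriptor is assigned to its nearest "visual word" (a cluster center learned beforehand, e.g. by k-means), and the image becomes a word-count histogram, i.e. one row of the Image-Visual Word Matrix. Descriptors and centers below are tiny made-up stand-ins for real SIFT-like features:

```python
# Three "visual words": cluster centers assumed to have been learned from
# descriptors across the whole image corpus.
centers = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]

def nearest_word(desc):
    """Index of the closest visual word by squared Euclidean distance."""
    return min(range(len(centers)),
               key=lambda k: sum((d - c) ** 2
                                 for d, c in zip(desc, centers[k])))

def bovw_histogram(descriptors):
    """Turn one image's descriptors into a visual-word count vector."""
    hist = [0] * len(centers)
    for d in descriptors:
        hist[nearest_word(d)] += 1
    return hist

image_descriptors = [[0.1, 0.0], [0.9, 1.1], [0.1, 0.9], [0.0, 0.2]]
row = bovw_histogram(image_descriptors)
```

Stacking one such row per image yields the matrix that topic models and other document-term tools can then consume directly.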
Quantitative empirical inquiry in international relations often relies on dyadic data. Standard analytic techniques do not account for the fact that dyads are not generally independent of one another. That is, when dyads share a constituent member (e.g., a common country), they may be statistically dependent, or “clustered.” Recent work has developed dyadic clustering robust standard errors (DCRSEs) that account for this dependence. Using these DCRSEs, we reanalyzed all empirical articles published in International Organization between January 2014 and January 2020 that feature dyadic data. We find that published standard errors for key explanatory variables are, on average, approximately half as large as DCRSEs, suggesting that dyadic clustering is leading researchers to severely underestimate uncertainty. However, most (67% of) statistically significant findings remain statistically significant when using DCRSEs. We conclude that accounting for dyadic clustering is both important and feasible, and offer software in R and Stata to facilitate use of DCRSEs in future research.
Dyadic Clustering in International Relations. Jacob Carlson, Trevor Incerti, and P. M. Aronow. Political Analysis. doi:10.1017/pan.2023.26. Published online 2023-10-03.
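The mechanics can be sketched for the one-parameter case: the "meat" of a dyadic-cluster-robust sandwich variance sums score products over every pair of observations that shares a country, whereas the i.i.d. meat keeps only the diagonal. This is a schematic of the idea, not the exact estimator in the paper's R/Stata software, and the dyads and scores are invented:

```python
# Four dyadic observations and their (scalar) score contributions.
dyads  = [("US", "CN"), ("US", "RU"), ("CN", "RU"), ("DE", "FR")]
scores = [0.5, 0.4, -0.2, 0.1]

def shares_member(d1, d2):
    """Two dyads are treated as dependent iff they share a country."""
    return bool(set(d1) & set(d2))

# i.i.d. meat: diagonal terms only.
meat_iid = sum(s * s for s in scores)

# Dyadic-clustering meat: also include cross terms for overlapping dyads
# (here, the three dyads among US, CN, and RU all overlap pairwise;
# DE-FR overlaps with none of them).
meat_dyadic = sum(scores[i] * scores[j]
                  for i in range(len(dyads))
                  for j in range(len(dyads))
                  if shares_member(dyads[i], dyads[j]))
```

In this toy example the positively correlated US dyads dominate the cross terms, so the dyadic-clustered variance exceeds the i.i.d. one, which is the direction the reanalysis finds on average (published SEs roughly half the DCRSEs).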
Felix Hartmann, Macartan Humphreys, Ferdinand Geissler, Heike Klüver, Johannes Giesecke
No abstract is available for this corrigendum.
Trading Liberties: Estimating COVID-19 Policy Preferences from Conjoint Data – CORRIGENDUM. Political Analysis. doi:10.1017/pan.2023.29. Published online 2023-09-19.