Human Rights Violations in Space: Assessing the External Validity of Machine-Geocoded versus Human-Geocoded Data

Political Analysis; IF 5.4; Zone 2 (Sociology); JCR Q1 (Political Science); Pub Date: 2021-12-15; DOI: 10.1017/pan.2021.40

Logan Stundal, Benjamin E. Bagozzi, John R. Freeman, J. Holmes

Political Analysis, volume 31, issue 1, pages 81-97. Journal Article. https://doi.org/10.1017/pan.2021.40

Citations: 2

Abstract

Political event data are widely used in studies of political violence. Recent years have seen notable advances in the automated coding of political event data from international news sources. Yet the validity of machine-coded event data remains disputed, especially in the context of event geolocation. We analyze how frequently human- and machine-geocoded event data agree with an independent (ground truth) source. The events are human rights violations in Colombia. We perform our evaluation for a key 8-year period of the Colombian conflict and for three 2-year subperiods, as well as for a selected set of (non)journalistically remote municipalities. As a complement to this analysis, we estimate spatial probit models based on the three datasets. These models assume Gaussian Markov Random Field error processes; they are constructed using a stochastic partial differential equation and estimated with the integrated nested Laplace approximation (INLA). The estimated models tell us whether the three datasets produce comparable predictions, underreport events in relation to the same covariates, and have similar patterns of prediction error. Together, the two analyses show that, for this subnational conflict, the machine- and human-geocoded datasets are comparable in terms of external validity but, according to the geostatistical models, produce prediction errors that differ in important respects.
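The agreement comparison described in the abstract can be illustrated with a small sketch. The abstract does not give the authors' actual procedure, so everything here is an assumption made for illustration: the unit of comparison (a municipality either has or lacks a recorded event), the `Agreement` class, the `agreement` function, and the municipality codes are all hypothetical and are not the paper's data or code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Agreement:
    """Counts of municipalities by how a dataset compares with the ground truth."""

    both: int          # event recorded by the dataset and by the ground truth
    only_dataset: int  # event recorded by the dataset only
    only_truth: int    # event recorded by the ground truth only
    neither: int       # no event recorded by either source

    @property
    def rate(self) -> float:
        """Share of municipalities on which the two sources agree."""
        total = self.both + self.only_dataset + self.only_truth + self.neither
        return (self.both + self.neither) / total if total else float("nan")


def agreement(dataset: set[str], truth: set[str], units: set[str]) -> Agreement:
    """Compare event presence per municipality, restricted to `units`."""
    both = len(dataset & truth & units)
    only_dataset = len((dataset - truth) & units)
    only_truth = len((truth - dataset) & units)
    neither = len(units - dataset - truth)
    return Agreement(both, only_dataset, only_truth, neither)


# Made-up municipality codes, purely illustrative.
units = {"M1", "M2", "M3", "M4", "M5"}
truth = {"M1", "M2", "M3"}      # hypothetical ground-truth source
human = {"M1", "M2", "M4"}      # hypothetical human-geocoded dataset
machine = {"M1", "M5"}          # hypothetical machine-geocoded dataset

human_result = agreement(human, truth, units)
machine_result = agreement(machine, truth, units)
print(f"human agreement rate:   {human_result.rate:.2f}")
print(f"machine agreement rate: {machine_result.rate:.2f}")
```

Restricting `units` to one subperiod or to a subset of remote municipalities would give the corresponding subset comparison. A simple agreement rate like this says nothing about spatial structure, which is the role the abstract assigns to the spatial probit models.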


Source journal

Political Analysis (Political Science)

CiteScore: 8.80

Self-citation rate: 3.70%

About the journal: Political Analysis chronicles these exciting developments by publishing the most sophisticated scholarship in the field. It is the place to learn new methods, to find some of the best empirical scholarship, and to publish your best research.