Human Rights Violations in Space: Assessing the External Validity of Machine-Geocoded versus Human-Geocoded Data

IF 4.7 | Zone 2 (Sociology) | Q1 POLITICAL SCIENCE | Political Analysis | Pub Date: 2021-12-15 | DOI: 10.1017/pan.2021.40
Logan Stundal, Benjamin E. Bagozzi, John R. Freeman, J. Holmes
Citations: 2

Abstract

Political event data are widely used in studies of political violence. Recent years have seen notable advances in the automated coding of political event data from international news sources. Yet, the validity of machine-coded event data remains disputed, especially in the context of event geolocation. We analyze the frequencies of human- and machine-geocoded event data agreement in relation to an independent (ground truth) source. The events are human rights violations in Colombia. We perform our evaluation for a key, 8-year period of the Colombian conflict and in three 2-year subperiods as well as for a selected set of (non)journalistically remote municipalities. As a complement to this analysis, we estimate spatial probit models based on the three datasets. These models assume Gaussian Markov Random Field error processes; they are constructed using a stochastic partial differential equation and estimated with integrated nested Laplace approximation. The estimated models tell us whether the three datasets produce comparable predictions, underreport events in relation to the same covariates, and have similar patterns of prediction error. Together the two analyses show that, for this subnational conflict, the machine- and human-geocoded datasets are comparable in terms of external validity but, according to the geostatistical models, produce prediction errors that differ in important respects.
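The agreement analysis described in the abstract can be illustrated with a small sketch. This is not the authors' procedure or data: the unit definition (municipality, period), the toy values, and the names `agreement_table`, `truth`, `human`, and `machine` are all assumptions made for illustration. It simply cross-tabulates a binary "event reported" indicator from a geocoded dataset against a ground-truth source over the units both cover.

```python
from collections import Counter


def agreement_table(coded, truth):
    """Cross-tabulate binary event indicators over shared units.

    `coded` and `truth` map a unit key, here a hypothetical
    (municipality, period) tuple, to 1 if any event was recorded, else 0.
    """
    shared = coded.keys() & truth.keys()  # compare only units present in both
    counts = Counter((bool(coded[u]), bool(truth[u])) for u in shared)

    both = counts[(True, True)]          # event in coded data and ground truth
    only_coded = counts[(True, False)]   # coded data reports, ground truth does not
    only_truth = counts[(False, True)]   # ground truth reports, coded data misses
    neither = counts[(False, False)]     # no event in either source
    total = both + only_coded + only_truth + neither

    return {
        "both": both,
        "only_coded": only_coded,
        "only_truth": only_truth,
        "neither": neither,
        # share of units where the two sources agree
        "agreement_rate": (both + neither) / total if total else float("nan"),
        # share of ground-truth events that the coded data also reports
        "sensitivity": both / (both + only_truth) if (both + only_truth) else float("nan"),
    }


# Made-up toy data: four municipalities in one period (illustrative only).
truth = {("A", 1): 1, ("B", 1): 0, ("C", 1): 1, ("D", 1): 0}
human = {("A", 1): 1, ("B", 1): 0, ("C", 1): 0, ("D", 1): 0}
machine = {("A", 1): 1, ("B", 1): 1, ("C", 1): 1, ("D", 1): 0}

human_result = agreement_table(human, truth)
machine_result = agreement_table(machine, truth)
```

In this toy case both datasets agree with the ground truth on three of four units, yet they err in different directions: the hypothetical human-coded data misses an event, while the machine-coded data reports one that the ground truth lacks. Equal overall agreement can therefore conceal different error patterns, which is why a second, model-based comparison is a natural complement.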
Source journal: Political Analysis (POLITICAL SCIENCE)
CiteScore: 8.80
Self-citation rate: 3.70%
Articles published: 30
Journal description: Political Analysis chronicles these exciting developments by publishing the most sophisticated scholarship in the field. It is the place to learn new methods, to find some of the best empirical scholarship, and to publish your best research.