From text to ties: Extraction of corruption network data from deferred prosecution agreements

Data & Policy, published 2023-01-01, DOI: 10.1017/dap.2022.41
T. Diviák, Nicholas Lord

Abstract

Deferred prosecution agreements (DPAs) are a legal tool for the nontrial resolution of corruption cases. Each DPA is accompanied by a Statement of Facts that provides a detailed, publicly available textual record of the case, including summarized evidence of who was involved, what they did, and with whom. These statements can be translated into networks amenable to social network analysis, allowing an analysis of the structure and dynamics of each case. In this study, we show how to extract from five Statements of Facts information about which actors were involved in a given case, the relations and interactions among these actors (e.g., communication or payments), and their relevant individual attributes (gender, affiliation, and sector). Two independent coders code the extracted information manually, and we subsequently assess inter-coder reliability. To assess the coding reliability of nodes and attributes, we use a matching coefficient; to assess the coding reliability of ties, we construct a network from each coder's coding and calculate the graph correlation between the two resulting networks. The coding of nodes and ties in the five extracted networks turns out to be highly reliable, with only slightly lower reliability for the largest network. The coding of attributes is highly reliable as well, although it is prone to missing data on actors' gender. We conclude by discussing the flexibility of our data collection framework and its extension to include network dynamics and nonhuman actors (such as companies) in the network representation.
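The abstract names two reliability measures without defining them. The following is a minimal sketch of one plausible reading, under stated assumptions: the matching coefficient is taken as the share of coded items (actors identified, or attribute values per actor) on which both coders agree, with items coded by only one coder counted as disagreements; the graph correlation is taken as the Pearson correlation between the two coders' binary, undirected adjacency matrices laid over the same set of actors. The function names `matching_coefficient` and `graph_correlation` are hypothetical, and the paper's exact operationalization may differ.

```python
import math
from itertools import combinations


def matching_coefficient(codes_a, codes_b):
    """Share of items on which two coders agree (hypothetical helper).

    codes_a, codes_b: dicts mapping an item (e.g. an actor name) to the value
    a coder assigned (True for "actor identified", or a gender/sector label).
    Assumptions: items coded by only one coder count as disagreements, and
    missing values (None) are compared like any other value.
    """
    items = set(codes_a) | set(codes_b)
    if not items:
        raise ValueError("nothing to compare")
    agreed = sum(
        1
        for item in items
        if item in codes_a and item in codes_b and codes_a[item] == codes_b[item]
    )
    return agreed / len(items)


def graph_correlation(nodes, ties_a, ties_b):
    """Pearson correlation of two coders' undirected binary networks.

    nodes: all actors (the union of both coders' node sets), so the two
    adjacency matrices share the same rows and columns.
    ties_a, ties_b: iterables of (actor, actor) pairs. Assumption: ties are
    undirected, so order within a pair is ignored and self-loops are dropped.
    """
    def edge_set(ties):
        return {frozenset(pair) for pair in ties if pair[0] != pair[1]}

    edges_a, edges_b = edge_set(ties_a), edge_set(ties_b)
    # One cell per unordered dyad, i.e. the upper triangle of each matrix.
    dyads = [frozenset(d) for d in combinations(sorted(set(nodes)), 2)]
    if not dyads:
        raise ValueError("need at least two actors")
    x = [1 if d in edges_a else 0 for d in dyads]
    y = [1 if d in edges_b else 0 for d in dyads]

    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for an empty or complete network")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Node coding: each coder lists the actors they identified (toy data).
    coder_1 = {"A": True, "B": True, "C": True, "D": True}
    coder_2 = {"A": True, "B": True, "C": True, "E": True}
    print(matching_coefficient(coder_1, coder_2))  # 0.6

    # Tie coding over a shared actor set (toy data).
    r = graph_correlation(
        ["A", "B", "C", "D"],
        [("A", "B"), ("B", "C"), ("C", "D")],
        [("B", "A"), ("B", "C")],
    )
    print(round(r, 4))  # 0.7071
```

For directed ties such as payments, one would instead compare all ordered off-diagonal cells (i, j) with i ≠ j; restricting the comparison to unordered pairs here is a simplifying assumption.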