From text to ties: Extraction of corruption network data from deferred prosecution agreements

Data & Policy (IF 1.8, JCR Q3, Public Administration). Pub Date: 2023-01-01. DOI: 10.1017/dap.2022.41

T. Diviák, Nicholas Lord



Abstract

Deferred prosecution agreements (DPAs) are a legal tool for the nontrial resolution of corruption cases. Each DPA is accompanied by a Statement of Facts that provides a detailed and publicly available textual record of the given case, including summarized evidence of who was involved, what they committed, and with whom. These statements can be translated into networks amenable to social network analysis, allowing an analysis of the structure and dynamics of each case. In this study, we show how to extract from five Statements of Facts information about which actors were involved in a given case, the relations and interactions among these actors (e.g., communication or payments), and their relevant individual attributes (gender, affiliation, and sector). We code the extracted information manually with two independent coders and subsequently assess the inter-coder reliability. To assess the coding reliability of nodes and attributes, we use a matching coefficient; to assess the coding reliability of ties, we construct a network from the coding of each coder and then calculate the graph correlation of the two resulting networks. The coding of nodes and ties in the five extracted networks turns out to be highly reliable, with only slightly lower reliability in the case of the largest network. The coding of attributes is highly reliable as well, although it is prone to missing data on actors’ gender. We conclude by discussing the flexibility of our data collection framework and its extension by including network dynamics and nonhuman actors (such as companies) in the network representation.
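The two reliability measures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so the sketch assumes that the matching coefficient is the share of items on which both coders agree and that the graph correlation is the Pearson correlation between corresponding off-diagonal cells of the two coders' adjacency matrices. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from math import sqrt


def matching_coefficient(codes_a, codes_b):
    """Share of items on which the two coders assigned the same value."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must code the same non-empty set of items")
    agreements = sum(a == b for a, b in zip(codes_a, codes_b))
    return agreements / len(codes_a)


def graph_correlation(adj_a, adj_b):
    """Pearson correlation over the off-diagonal cells of two adjacency matrices.

    Both matrices must be square, of equal size, and share the same node order.
    """
    n = len(adj_a)
    if n != len(adj_b) or n < 2:
        raise ValueError("matrices must have the same size (at least 2 nodes)")
    # Skip the diagonal: self-ties are not meaningful in this setting.
    xs = [adj_a[i][j] for i in range(n) for j in range(n) if i != j]
    ys = [adj_b[i][j] for i in range(n) for j in range(n) if i != j]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for an empty or complete network")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: four actors coded by two coders.
sector_coder_1 = ["public", "private", "private", "public"]
sector_coder_2 = ["public", "private", "public", "public"]

# Coder 1 records ties 0-1, 1-2, 2-3; coder 2 misses the tie 2-3.
network_coder_1 = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
network_coder_2 = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

attribute_reliability = matching_coefficient(sector_coder_1, sector_coder_2)
tie_reliability = graph_correlation(network_coder_1, network_coder_2)
print(attribute_reliability, tie_reliability)
```

In this toy case the coders disagree on one of four sector codes and on one of three ties, so both values fall below 1. Identical codings would yield 1 on both measures.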
