An iterative regulatory process for robot governance
Hadassah Drukarch, Carlos Calleja, E. Fosch-Villaronga
Abstract There is an increasing gap between the policy cycle’s speed and that of technological and social change. This gap is becoming broader and more prominent in robotics, that is, movable machines that perform tasks either automatically or with a degree of autonomy. This is because current legislation was unprepared for machine learning and autonomous agents. As a result, the law often lags behind and does not adequately frame robot technologies. This state of affairs inevitably increases legal uncertainty. It is unclear which regulatory frameworks developers have to follow to comply, often resulting in technology that does not perform well in the wild, is unsafe, and can exacerbate biases and lead to discrimination. This paper explores these issues and considers the background, key findings, and lessons learned of the LIAISON project, which stands for “Liaising robot development and policymaking,” and aims to ideate an alignment model for robots’ legal appraisal, channeling robot policy development from a hybrid top-down/bottom-up perspective to resolve this mismatch. As such, LIAISON seeks to uncover to what extent compliance tools could be used as data generators for robot policy purposes, unraveling an optimal regulatory framing for existing and emerging robot technologies.
{"title":"An iterative regulatory process for robot governance","authors":"Hadassah Drukarch, Carlos Calleja, E. Fosch-Villaronga","doi":"10.1017/dap.2023.3","DOIUrl":"https://doi.org/10.1017/dap.2023.3","url":null,"abstract":"Abstract There is an increasing gap between the policy cycle’s speed and that of technological and social change. This gap is becoming broader and more prominent in robotics, that is, movable machines that perform tasks either automatically or with a degree of autonomy. This is because current legislation was unprepared for machine learning and autonomous agents. As a result, the law often lags behind and does not adequately frame robot technologies. This state of affairs inevitably increases legal uncertainty. It is unclear what regulatory frameworks developers have to follow to comply, often resulting in technology that does not perform well in the wild, is unsafe, and can exacerbate biases and lead to discrimination. This paper explores these issues and considers the background, key findings, and lessons learned of the LIAISON project, which stands for “Liaising robot development and policymaking,” and aims to ideate an alignment model for robots’ legal appraisal channeling robot policy development from a hybrid top-down/bottom-up perspective to solve this mismatch. As such, LIAISON seeks to uncover to what extent compliance tools could be used as data generators for robot policy purposes to unravel an optimal regulatory framing for existing and emerging robot technologies.","PeriodicalId":93427,"journal":{"name":"Data & policy","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44989249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The conundrum in smart city governance: Interoperability and compatibility in an ever-growing ecosystem of digital twins
Hou Yee Quek, F. Sielker, J. Akroyd, A. Bhave, Aurel von Richthofen, P. Herthogs, C. Yamu, L. Wan, T. Nochta, G. Burgess, Mei Qi Lim, S. Mosbach, Markus Kraft
Abstract Today, technological developments are ever-growing yet fragmented. Alongside inconsistent digital approaches and attitudes across city administrations, such developments have made it difficult to reap the benefits of city digital twins. Bringing together experiences from five research projects, this paper discusses these digital twins based on two digital integration methodologies—systems integration and semantic integration. We revisit the nature of the underlying technologies and their implications for interoperability and compatibility in the context of planning processes and smart urbanism. Semantic approaches present a new opportunity for bidirectional data flows that can inform both governance processes and technological systems to co-create, cross-pollinate, and support optimal outcomes. Building on this opportunity, we suggest that treating the technological dimension as a new addition to the trifecta of economic, environmental, and social sustainability goals that guide planning processes can help governments address this conundrum of fragmentation, interoperability, and compatibility.
{"title":"The conundrum in smart city governance: Interoperability and compatibility in an ever-growing ecosystem of digital twins","authors":"Hou Yee Quek, F. Sielker, J. Akroyd, A. Bhave, Aurel von Richthofen, P. Herthogs, C. Yamu, L. Wan, T. Nochta, G. Burgess, Mei Qi Lim, S. Mosbach, Markus Kraft","doi":"10.1017/dap.2023.1","DOIUrl":"https://doi.org/10.1017/dap.2023.1","url":null,"abstract":"Abstract Today, technological developments are ever-growing yet fragmented. Alongside inconsistent digital approaches and attitudes across city administrations, such developments have made it difficult to reap the benefits of city digital twins. Bringing together experiences from five research projects, this paper discusses these digital twins based on two digital integration methodologies—systems and semantic integration. We revisit the nature of the underlying technologies, and their implications for interoperability and compatibility in the context of planning processes and smart urbanism. Semantic approaches present a new opportunity for bidirectional data flows that can inform both governance processes and technological systems to co-create, cross-pollinate, and support optimal outcomes. Building on this opportunity, we suggest that considering the technological dimension as a new addition to the trifecta of economic, environmental, and social sustainability goals that guide planning processes, can aid governments to address this conundrum of fragmentation, interoperability, and compatibility.","PeriodicalId":93427,"journal":{"name":"Data & policy","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49167143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fiscal data in text: Information extraction from audit reports using Natural Language Processing
Alejandro Beltran
Abstract Supreme audit institutions (SAIs) are touted as an integral component of anticorruption efforts in developing nations. SAIs review governmental budgets and report fiscal discrepancies in publicly available audit reports. These documents contain valuable information on budgetary discrepancies and missing resources, and may even report fraud and corruption. Existing research on anticorruption efforts relies on information published by national-level SAIs while mostly ignoring audits from subnational SAIs, because their information is not published in accessible formats. I collect publicly available audit reports published by a subnational SAI in Mexico, the Auditoria Superior del Estado de Sinaloa, and build a pipeline for extracting the monetary value of discrepancies detected in municipal budgets. I systematically convert scanned documents into machine-readable text using optical character recognition, and I then train a classification model to identify paragraphs with relevant information. From the relevant paragraphs, I extract the monetary values of budgetary discrepancies by developing a named entity recognizer that automates the identification of this information. In this paper, I explain the steps for building the pipeline and detail the procedures for replicating it in different contexts. The resulting dataset contains the official amounts of discrepancies in municipal budgets for the state of Sinaloa. This information is useful to anticorruption policymakers because it quantifies discrepancies in municipal spending, potentially motivating reforms that mitigate misappropriation. Although I focus on a single state in Mexico, this method can be extended to any context where audit reports are publicly available.
{"title":"Fiscal data in text: Information extraction from audit reports using Natural Language Processing","authors":"Alejandro Beltran","doi":"10.1017/dap.2023.4","DOIUrl":"https://doi.org/10.1017/dap.2023.4","url":null,"abstract":"Abstract Supreme audit institutions (SAIs) are touted as an integral component to anticorruption efforts in developing nations. SAIs review governmental budgets and report fiscal discrepancies in publicly available audit reports. These documents contain valuable information on budgetary discrepancies, missing resources, or may even report fraud and corruption. Existing research on anticorruption efforts relies on information published by national-level SAIs while mostly ignoring audits from subnational SAIs because their information is not published in accessible formats. I collect publicly available audit reports published by a subnational SAI in Mexico, the Auditoria Superior del Estado de Sinaloa, and build a pipeline for extracting the monetary value of discrepancies detected in municipal budgets. I systematically convert scanned documents into machine-readable text using optical character recognition, and I then train a classification model to identify paragraphs with relevant information. From the relevant paragraphs, I extract the monetary values of budgetary discrepancies by developing a named entity recognizer that automates the identification of this information. In this paper, I explain the steps for building the pipeline and detail the procedures for replicating it in different contexts. The resulting dataset contains the official amounts of discrepancies in municipal budgets for the state of Sinaloa. This information is useful to anticorruption policymakers because it quantifies discrepancies in municipal spending potentially motivating reforms that mitigate misappropriation. Although I focus on a single state in Mexico, this method can be extended to any context where audit reports are publicly available.","PeriodicalId":93427,"journal":{"name":"Data & policy","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45973847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Studying open government data: Acknowledging practices and politics
Gijs van Maanen
Abstract Open government and open data are often presented as the Asterix and Obelix of modern government—one cannot discuss one without involving the other. Modern government, in this narrative, should open itself up, be more transparent, and allow the governed to have a say in their governance. The usage of technologies, and especially the communication of governmental data, is then thought to be one of the crucial instruments helping governments achieve these goals. Much open government data research, hence, focuses on the publication of open government data, their reuse, and re-users. Recent research trends, by contrast, depart from this focus on data and emphasize the importance of studying open government data in practice, in interaction with practitioners, while simultaneously paying attention to their political character. This commentary looks more closely at the implications of emphasizing the practical and political dimensions of open government data. It argues that researchers should explicate how and in what way open government data policies present solutions to what kinds of problems. Such explications should be based on a detailed empirical analysis of how different actors do or do not do open data. The key question to be continuously asked and answered when studying and implementing open government data is how the solutions openness presents latch onto the problems they aim to solve.
{"title":"Studying open government data: Acknowledging practices and politics","authors":"Gijs van Maanen","doi":"10.1017/dap.2022.40","DOIUrl":"https://doi.org/10.1017/dap.2022.40","url":null,"abstract":"Abstract Open government and open data are often presented as the Asterix and Obelix of modern government—one cannot discuss one, without involving the other. Modern government, in this narrative, should open itself up, be more transparent, and allow the governed to have a say in their governance. The usage of technologies, and especially the communication of governmental data, is then thought to be one of the crucial instruments helping governments achieving these goals. Much open government data research, hence, focuses on the publication of open government data, their reuse, and re-users. Recent research trends, by contrast, divert from this focus on data and emphasize the importance of studying open government data in practice, in interaction with practitioners, while simultaneously paying attention to their political character. This commentary looks more closely at the implications of emphasizing the practical and political dimensions of open government data. It argues that researchers should explicate how and in what way open government data policies present solutions to what kind of problems. Such explications should be based on a detailed empirical analysis of how different actors do or do not do open data. The key question to be continuously asked and answered when studying and implementing open government data is how the solutions openness present latch onto the problem they aim to solve.","PeriodicalId":93427,"journal":{"name":"Data & policy","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46217602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
“Hey SyRI, tell me about algorithmic accountability”: Lessons from a landmark case
M. Wieringa
Abstract The promised merits of data-driven innovation in general, and algorithmic systems in particular, hardly need enumeration. However, as decision-making tasks are increasingly delegated to algorithmic systems, questions about accountability arise. These pressing questions of algorithmic accountability, particularly with regard to data-driven innovation in the public sector, deserve ample scholarly attention. This paper therefore brings together perspectives from governance studies and critical algorithm studies to assess how algorithmic accountability succeeds or falls short in practice, analyzing the Dutch System Risk Indication (SyRI) as an empirical case. Dissecting a concrete case teases out to what degree archetypical accountability practices and processes function in relation to algorithmic decision-making processes, and which new questions concerning algorithmic accountability emerge therein. The case is approached through the analysis of “scavenged” material. It was found that while these archetypical accountability processes and practices can be incredibly productive in dealing with algorithmic systems, they are simultaneously at risk. The current accountability configurations hinge predominantly on the ex ante sensitivity and responsiveness of the political fora. When these prove insufficient, mitigation in medias res or ex post is very difficult for other actants. In part, this is not a new phenomenon, but it is amplified in relation to algorithmic systems. Different fora ask different kinds of medium-specific questions of the actor, from different perspectives and with varying power relations. These algorithm-specific considerations relate to the decision-making around an algorithmic system, its functionality, and its deployment. Sensitizing ex ante political accountability fora to these algorithm-specific considerations could help mitigate this.
{"title":"“Hey SyRI, tell me about algorithmic accountability”: Lessons from a landmark case","authors":"M. Wieringa","doi":"10.1017/dap.2022.39","DOIUrl":"https://doi.org/10.1017/dap.2022.39","url":null,"abstract":"Abstract The promised merits of data-driven innovation in general and algorithmic systems in particular hardly need enumeration. However, as decision-making tasks are increasingly delegated to algorithmic systems, this raises questions about accountability. These pressing questions of algorithmic accountability, particularly with regard to data-driven innovation in the public sector, deserve ample scholarly attention. Therefore, this paper brings together perspectives from governance studies and critical algorithm studies to assess how algorithmic accountability succeeds or falls short in practice and analyses the Dutch System Risk Indication (SyRI) as an empirical case. Dissecting a concrete case teases out to which degree archetypical accountability practices and processes function in relation to algorithmic decision-making processes, and which new questions concerning algorithmic accountability emerge therein. The case is approached through the analysis of “scavenged” material. It was found that while these archetypical accountability processes and practices can be incredibly productive in dealing with algorithmic systems they are simultaneously at risk. The current accountability configurations hinge predominantly on the ex ante sensitivity and responsiveness of the political fora. When these prove insufficient, mitigation in medias res/ex post is very difficult for other actants. In part, this is not a new phenomenon, but it is amplified in relation to algorithmic systems. Different fora ask different kinds of medium-specific questions to the actor, from different perspectives with varying power relations. These algorithm-specific considerations relate to the decision-making around an algorithmic system, their functionality, and their deployment. Strengthening ex ante political accountability fora to these algorithm-specific considerations could help mitigate this.","PeriodicalId":93427,"journal":{"name":"Data & policy","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44861752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The role of sustainability knowledge-action platforms in advancing multi-stakeholder engagement on sustainability – ERRATUM
Oliver Bream McIntosh, Amy Burnett, Ira Feldman, Jenna A. Lamphere, Thomas A. Reuter, Emmanuelle Vital
{"title":"The role of sustainability knowledge-action platforms in advancing multi-stakeholder engagement on sustainability – ERRATUM","authors":"Oliver Bream McIntosh, Amy Burnett, Ira Feldman, Jenna A. Lamphere, Thomas A. Reuter, Emmanuelle Vital","doi":"10.1017/dap.2023.31","DOIUrl":"https://doi.org/10.1017/dap.2023.31","url":null,"abstract":"","PeriodicalId":93427,"journal":{"name":"Data & policy","volume":"184 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134980928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic SDG budget tagging: Building public financial management capacity through natural language processing
Daniele Guariso, Omar A. Guerrero, Gonzalo Castañeda
Abstract The “budgeting for SDGs” (B4SDGs) paradigm seeks to coordinate the budgeting process of the fiscal cycle with the sustainable development goals (SDGs) set by the United Nations. Integrating the goals into public financial management systems is crucial for an effective alignment of national development priorities with the objectives set in the 2030 Agenda. Within the dynamic process defined in the B4SDGs framework, the step of SDG budget tagging is a precondition for subsequent budget diagnostics. However, developing a national SDG taxonomy requires substantial investment in terms of time, human, and administrative resources. Such costs are exacerbated in least-developed countries, which are often characterized by constrained institutional capacity. The automation of SDG budget tagging could represent a cost-effective solution. We use well-established text analysis and machine learning techniques to explore the scope and scalability of automatically labeling budget programs within the B4SDGs framework. The results show that, while our classifiers can achieve high accuracy, they face limitations when trained with data that is not representative of the institutional setting considered. These findings imply that a national government trying to integrate SDGs into its planning and budgeting practices cannot rely solely on artificial intelligence (AI) tools and off-the-shelf coding schemes. Our results are relevant to academics and the broader policymaker community, contributing to the debate around the strengths and weaknesses of adopting computer algorithms to assist decision-making processes.
{"title":"Automatic SDG budget tagging: Building public financial management capacity through natural language processing","authors":"Daniele Guariso, Omar A. Guerrero, Gonzalo Castañeda","doi":"10.1017/dap.2023.28","DOIUrl":"https://doi.org/10.1017/dap.2023.28","url":null,"abstract":"Abstract The “budgeting for SDGs”–B4SDGs–paradigm seeks to coordinate the budgeting process of the fiscal cycle with the sustainable development goals (SDGs) set by the United Nations. Integrating the goals into public financial management systems is crucial for an effective alignment of national development priorities with the objectives set in the 2030 Agenda. Within the dynamic process defined in the B4SDGs framework, the step of SDG budget tagging represents a precondition for subsequent budget diagnostics. However, developing a national SDG taxonomy requires substantial investment in terms of time, human, and administrative resources. Such costs are exacerbated in least-developed countries, which are often characterized by a constrained institutional capacity. The automation of SDG budget tagging could represent a cost-effective solution. We use well-established text analysis and machine learning techniques to explore the scope and scalability of automatic labeling budget programs within the B4SDGs framework. The results show that, while our classifiers can achieve great accuracy, they face limitations when trained with data that is not representative of the institutional setting considered. These findings imply that a national government trying to integrate SDGs into its planning and budgeting practices cannot just rely solely on artificial intelligence (AI) tools and off-the-shelf coding schemes. Our results are relevant to academics and the broader policymaker community, contributing to the debate around the strengths and weaknesses of adopting computer algorithms to assist decision-making processes.","PeriodicalId":93427,"journal":{"name":"Data & policy","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135800620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The role of sustainability knowledge-action platforms in advancing multi-stakeholder engagement on sustainability
Oliver Bream McIntosh, Amy Burnett, Ira Feldman, Jenna A. Lamphere, Thomas A. Reuter, Emmanuelle Vital
Abstract Within the last decade, online sustainability knowledge-action platforms have proliferated. We surveyed 198 sustainability-oriented sites and conducted a review of 41 knowledge-action platforms, which we define as digital tools that advance sustainability through organized activities and knowledge dissemination. We analyzed platform structure and functionality through a systematic coding process based on key issues identified in three bodies of literature: (a) the emergence of digital platforms, (b) the localization of the sustainable development goals (SDGs), and (c) the importance of multi-level governance to sustainability action. While online collaborative tools offer an array of resources, our analysis indicates that they struggle to provide context sensitivity and higher-level analysis of the trade-offs and synergies between sustainability actions. SDG localization adds another layer of complexity, where multi-level governance, actor, and institutional priorities may generate tensions as well as opportunities for intra- and cross-sectoral alignment. On the basis of our analysis, we advocate for the development of integrative, open-source, and dynamic global online data management tools that would enable the monitoring of progress and facilitate peer-to-peer exchange of ideas and experience among local government, community, and business stakeholders. We argue that by showcasing and exemplifying local actions, an integrative platform that leverages existing content from multiple extant platforms through effective data interoperability can provide additional functionality and significantly empower local actors to accelerate local-to-global action while also supporting complex system change.
{"title":"The role of sustainability knowledge-action platforms in advancing multi-stakeholder engagement on sustainability","authors":"Oliver Bream McIntosh, Amy Burnett, Ira Feldman, Jenna A. Lamphere, Thomas A. Reuter, Emmanuelle Vital","doi":"10.1017/dap.2023.27","DOIUrl":"https://doi.org/10.1017/dap.2023.27","url":null,"abstract":"Abstract Within the last decade, online sustainability knowledge-action platforms have proliferated. We surveyed 198 sustainability-oriented sites and conducted a review of 41 knowledge-action platforms, which we define as digital tools that advance sustainability through organized activities and knowledge dissemination. We analyzed platform structure and functionality through a systematic coding process based on key issues identified in three bodies of literature: (a) the emergence of digital platforms, (b) the localization of the sustainable development goals (SDGs), and (c) the importance of multi-level governance to sustainability action. While online collaborative tools offer an array of resources, our analysis indicates that they struggle to provide context-sensitivity and higher-level analysis of the trade-offs and synergies between sustainability actions. SDG localization adds another layer of complexity where multi-level governance, actor, and institutional priorities may generate tensions as well as opportunities for intra- and cross-sectoral alignment. On the basis of our analysis, we advocate for the development of integrative open-source and dynamic global online data management tools that would enable the monitoring of progress and facilitate peer-to-peer exchange of ideas and experience among local government, community, and business stakeholders. We argue that by showcasing and exemplifying local actions, an integrative platform that leverages existing content from multiple extant platforms through effective data interoperability can provide additional functionality and significantly empower local actors to accelerate local to global actions, while also complex system change.","PeriodicalId":93427,"journal":{"name":"Data & policy","volume":"231 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135784772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
From text to ties: Extraction of corruption network data from deferred prosecution agreements
T. Diviák, Nicholas Lord
Abstract Deferred prosecution agreements (DPAs) are a legal tool for the nontrial resolution of cases of corruption. Each DPA is accompanied by a Statement of Facts that provides a detailed and publicly available textual record of the given case, including summarized evidence of who was involved, what they committed, and with whom. These statements can be translated into networks amenable to social network analysis, allowing an analysis of the structure and dynamics of each case. In this study, we show how to extract, from five Statements of Facts, information about which actors were involved in a given case, the relations and interactions among these actors (e.g., communication or payments), and their relevant individual attributes (gender, affiliation, and sector). We code the extracted information manually with two independent coders and subsequently assess the inter-coder reliability. For assessing the coding reliability of nodes and attributes, we use a matching coefficient, whereas for assessing the coding reliability of ties, we construct a network from the coding of each coder and calculate the graph correlation of the two resulting networks. The coding of nodes and ties in the five extracted networks turns out to be highly reliable, with only slightly lower coding reliability in the case of the largest network. The coding of attributes is highly reliable as well, although it is prone to missing data on actors’ gender. We conclude by discussing the flexibility of our data collection framework and its extension to include network dynamics and nonhuman actors (such as companies) in the network representation.
{"title":"From text to ties: Extraction of corruption network data from deferred prosecution agreements","authors":"T. Diviák, Nicholas Lord","doi":"10.1017/dap.2022.41","DOIUrl":"https://doi.org/10.1017/dap.2022.41","url":null,"abstract":"Abstract Deferred prosecution agreements (DPAs) are a legal tool for the nontrial resolution of cases of corruption. Each DPA is accompanied by a Statement of Facts that provides detailed and publicly available textual records of the given cases, including summarized evidence of who was involved, what they committed, and with whom. These statements can be translated into networks amenable to social network analysis allowing an analysis of the structure and dynamics of each case. In this study, we show how to extract information about which actors were involved in a given case, the relations and interactions among these actors (e.g., communication or payments), and their relevant individual attributes (gender, affiliation, and sector) from five Statements of Fact. We code the extracted information manually with two independent coders and subsequently, we assess the inter-coder reliability. For assessing the coding reliability of nodes and attributes, we use a matching coefficient, whereas for assessing the coding reliability of ties, we construct a network from the coding of each coder and subsequently calculate the graph correlations of the two resulting networks. The coding of nodes and ties in the five extracted networks turns out to be highly reliable with only slightly lower coding reliability in the case of the largest network. The coding of attributes is highly reliable as well, although it is prone to missing data on actors’ gender. We conclude by discussing the flexibility of our data collection framework and its extension by including network dynamics and nonhuman actors (such as companies) in the network representation.","PeriodicalId":93427,"journal":{"name":"Data & policy","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48710706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The impact of modeling decisions in statistical profiling
Ruben L. Bach, Christoph Kern, Hannah Mautner, Frauke Kreuter
Abstract Statistical profiling of job seekers is an attractive option for guiding the activities of public employment services. Many hope that algorithms will improve both the efficiency and the effectiveness of employment services’ activities, which so far are often based on human judgment. Against this backdrop, we evaluate regression and machine-learning models for predicting job seekers’ risk of becoming long-term unemployed using German administrative labor market data. While our models achieve competitive predictive performance, we show that training an accurate prediction model is just one element in a series of design and modeling decisions, each having notable effects that extend beyond predictive accuracy. We observe considerable variation in the cases flagged as high risk across models, highlighting the need for systematic evaluation and transparency of the full prediction pipeline if statistical profiling techniques are to be implemented by employment agencies.
{"title":"The impact of modeling decisions in statistical profiling","authors":"Ruben L. Bach, Christoph Kern, Hannah Mautner, Frauke Kreuter","doi":"10.1017/dap.2023.29","DOIUrl":"https://doi.org/10.1017/dap.2023.29","url":null,"abstract":"Abstract Statistical profiling of job seekers is an attractive option to guide the activities of public employment services. Many hope that algorithms will improve both efficiency and effectiveness of employment services’ activities that are so far often based on human judgment. Against this backdrop, we evaluate regression and machine-learning models for predicting job-seekers’ risk of becoming long-term unemployed using German administrative labor market data. While our models achieve competitive predictive performance, we show that training an accurate prediction model is just one element in a series of design and modeling decisions, each having notable effects that span beyond predictive accuracy. We observe considerable variation in the cases flagged as high risk across models, highlighting the need for systematic evaluation and transparency of the full prediction pipeline if statistical profiling techniques are to be implemented by employment agencies.","PeriodicalId":93427,"journal":{"name":"Data & policy","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135908480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}