Prediction of Recidivism and Detection of Risk Factors Under Different Time Windows Using Machine Learning Techniques

Social Science Computer Review. Pub Date: 2024-01-12. DOI: 10.1177/08944393241226607

Di Mu, Simai Zhang, Ting Zhu, Yong Zhou, Wei Zhang



Abstract

After comprehensive analysis of the first three generations of prisoner risk assessment tools, the field has seen a marked rise in the integration of fourth-generation tools with machine learning techniques. However, limited effort has gone into the explainability of data-driven prediction models and their connection with treatment recommendations. Our primary objective was to develop models that predict the likelihood of recidivism within 1, 2, and 5 years of release from the index incarceration, and to enhance their interpretability using SHapley Additive exPlanations (SHAP). We collected 20,457 in-prison records dated February 10, 2005, to August 25, 2021, from the data management system of a prison in Southwestern China. Recidivism records were determined by mining data from an official website and combining it with identification data from neighboring prisons. We employed five machine learning algorithms, with sociodemographic characteristics, physical health, psychological assessments, criminological characteristics, crime history, social support, and in-prison behaviors as predictors. SHAP was applied to reveal feature contributions. Findings indicated that young prisoners accused of larceny, with previous convictions, lower fines, and limited family support, faced higher reoffending risk. Conversely, middle-aged and senior prisoners with no prior convictions, lower monthly supermarket expenses, and positive psychological test results had lower reoffending risk. We also explored interactions between significant predictive features, such as prisoner age at the start of incarceration and primary accusation, and the duration of the current incarceration and cumulative prior incarcerations. Our models performed consistently well across all time windows, as measured by AUC on the test dataset. The interpretability results provided insight into how risk factors evolve over time, which is valuable for intervention with high-risk individuals. With additional validation, these insights could offer stakeholders dynamic information about prisoners. Moreover, the interpretability results can be integrated into prison and court management systems as a risk assessment tool.
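The abstract does not give implementation details, so the following is only a minimal sketch of how outcome labels for the three time windows and a per-window AUC might be computed. The records, the scores, the 365-day year, the use of the last collection date as a censoring cutoff, and the names `window_label` and `auc` are all illustrative assumptions, not the authors' method.

```python
from datetime import date

# Follow-up horizons in years, matching the 1-, 2-, and 5-year windows.
WINDOWS_YEARS = (1, 2, 5)
# Assumption: the last collection date in the abstract is used as the censoring cutoff.
DATA_END = date(2021, 8, 25)


def window_label(release, reoffense, years, data_end):
    """Return 1 if the person reoffended within `years` of release,
    0 if they were observed for the full window without reoffending,
    and None if follow-up is too short to decide (censored)."""
    horizon_days = 365 * years  # assumption: a year is approximated as 365 days
    if reoffense is not None and (reoffense - release).days <= horizon_days:
        return 1
    if (data_end - release).days < horizon_days:
        return None
    return 0


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs in which the positive
    case receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical records: (release date, reoffense date or None, model risk score).
# One score is reused for every window to keep the example short; in practice
# each window would have its own fitted model and its own scores.
RECORDS = [
    (date(2015, 1, 1), date(2015, 6, 1), 0.9),
    (date(2015, 1, 1), date(2016, 9, 1), 0.3),
    (date(2015, 1, 1), None, 0.2),
    (date(2015, 1, 1), date(2019, 3, 1), 0.6),
    (date(2020, 6, 1), None, 0.5),
]

results = {}
for years in WINDOWS_YEARS:
    labelled = [(window_label(rel, re, years, DATA_END), s) for rel, re, s in RECORDS]
    labelled = [(y, s) for y, s in labelled if y is not None]  # drop censored cases
    results[years] = auc([y for y, _ in labelled], [s for _, s in labelled])

print(results)  # {1: 1.0, 2: 0.75, 5: 1.0}
```

In this sketch the same person can be a negative case for the 1-year window and a positive case for the 5-year window, which is why each window needs its own label vector and its own evaluation.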


