Pub Date: 2023-04-24; DOI: 10.1007/s10506-023-09359-6
Yifan Hou, Ge Cheng, Yun Zhang, Dongliang Zhang
Law article prediction is the task of predicting the laws and regulations relevant to a case from the text describing it, and it has broad application prospects for improving judicial efficiency. Existing work often considers each case in isolation, using neural networks to extract features for prediction, and therefore does not mine the related and common element information shared across cases. To solve this problem, we propose a law article prediction method that integrates the characteristics of common elements. It effectively utilizes the co-occurrence information in the training data, fully mines the relevant common elements between cases, and fuses local features. Experiments show that our method performs well.
Title: Methods of incorporating common element characteristics for law article prediction
Journal: Artificial Intelligence and Law, vol. 32, no. 2, pp. 487-503
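The abstract does not say how co-occurrence information is extracted, so the following is only a minimal sketch under one assumption: that "co-occurrence" refers to law articles cited together in the same training case. The function name `article_cooccurrence` and the labels "A", "B", "C" are hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import combinations


def article_cooccurrence(case_labels):
    """Count how often each pair of law articles is cited in the same case.

    case_labels: list of lists, one list of article labels per training case.
    Returns a Counter keyed by (label_a, label_b) with label_a < label_b.
    """
    counts = Counter()
    for labels in case_labels:
        # Deduplicate and sort so each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(labels)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical training labels: three cases citing articles "A", "B", "C".
cases = [["A", "B"], ["B", "A", "C"], ["C"]]
pair_counts = article_cooccurrence(cases)
```

Such counts could serve as one source of shared signal across cases; how the paper fuses them with local features is not stated in the abstract.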
Pub Date: 2023-04-04; DOI: 10.1007/s10506-023-09356-9
Bartosz Brożek, Michał Furman, Marek Jakubiec, Bartłomiej Kucharzyk
This paper addresses the black box problem in artificial intelligence (AI) and the related problem of the explainability of AI in the legal context. We argue, first, that the black box problem is, in fact, a superficial one, as it results from an overlap of four different, albeit interconnected, issues: the opacity problem, the strangeness problem, the unpredictability problem, and the justification problem. Thus, we propose a framework for discussing both the black box problem and the explainability of AI. We argue further that, contrary to often defended claims, the opacity issue is not a genuine problem. We also dismiss the justification problem. Further, we describe the tensions involved in the strangeness and unpredictability problems and suggest some ways to alleviate them.
Title: The black box problem revisited. Real and imaginary challenges for automated legal decision making
Journal: Artificial Intelligence and Law, vol. 32, no. 2, pp. 427-440
PDF: https://link.springer.com/content/pdf/10.1007/s10506-023-09356-9.pdf
Pub Date: 2023-03-30; DOI: 10.1007/s10506-023-09353-y
Maxime C Cohen, Samuel Dahan, Warut Khern-Am-Nuai, Hajime Shimao, Jonathan Touboul
The use of artificial intelligence (AI) to aid legal decision making has become prominent. This paper investigates the use of AI on a critical issue in employment law, the determination of a worker's status (employee vs. independent contractor), in two common law countries (the U.S. and Canada). This legal question has been a contentious labor issue insofar as independent contractors are not eligible for the same benefits as employees. It has become an important societal issue due to the ubiquity of the gig economy and the recent disruptions in employment arrangements. To address this problem, we collected, annotated, and structured the data for all Canadian and Californian court cases related to this legal question between 2002 and 2021, resulting in 538 Canadian cases and 217 U.S. cases. In contrast to legal literature focusing on complex and correlated characteristics of the employment relationship, our statistical analyses of the data show very strong correlations between the worker's status and a small subset of quantifiable characteristics of the employment relationship. In fact, despite the variety of situations in the case law, we show that simple, off-the-shelf AI models classify the cases with an out-of-sample accuracy of more than 90%. Interestingly, the analysis of misclassified cases reveals consistent misclassification patterns across most algorithms. Legal analyses of these cases led us to identify how judges ensure equity in ambiguous situations. Finally, our findings have practical implications for access to legal advice and justice. We deployed our AI model via the open-access platform https://MyOpenCourt.org/ to help users answer employment law questions. This platform has already assisted many Canadian users, and we hope it will help democratize access to legal advice for a wide audience.
Title: The use of AI in legal systems: determining independent contractor vs. employee status
Journal: Artificial Intelligence and Law, pp. 1-30
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10061388/pdf/
Pub Date: 2023-03-16; DOI: 10.1007/s10506-023-09351-0
Meghdad Ghari
We combine linear temporal logic (with both past and future modalities) with a deontic version of justification logic to provide a framework for reasoning about time and epistemic and normative reasons. In addition to temporal modalities, the resulting logic contains two kinds of justification assertions: epistemic justification assertions and deontic justification assertions. The former presents justification for the agent’s knowledge and the latter gives reasons for why a proposition is obligatory. We present two kinds of semantics for the logic: one based on Fitting models and the other based on neighborhood models. The use of neighborhood semantics enables us to define the dual of deontic justification assertions properly, which corresponds to a notion of permission in deontic logic. We then establish the soundness and completeness of an axiom system of the logic with respect to these semantics. Further, we formalize the Protagoras versus Euathlus paradox in this logic and present a precise analysis of the paradox, and also briefly discuss Leibniz’s solution.
Title: A formalization of the Protagoras court paradox in a temporal logic of epistemic and normative reasons
Journal: Artificial Intelligence and Law, vol. 32, no. 2, pp. 325-367
Pub Date: 2023-03-15; DOI: 10.1007/s10506-023-09352-z
Fábio M. Oliveira, Marcelo S. Balbino, Luis E. Zarate, Fawn Ngo, Ramakrishna Govindu, Anurag Agarwal, Cristiane N. Nobre
Internal misconduct is a universal problem in prisons and affects the maintenance of social order. Consequently, correctional institutions often develop rehabilitation programs to reduce the likelihood of inmates committing internal offenses and of criminal recidivism after release. It is therefore necessary to identify the profile of each offender, both to indicate an appropriate rehabilitation program and to determine the level of internal security to which the offender must be subject. In this context, this work aims to discover the most significant characteristics for predicting inmate misconduct using ML methods and the SHAP approach. We used a database produced in 2004 through the Survey of Inmates in State and Federal Correctional Facilities in the United States of America, which provides nationally representative data on prisoners from state and federal facilities. The predictive model based on Random Forest performed best, so we applied SHAP to it. Overall, the results showed that features related to victimization, type of crime committed, age and age at first arrest, history of association with criminal groups, education, and drug and alcohol use are most relevant in predicting internal misconduct. This work is thus expected to support the timely prior classification of inmates and the use of programs and practices that aim to improve the lives of offenders and their reintegration into society and, consequently, to reduce criminal recidivism.
Title: Predicting inmates misconduct using the SHAP approach
Journal: Artificial Intelligence and Law, vol. 32, no. 2, pp. 369-395
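SHAP attributions are based on Shapley values. To illustrate the idea only, here is a minimal sketch that computes exact Shapley values by enumerating feature subsets for a toy linear scoring function. This is not the authors' pipeline: the SHAP library uses efficient estimators on the trained Random Forest, whereas this brute-force version is exponential in the number of features. The function `toy_model`, the three features, and the all-zero baseline are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature of x relative to a baseline input.

    Features outside the coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for r in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(others, r):
                s = set(subset)
                # Marginal contribution of feature i to this coalition.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    # Hypothetical risk score over three made-up features.
    return 2.0 * z[0] + 3.0 * z[1] - 1.0 * z[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

For a linear model each value equals the coefficient times the feature's deviation from the baseline, and the values sum to the difference between the model output at x and at the baseline.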
Pub Date: 2023-03-14; DOI: 10.1007/s10506-023-09354-x
Junlin Zhu, Jiaye Wu, Xudong Luo, Jie Liu
The COVID-19 pandemic has been severe worldwide, and the prevention and control of crimes associated with COVID-19 are critical for controlling it. Therefore, to provide efficient and convenient intelligent legal knowledge services during the pandemic, we develop an intelligent system for legal information retrieval on the WeChat platform. The data source used to train our system is “The typical cases of national procuratorial authorities handling crimes against the prevention and control of the new coronary pneumonia pandemic following the law”, which is published online by the Supreme People’s Procuratorate of the People’s Republic of China. We base our system on a convolutional neural network and use a semantic matching mechanism to capture inter-sentence relationship information and make a prediction. Moreover, we introduce an auxiliary learning process to help the network better distinguish the relation between two sentences. Finally, the system uses the trained model to interpret the information entered by a user, responds with a reference case similar to the query case, and gives the reference legal gist applicable to the query case.
Title: Semantic matching based legal information retrieval system for COVID-19 pandemic
Journal: Artificial Intelligence and Law, vol. 32, no. 2, pp. 397-426
Pub Date: 2023-03-04; DOI: 10.1007/s10506-023-09349-8
Aniket Deroy, Kripabandhu Ghosh, Saptarshi Ghosh
Summarization of legal case judgement documents is a practical and challenging problem, for which many summarization algorithms of different varieties have been tried. In this work, rather than developing yet another summarization algorithm, we investigate if intelligently ensembling (combining) the outputs of multiple (base) summarization algorithms can lead to better summaries of legal case judgements than any of the base algorithms. Using two datasets of case judgement documents from the Indian Supreme Court, one with extractive gold standard summaries and the other with abstractive gold standard summaries, we apply various ensembling techniques on summaries generated by a wide variety of summarization algorithms. The ensembling methods applied range from simple voting-based methods to ranking-based and graph-based ensembling methods. We show that many of our ensembling methods yield summaries that are better than the summaries produced by any of the individual base algorithms, in terms of ROUGE and METEOR scores.
Title: Ensemble methods for improving extractive summarization of legal case judgements
Journal: Artificial Intelligence and Law, vol. 32, no. 1, pp. 231-289
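The simplest family mentioned in the abstract, voting-based ensembling, might be sketched as follows. The abstract gives no details of the voting scheme, so this is an assumption: each base summary casts one vote per sentence it selects, the k most-voted sentences are kept, and ties are broken by position in the document. The function name `vote_ensemble` and the placeholder sentences are hypothetical.

```python
from collections import Counter


def vote_ensemble(document, base_summaries, k):
    """Combine extractive summaries by majority voting over sentences.

    document: list of sentences in original order.
    base_summaries: list of summaries, each a list of sentences from document.
    k: number of sentences to keep in the ensemble summary.
    """
    votes = Counter()
    for summary in base_summaries:
        for sentence in set(summary):  # one vote per base algorithm
            votes[sentence] += 1
    # Rank sentence positions by votes (descending), then by document order.
    ranked = sorted(range(len(document)), key=lambda i: (-votes[document[i]], i))
    # Emit the chosen sentences in their original document order.
    return [document[i] for i in sorted(ranked[:k])]


document = ["s0", "s1", "s2", "s3"]
base_summaries = [["s0", "s2"], ["s2", "s3"], ["s0", "s2"]]
ensemble = vote_ensemble(document, base_summaries, k=2)
```

Here "s2" receives three votes and "s0" two, so the ensemble keeps those two sentences. The ranking-based and graph-based methods in the paper are more involved and are not shown.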
Pub Date: 2023-02-22; DOI: 10.1007/s10506-023-09347-w
Athina Sachoulidou
Title: Going beyond the “common suspects”: to be presumed innocent in the era of algorithms, big data and artificial intelligence
Journal: Artificial Intelligence and Law
Pub Date: 2023-02-18; DOI: 10.1007/s10506-023-09348-9
Hugo Mentzingen, Nuno Antonio, Victor Lobo
Decisions of regulatory government bodies and courts affect many aspects of citizens’ lives. These organizations and courts are expected to provide timely and coherent decisions, although they struggle to keep up with the increasing demand. The ability of machine learning (ML) models to predict such decisions based on past cases under similar circumstances was assessed in some recent works. The dominant conclusion is that the prediction goal is achievable with high accuracy. Nevertheless, most of those works do not consider important aspects for ML models that can impact performance and affect real-world usefulness, such as consistency, out-of-sample applicability, generality, and explainability preservation. To our knowledge, none considered all those aspects, and no previous study addressed the joint use of metadata and text-extracted variables to predict administrative decisions. We propose a predictive model that addresses the abovementioned concerns based on a two-stage cascade classifier. The model employs a first-stage prediction based on textual features extracted from the original documents and a second-stage classifier that includes proceedings’ metadata. The study was conducted using time-based cross-validation, built on data available before the predicted judgment. It provides predictions as soon as the decision date is scheduled and only considers the first document in each proceeding, along with the metadata recorded when the infringement is first registered. Finally, the proposed model provides local explainability by preserving visibility on the textual features and employing the SHapley Additive exPlanations (SHAP). Our findings suggest that this cascade approach surpasses the standalone stages and achieves relatively high Precision and Recall when both text and metadata are available while preserving real-world usefulness. 
With a weighted F1 score of 0.900, the results outperform the text-only baseline by 1.24% and the metadata-only baseline by 5.63%, with better discriminative properties evaluated by the receiver operating characteristic and precision-recall curves.
Title: Joining metadata and textual features to advise administrative courts decisions: a cascading classifier approach
Journal: Artificial Intelligence and Law, vol. 32, no. 1, pp. 201-230
PDF: https://link.springer.com/content/pdf/10.1007/s10506-023-09348-9.pdf
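The time-based cross-validation described above, where each model is built only on data available before the predicted judgment, could look roughly like the sketch below. The abstract does not specify the exact scheme, so an expanding-window split is assumed here; the function name `time_based_splits` and the sample dates are hypothetical.

```python
from datetime import date


def time_based_splits(dates, n_folds):
    """Return (train_indices, test_indices) pairs with an expanding window.

    Records are ordered by date; every training record in a fold is dated
    no later than any test record in that fold.
    """
    order = sorted(range(len(dates)), key=lambda i: dates[i])
    fold_size = len(order) // (n_folds + 1)
    if fold_size == 0:
        raise ValueError("not enough records for the requested number of folds")
    splits = []
    for fold in range(1, n_folds + 1):
        train = order[: fold * fold_size]
        test = order[fold * fold_size : (fold + 1) * fold_size]
        splits.append((train, test))
    return splits


# Hypothetical decision dates for six proceedings.
dates = [
    date(2020, 1, 5), date(2019, 3, 1), date(2021, 7, 9),
    date(2019, 12, 31), date(2020, 6, 15), date(2022, 2, 2),
]
splits = time_based_splits(dates, n_folds=2)
```

Unlike shuffled k-fold validation, no fold is trained on proceedings decided after the ones it is tested on, which is the property the abstract emphasizes for real-world usefulness.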