
Artificial Intelligence and Law: Latest Articles

Unfair clause detection in terms of service across multiple languages
IF 3.1 | CAS Zone 2 (Sociology) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-04-03 | DOI: 10.1007/s10506-024-09398-7
Andrea Galassi, Francesca Lagioia, Agnieszka Jabłonowska, Marco Lippi

Most of the existing natural language processing systems for legal texts are developed for the English language. Nevertheless, there are several application domains where multiple versions of the same documents are provided in different languages, especially inside the European Union. One notable example is given by Terms of Service (ToS). In this paper, we compare different approaches to the task of detecting potential unfair clauses in ToS across multiple languages. In particular, after developing an annotated corpus and a machine learning classifier for English, we consider and compare several strategies to extend the system to other languages: building a novel corpus and training a novel machine learning system for each language, from scratch; projecting annotations across documents in different languages, to avoid the creation of novel corpora; translating training documents while keeping the original annotations; translating queries at prediction time and relying on the English system only. An extended experimental evaluation conducted on a large, original dataset indicates that the time-consuming task of re-building a novel annotated corpus for each language can often be avoided with no significant degradation in terms of performance.
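One of the strategies compared in the paper projects annotations across parallel documents so that no new corpus has to be built. The sketch below illustrates that idea. It assumes sentence-level binary labels and a sentence alignment that has already been computed, for example by an external aligner. The function name, the label encoding, and the rule that a target sentence is unfair if any aligned source sentence is unfair are all illustrative assumptions. This is not the authors' procedure.

```python
from typing import List, Tuple

def project_labels(source_labels: List[int],
                   alignment: List[Tuple[int, int]],
                   n_target: int) -> List[int]:
    """Copy sentence-level labels (1 = potentially unfair, 0 = fair) from an
    annotated English ToS to its parallel version in another language.
    `alignment` holds (source_index, target_index) pairs. Unaligned target
    sentences default to 0, and a target sentence aligned to several source
    sentences is marked unfair if any of them is unfair."""
    target = [0] * n_target
    for src, tgt in alignment:
        target[tgt] = max(target[tgt], source_labels[src])
    return target

english = [0, 1, 0, 1]
alignment = [(0, 0), (1, 1), (2, 1), (3, 2)]  # English sentences 1 and 2 merge into one
print(project_labels(english, alignment, n_target=4))  # [0, 1, 1, 0]
```

Taking the maximum over merged sentences errs on the side of flagging a clause. A real pipeline might instead drop ambiguous alignments from the training data.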

Artificial Intelligence and Law, vol. 33, no. 3, pp. 641–689. PDF: https://link.springer.com/content/pdf/10.1007/s10506-024-09398-7.pdf
Citations: 0
Correction to: Code is law: how COMPAS affects the way the judiciary handles the risk of recidivism
IF 3.1 | CAS Zone 2 (Sociology) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-04-02 | DOI: 10.1007/s10506-024-09400-2
Christopher Engel, Lorenz Linhardt, Marcel Schubert
Artificial Intelligence and Law, vol. 33, no. 3, pp. 873–874. PDF: https://link.springer.com/content/pdf/10.1007/s10506-024-09400-2.pdf
Citations: 0
Re-evaluating GPT-4’s bar exam performance
IF 3.1 | CAS Zone 2 (Sociology) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-03-30 | DOI: 10.1007/s10506-024-09396-9
Eric Martínez

Perhaps the most widely touted of GPT-4’s at-launch, zero-shot capabilities has been its reported 90th-percentile performance on the Uniform Bar Exam. This paper begins by investigating the methodological challenges in documenting and verifying the 90th-percentile claim, presenting four sets of findings that indicate that OpenAI’s estimates of GPT-4’s UBE percentile are overinflated. First, although GPT-4’s UBE score nears the 90th percentile when examining approximate conversions from February administrations of the Illinois Bar Exam, these estimates are heavily skewed towards repeat test-takers who failed the July administration and score significantly lower than the general test-taking population. Second, data from a recent July administration of the same exam suggests GPT-4’s overall UBE percentile was below the 69th percentile, and ~48th percentile on essays. Third, examining official NCBE data and using several conservative statistical assumptions, GPT-4’s performance against first-time test takers is estimated to be ~62nd percentile, including ~42nd percentile on essays. Fourth, when examining only those who passed the exam (i.e. licensed or license-pending attorneys), GPT-4’s performance is estimated to drop to ~48th percentile overall, and ~15th percentile on essays. In addition to investigating the validity of the percentile claim, the paper also investigates the validity of GPT-4’s reported scaled UBE score of 298. The paper successfully replicates the MBE score, but highlights several methodological issues in the grading of the MPT + MEE components of the exam, which call into question the validity of the reported essay score. Finally, the paper investigates the effect of different hyperparameter combinations on GPT-4’s MBE performance, finding no significant effect of adjusting temperature settings, and a significant effect of few-shot chain-of-thought prompting over basic zero-shot prompting.
Taken together, these findings carry timely insights for the desirability and feasibility of outsourcing legally relevant tasks to AI models, as well as for the importance for AI developers to implement rigorous and transparent capabilities evaluations to help secure safe and trustworthy AI.
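The central point of the percentile argument is that a single scaled score maps to different percentiles depending on the reference population. The sketch below makes that concrete. It counts tied scores as half, which is one common convention among several. The two score lists are invented toy numbers and are not NCBE or Illinois data. Only the score of 298 comes from the abstract.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence

def percentile_rank(score: float, population: Sequence[float]) -> float:
    """Percent of the reference population scoring below `score`,
    counting tied scores as half (one common convention)."""
    ordered = sorted(population)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)

# Invented toy distributions (NOT real bar-exam data): the same scaled
# score ranks higher against a weaker reference group.
repeat_takers = [240, 250, 255, 260, 265, 270, 275, 280, 290, 300]
first_timers = [260, 270, 280, 285, 290, 295, 300, 305, 310, 320]

print(percentile_rank(298, repeat_takers))  # 90.0
print(percentile_rank(298, first_timers))   # 60.0
```
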

Artificial Intelligence and Law, vol. 33, no. 3, pp. 581–604. PDF: https://link.springer.com/content/pdf/10.1007/s10506-024-09396-9.pdf
Citations: 0
Boosting court judgment prediction and explanation using legal entities
IF 3.1 | CAS Zone 2 (Sociology) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-03-18 | DOI: 10.1007/s10506-024-09397-8
Irene Benedetto, Alkis Koudounas, Lorenzo Vaiani, Eliana Pastor, Luca Cagliero, Francesco Tarasconi, Elena Baralis

The automatic prediction of court case judgments using Deep Learning and Natural Language Processing is challenged by the variety of norms and regulations, the inherent complexity of the forensic language, and the length of legal judgments. Although state-of-the-art transformer-based architectures and Large Language Models (LLMs) are pre-trained on large-scale datasets, the underlying model reasoning is not transparent to the legal expert. This paper jointly addresses court judgment prediction and explanation by not only predicting the judgment but also providing legal experts with sentence-based explanations. To boost the performance of both tasks we leverage a legal named entity recognition step, which automatically annotates documents with meaningful domain-specific entity tags and masks the corresponding fine-grained descriptions. In such a way, transformer-based architectures and Large Language Models can attend to in-domain entity-related information in the inference process while neglecting irrelevant details. Furthermore, the explainer can boost the relevance of entity-enriched sentences while limiting the diffusion of potentially sensitive information. We also explore the use of in-context learning and lightweight fine-tuning to tailor LLMs to the legal language style and the downstream prediction and explanation tasks. The results obtained on a benchmark dataset from the Indian judicial system show the superior performance of entity-aware approaches to both judgment prediction and explanation.
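The masking step described above replaces fine-grained entity descriptions with domain-specific tags. The sketch below shows that substitution in isolation. The span format and the entity labels (PETITIONER, JUDGE) are assumptions for illustration, and the example sentence is synthetic. In the paper the spans come from a legal NER model, which is not reproduced here.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, label); end is exclusive

def mask_entities(text: str, spans: List[Span]) -> str:
    """Replace each entity span with a bracketed tag. Spans must not overlap;
    they are applied right to left so earlier character offsets stay valid."""
    for start, end, label in sorted(spans, key=lambda s: s[0], reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text

sentence = "The appeal by Ramesh Kumar was heard by Justice Rao."
spans = [(14, 26, "PETITIONER"), (40, 51, "JUDGE")]  # hypothetical NER output
print(mask_entities(sentence, spans))
# The appeal by [PETITIONER] was heard by [JUDGE].
```

Applying replacements from right to left avoids recomputing offsets after each substitution, because a tag's length usually differs from the span it replaces.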

Artificial Intelligence and Law, vol. 33, no. 3, pp. 605–640.
Citations: 0
A comparative user study of human predictions in algorithm-supported recidivism risk assessment
IF 3.1 | CAS Zone 2 (Sociology) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-03-15 | DOI: 10.1007/s10506-024-09393-y
Manuel Portela, Carlos Castillo, Songül Tolan, Marzieh Karimi-Haghighi, Antonio Andres Pueyo

In this paper, we study the effects of using an algorithm-based risk assessment instrument (RAI) to support the prediction of the risk of violent recidivism upon release. The instrument we used is a machine learning version of RisCanvi, which is used by the Justice Department of Catalonia, Spain. We hypothesized that people improve their performance in assessing the risk of recidivism when assisted by a RAI, and that professionals perform better than non-experts in this domain. Participants had to predict whether a person released from prison would commit a new crime leading to re-incarceration within the next two years. The user study was conducted with two groups. The first consisted of general participants from diverse backgrounds, recruited through a crowdsourcing platform. The second consisted of targeted participants: students and practitioners of data science, criminology, or social work, and professionals who work with RisCanvi. We also ran focus groups with participants of the targeted study, including people who use RisCanvi in a professional capacity, to interpret the quantitative results. Among other findings, we observe that algorithmic support systematically leads to more accurate predictions from all participants. However, statistically significant gains are seen only for targeted participants, relative to crowdsourced participants. Professional participants indicated that they would not foresee using a fully automated system in criminal risk assessment. They did, however, consider it valuable for training, for standardization, and for fine-tuning or double-checking their predictions on particularly difficult cases. We found that revising predictions with a RAI improves the performance of all groups, and that professionals perform better overall. Finally, a RAI could be considered as a means of extending professionals' capacities and skills over the course of their careers.

Artificial Intelligence and Law, vol. 33, no. 2, pp. 471–517. PDF: https://link.springer.com/content/pdf/10.1007/s10506-024-09393-y.pdf
Citations: 0
Legal sentence boundary detection using hybrid deep learning and statistical models
IF 3.1 | CAS Zone 2 (Sociology) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-03-14 | DOI: 10.1007/s10506-024-09394-x
Reshma Sheik, Sneha Rao Ganta, S. Jaya Nirmala

Sentence boundary detection (SBD) represents an important first step in natural language processing since accurately identifying sentence boundaries significantly impacts downstream applications. Nevertheless, detecting sentence boundaries within legal texts poses a unique and challenging problem due to their distinct structural and linguistic features. Our approach utilizes deep learning models to leverage delimiter and surrounding context information as input, enabling precise detection of sentence boundaries in English legal texts. We evaluate various deep learning models, including domain-specific transformer models like LegalBERT and CaseLawBERT. To assess the efficacy of our deep learning models, we compare them with a state-of-the-art domain-specific statistical conditional random field (CRF) model. After considering model size, F1-score, and inference time, we identify the Convolutional Neural Network Model (CNN) as the top-performing deep learning model. To further enhance performance, we integrate the features of the CNN model into the subsequent CRF model, creating a hybrid architecture that combines the strengths of both models. Our experiments demonstrate that the hybrid model outperforms the baseline model, achieving a 4% improvement in the F1-score. Additional experiments showcase the superiority of the hybrid model over SBD open-source libraries when confronted with an out-of-domain test set. These findings underscore the importance of efficient SBD in legal texts and emphasize the advantages of employing deep learning models and hybrid architectures to achieve optimal performance.
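The input representation described above pairs each delimiter with its surrounding context. The sketch below shows that step as candidate extraction. Both the delimiter set and the window size `k` are assumptions for illustration. The CNN, CRF, and hybrid models that classify each window as a boundary or a non-boundary are not shown.

```python
import re
from typing import List, Tuple

# Assumed candidate delimiters; legal texts also end units with ';' and ':'.
DELIMITERS = re.compile(r"[.?!;:]")

def candidate_windows(text: str, k: int = 4) -> List[Tuple[int, str, str, str]]:
    """Return (offset, left context, delimiter, right context) for every
    candidate delimiter, using up to k characters on each side."""
    windows = []
    for m in DELIMITERS.finditer(text):
        i = m.start()
        windows.append((i, text[max(0, i - k):i], text[i], text[i + 1:i + 1 + k]))
    return windows

for w in candidate_windows("See Sec. 4. The appeal fails."):
    print(w)
# (7, ' Sec', '.', ' 4. ')   <- abbreviation, not a boundary
# (10, 'c. 4', '.', ' The')  <- true boundary
# (28, 'ails', '.', '')      <- true boundary
```

The first window shows why a learned classifier is needed. An abbreviation such as "Sec." looks like a sentence end when only the delimiter character is considered.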

Artificial Intelligence and Law, vol. 33, no. 2, pp. 519–549.
Citations: 0
Correction to: Reasoning with inconsistent precedents
IF 3.1 | CAS Zone 2 (Sociology) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-02-16 | DOI: 10.1007/s10506-024-09392-z
Ilaria Canavotto
Artificial Intelligence and Law, vol. 33, no. 1, pp. 167–170.
Citations: 0
Combining prompt-based language models and weak supervision for labeling named entity recognition on legal documents
IF 3.1 | CAS Zone 2 (Sociology) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-02-15 | DOI: 10.1007/s10506-023-09388-1
Vitor Oliveira, Gabriel Nogueira, Thiago Faleiros, Ricardo Marcacini

Named entity recognition (NER) is a highly relevant task for text information retrieval in natural language processing (NLP). Most recent state-of-the-art NER methods require humans to annotate data for model training. However, identifying, delimiting, and labeling entities manually can be very expensive in time, money, and effort. This paper investigates the use of prompt-based language models (OpenAI’s GPT-3) and weak supervision in the legal domain. We apply both strategies as alternatives to traditional human annotation, relying on computing power instead of human effort for labeling. We then compare the performance of models trained on computer-generated and on human-generated data. We also introduce combinations of all three methods (prompt-based labeling, weak supervision, and human annotation), aiming to keep model performance high and annotation costs low. We show that human labeling still yields the best overall performance. The alternative strategies and their combinations are nevertheless valid options, achieving positive results and comparable model scores at lower cost. Relative to models trained on human-labeled data, the final models retain on average 74.0% of the score with GPT-3 labels, 95.6% with weak supervision, 90.7% with the GPT-3 + weak supervision combination, and 83.9% with the GPT-3 + 30% human-labeling combination.
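In weak supervision, several noisy heuristics each vote on a label, or abstain, and their votes are then combined. The sketch below shows the idea at token level, using plain majority vote as the aggregation step. Majority vote is a simpler stand-in for the learned label models used by frameworks such as Snorkel. The labeling functions, gazetteer entries, and entity labels are all hypothetical and do not reproduce the paper's setup.

```python
from collections import Counter
from typing import Callable, List, Optional

ABSTAIN: Optional[str] = None
LabelingFn = Callable[[str], Optional[str]]

# Hypothetical labeling functions over single tokens. Real ones would
# encode legal-domain heuristics, gazetteers, or regex patterns.
def lf_statute_word(token: str) -> Optional[str]:
    return "LEGISLATION" if token.lower() in {"act", "statute", "code"} else ABSTAIN

def lf_all_caps(token: str) -> Optional[str]:
    return "ORGANIZATION" if token.isupper() and len(token) > 1 else ABSTAIN

def lf_court_gazetteer(token: str) -> Optional[str]:
    return "ORGANIZATION" if token in {"CJEU", "ECHR"} else ABSTAIN

def majority_vote(token: str, lfs: List[LabelingFn]) -> str:
    """Combine the non-abstaining votes; return 'O' (no entity) if all abstain.
    Ties go to the label voted first (Counter preserves insertion order)."""
    votes = [lf(token) for lf in lfs]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return "O"
    return Counter(votes).most_common(1)[0][0]

LFS = [lf_statute_word, lf_all_caps, lf_court_gazetteer]
for tok in ["Act", "CJEU", "appeal"]:
    print(tok, majority_vote(tok, LFS))
```

The resulting labels would then serve as (noisy) training data for an NER model, in place of, or alongside, human annotations.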

Artificial Intelligence and Law, vol. 33, no. 2, pp. 361–381.
Citations: 0
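The weak-supervision strategy described in the abstract above can be illustrated with a minimal sketch. A few heuristic labeling functions vote on each token, and a simple majority vote turns those votes into training labels. The entity tags (`LAW`, `DATE`), the keyword list, the regex patterns and the function names are all hypothetical. The paper's actual labeling functions and aggregation model are not given here, and majority voting is only the simplest aggregation baseline.

```python
import re
from collections import Counter
from typing import Callable, List, Optional

# A labeling function looks at token i in context and returns a tag, or None to abstain.
LabelingFn = Callable[[List[str], int], Optional[str]]

def lf_law_keyword(tokens: List[str], i: int) -> Optional[str]:
    # Hypothetical heuristic: a token right after "law", "decree" or "article" is part of a LAW mention.
    if i > 0 and tokens[i - 1].lower() in {"law", "decree", "article"}:
        return "LAW"
    return None

def lf_year(tokens: List[str], i: int) -> Optional[str]:
    # Hypothetical heuristic: four-digit years from 1900 to 2099 are DATE.
    return "DATE" if re.fullmatch(r"(19|20)\d{2}", tokens[i]) else None

def lf_law_number(tokens: List[str], i: int) -> Optional[str]:
    # Hypothetical heuristic: numbers shaped like "8.666/1993" are statute identifiers (LAW).
    return "LAW" if re.fullmatch(r"\d[\d.]*/\d{4}", tokens[i]) else None

def weak_label(tokens: List[str], lfs: List[LabelingFn]) -> List[str]:
    """Give each token the majority tag among non-abstaining functions, or "O" if all abstain."""
    labels = []
    for i in range(len(tokens)):
        votes = [tag for tag in (lf(tokens, i) for lf in lfs) if tag is not None]
        # most_common orders equal counts by first occurrence, so ties go to the earliest voter.
        labels.append(Counter(votes).most_common(1)[0][0] if votes else "O")
    return labels

print(weak_label(["According", "to", "Law", "8.666/1993", "of", "1993"],
                 [lf_law_keyword, lf_year, lf_law_number]))
# → ['O', 'O', 'O', 'LAW', 'O', 'DATE']
```

In practice these weak labels would be used to train an NER model. The abstract also reports combining weak supervision with GPT-3 labels and with part of the human annotation.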
Agents preserving privacy on intelligent transportation systems according to EU law
IF 3.1 · Zone 2 (Sociology) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-02-12 · DOI: 10.1007/s10506-024-09391-0
Javier Carbo, Juanita Pedraza, Jose M. Molina

Intelligent Transportation Systems are expected to automate how trucks book parking slots. The intrinsically dynamic nature of this problem, the need for explanations and the involvement of private data justify an agent-based solution. The agents solving this problem use Belief-Desire-Intention (BDI) reasoning and are implemented in JASON. The trucks' privacy is protected by having them share only a list of parking areas ordered by preference. Furthermore, the assignment of parking slots takes into account the legal requirements on breaks and driving-time limits. Finally, the agent simulations use distances and numbers of trucks and parking areas proportional to current European Union data. In these simulations, the proposed solution is tested at three different distances against an alternative with complete knowledge, measuring the difference in efficiency, the number of illegal breaks and the distances traveled. The results show that the non-private alternative performs slightly better, while neither alternative produces illegal breaks. The simulations thus show that the proposed privacy protection does not impose a significant efficiency penalty.

Artificial Intelligence and Law, vol. 33, no. 2, pp. 437-470. Published 2024-02-12. Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10506-024-09391-0.pdf
Citations: 0
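The privacy mechanism described above can be sketched as follows. Each truck discloses only a preference-ordered list of parking areas, and the assignment respects driving-time limits. This is not the authors' JASON/BDI implementation. It is a plain Python sketch built on several assumptions. First, each truck filters its own list using the EU rule of a break after 4.5 hours of driving (Regulation (EC) No 561/2006). Second, trucks travel at an average speed of 80 km/h. Third, the coordinator makes a greedy first-come assignment. The class, function and parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

MAX_DRIVING_H = 4.5  # assumed EU limit: 4.5 h of driving before a 45-minute break

@dataclass
class Truck:
    truck_id: str
    driven_h: float                 # private: hours driven since the last break
    distances_km: Dict[str, float]  # private: road distance to each parking area
    preference: List[str]           # private: parking areas, most preferred first

    def shared_list(self, avg_speed_kmh: float = 80.0) -> List[str]:
        """Computed on the truck: keep only parkings reachable before the legal limit.

        Only this ordered list of IDs leaves the truck; hours and distances stay local.
        """
        remaining_h = MAX_DRIVING_H - self.driven_h
        return [p for p in self.preference
                if self.distances_km[p] / avg_speed_kmh <= remaining_h]

def assign(requests: Dict[str, List[str]], capacity: Dict[str, int]) -> Dict[str, Optional[str]]:
    """Coordinator side: give each truck, in submission order, its best-ranked parking with a free slot."""
    free = dict(capacity)
    result: Dict[str, Optional[str]] = {}
    for truck_id, ranked in requests.items():
        chosen = next((p for p in ranked if free.get(p, 0) > 0), None)
        if chosen is not None:
            free[chosen] -= 1
        result[truck_id] = chosen  # None means no legal, free parking was available
    return result

trucks = [
    Truck("t1", driven_h=4.0, distances_km={"A": 30, "B": 50}, preference=["B", "A"]),
    Truck("t2", driven_h=1.0, distances_km={"A": 30, "B": 50}, preference=["B", "A"]),
]
requests = {t.truck_id: t.shared_list() for t in trucks}
print(requests)
print(assign(requests, {"A": 1, "B": 1}))
```

In this example, t1 has only 0.5 h of driving left. Parking B is 50 km away, about 0.63 h at 80 km/h, so t1 drops it and shares ["A"]. Truck t2 shares ["B", "A"]. With one slot in each area, t1 gets A and t2 gets B. The coordinator never sees driving hours or distances, only ranked IDs.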
Code is law: how COMPAS affects the way the judiciary handles the risk of recidivism
IF 3.1 · Zone 2 (Sociology) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-02-09 · DOI: 10.1007/s10506-024-09389-8
Christoph Engel, Lorenz Linhardt, Marcel Schubert

Judges in multiple US states, such as New York, Pennsylvania, Wisconsin, California and Florida, receive a prediction of defendants’ recidivism risk generated by the COMPAS algorithm. If judges act on these predictions, they implicitly delegate normative decisions to proprietary software, even beyond the previously documented race and age biases. Using the ProPublica dataset, we demonstrate that COMPAS predictions favor jailing over release: COMPAS is biased against defendants. We show that this bias can largely be removed. Our proposed correction increases overall accuracy and attenuates anti-Black and anti-young bias. However, it also slightly increases the risk of releasing defendants who then commit a new crime before trial. We argue that this normative decision should not be buried in the code. The tradeoff between the interests of innocent defendants and those of future victims should not only be made transparent. The algorithm should also be changed so that the legislature and the courts make this choice themselves.

Artificial Intelligence and Law, vol. 33, no. 2, pp. 383-404. Published 2024-02-09. Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10506-024-09389-8.pdf
Citations: 0
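The tradeoff the authors want made explicit can be illustrated with a toy example. Once a detention threshold is applied to risk scores, lowering it detains more defendants who would not have reoffended. Raising it releases more who do. The sketch below is not the authors' bias correction. The `threshold` parameter is a hypothetical stand-in for the normative choice they argue should rest with the legislature and the courts. The scores and outcomes are made-up illustrative data, not ProPublica records.

```python
from typing import Sequence, Tuple

def tradeoff(scores: Sequence[int], reoffended: Sequence[bool], threshold: int) -> Tuple[int, int]:
    """Detain defendants whose score is >= threshold and count both kinds of error.

    Returns (detained_but_did_not_reoffend, released_and_reoffended).
    """
    if len(scores) != len(reoffended):
        raise ValueError("scores and outcomes must have the same length")
    detained_innocent = sum(1 for s, r in zip(scores, reoffended) if s >= threshold and not r)
    released_reoffended = sum(1 for s, r in zip(scores, reoffended) if s < threshold and r)
    return detained_innocent, released_reoffended

# Made-up decile-style scores (1-10) and outcomes, for illustration only.
scores = [2, 3, 5, 6, 7, 8, 9, 10]
reoffended = [False, False, True, False, True, False, True, True]
for t in (5, 7, 8):
    print(t, tradeoff(scores, reoffended, t))
```

On this toy data, thresholds 5, 7 and 8 give (2, 0), (1, 1) and (1, 2) respectively. Raising the threshold never increases the first count and never decreases the second, so every threshold encodes a choice between the two groups' interests.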