Large-scale public datasets are vital for driving the progress of abstractive summarization, especially in law, where documents have highly specialized jargon. However, the available resources are English-centered, limiting research advancements in other languages. This paper introduces LAWSUIT, a collection of 14K Italian legal verdicts with expert-authored abstractive maxims drawn from the Constitutional Court of the Italian Republic. LAWSUIT presents an arduous task with lengthy source texts and evenly distributed salient content. We offer extensive experiments with sequence-to-sequence and segmentation-based approaches, revealing that the latter achieve better results in full and few-shot settings. We openly release LAWSUIT to foster the development and automation of real-world legal applications.
Title: LAWSUIT: a LArge expert-Written SUmmarization dataset of ITalian constitutional court verdicts. Authors: Luca Ragazzi, Gianluca Moro, Stefano Guidi, Giacomo Frisoni. Journal: Artificial Intelligence and Law, vol. 33, no. 4, pp. 1151-1187. Pub Date: 2024-09-09. DOI: 10.1007/s10506-024-09414-w. Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10506-024-09414-w.pdf
Pub Date: 2024-09-05, DOI: 10.1007/s10506-024-09418-6
Shutao Gong, Xudong Luo
This paper introduces an advanced event detection model for legal intelligence, focusing on identifying event types in legal cases by examining trigger word candidates. It employs the DeBERTa pre-trained language model for encoding sentences into enriched word representations, supplemented by the Global Pointer neural network for initial scoring. The model further uses a graph convolutional network, conditional layer normalisation, and a convolutional neural network to extract features from these representations. A multilayer perceptron then determines the event type based on these features and initial scores. Additionally, a dictionary-matching method revises the predicted event types, with adversarial training and a sentence-length mask employed to enhance model performance and address missing trigger words. The model’s effectiveness is proven through extensive experimentation, outperforming state-of-the-art baselines (including some large language models) and securing third prize in the event detection task at the Challenge of AI in Law (CAIL) 2022. The code of our model is available at https://github.com/1gst/DGGCCN/tree/main.
Title: DGGCCM: a hybrid neural model for legal event detection. Journal: Artificial Intelligence and Law, vol. 33, no. 4, pp. 1109-1149.
Pub Date: 2024-08-09, DOI: 10.1007/s10506-024-09412-y
David Fernández-Llorca, Emilia Gómez, Ignacio Sánchez, Gabriele Mazzini
The European Union’s Artificial Intelligence Act (AI Act) is a groundbreaking regulatory framework that integrates technical concepts and terminology from the rapidly evolving ecosystems of AI research and innovation into the legal domain. Precise definitions accessible to both AI experts and lawyers are crucial for the legislation to be effective. This paper provides an interdisciplinary analysis of the concepts of AI system, general purpose AI system, foundation model and generative AI across the different versions of the legal text (Commission proposal, Parliament position and Council General Approach) before the final political agreement. The goal is to help bridge the understanding of these key terms between the technical and legal communities and contribute to a proper implementation of the AI Act. We provide an analysis of the concept of AI system considering its scientific foundation and the crucial role that it plays in the regulation, which requires a sound definition from both legal and technical standpoints. We connect the outcomes of this discussion with the analysis of the concept of general purpose AI system and its evolution during the negotiations. We also address the distinct conceptual meanings of AI system vs AI model and explore the technical nuances of the term foundation model. We conclude that rooting the definition of foundation model in its general purpose capabilities, following standardised evaluation methodologies, appears to be the most appropriate approach. Lastly, we tackle the concept of generative AI, arguing that definitions of AI system that include “content” as one of the system’s outputs already capture it, and concluding that not all generative AI is based on foundation models.
Title: An interdisciplinary account of the terminological choices by EU policymakers ahead of the final agreement on the AI Act: AI system, general purpose AI system, foundation model, and generative AI. Journal: Artificial Intelligence and Law, vol. 33, no. 4, pp. 875-888. Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10506-024-09412-y.pdf
Pub Date: 2024-08-07, DOI: 10.1007/s10506-024-09416-8
Rafael Mesquita, Antonio Pires
Do hard law international organizations use jurisprudence differently than soft law ones? Precedent can be an asset or an encumbrance to international organizations and their members, depending on their aims and on the policy area. Linking current decisions to previously agreed ones helps to increase cohesion, facilitate consensus among members, and borrow authority – benefits that might be more necessary for some organizations than for others. To test whether the features of norm-producing organizations correlate with their preference for jurisprudence, we compare two organs from the United Nations system: the Security Council, which produces binding decisions, and the General Assembly, which delivers soft law resolutions. We explore the citation networks formed by the approximately 20,400 resolutions adopted by each organ between 1946 and 2019 to test their differences with regard to the dynamics of citation formation, concentration of citations, and timing. Descriptive results reveal the main periods of jurisprudential activity by the Security Council and the General Assembly, but find no sizeable difference in their overall rate of precedent usage. We apply the Citation Exponential Random Graph Model (cERGM) to test for network determinants of citations and find additional similarities on transitivity and homophily, but also variations regarding preferential attachment.
Title: Jurisprudence in hard and soft law output of international organizations: a network analysis of the use of precedent in UN Security Council and general assembly resolutions. Journal: Artificial Intelligence and Law, vol. 33, no. 4, pp. 1079-1108.
Pub Date: 2024-08-03, DOI: 10.1007/s10506-024-09415-9
Kilian Lüders, Bent Stohlmann
Proportionality is a central and globally widespread argumentation technique in public law. This article provides a conceptual introduction to proportionality and argues that such a domain-specific form of argumentation is particularly interesting for argument mining. As a major contribution of this article, we share a new dataset for which proportionality has been annotated. The dataset consists of 300 German Federal Constitutional Court decisions annotated at the sentence level (54,929 sentences). In addition to separating textual parts, a fine-grained system of proportionality categories was used. Finally, we used these data for a classification task. We built classifiers that predict whether or not proportionality is invoked in a sentence. We employed several models, including neural and deep learning models and transformers. A BERT-BiLSTM-CRF model performed best.
Title: Classifying proportionality - identification of a legal argument. Journal: Artificial Intelligence and Law, vol. 33, no. 4, pp. 1051-1078. Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10506-024-09415-9.pdf
Pub Date: 2024-07-24, DOI: 10.1007/s10506-024-09413-x
Sławomir Dadas, Marek Kozłowski, Rafał Poświata, Michał Perełkiewicz, Marcin Białas, Małgorzata Grębowiec
Title: Correction: A support system for the detection of abusive clauses in B2C contracts. Journal: Artificial Intelligence and Law, vol. 33, no. 4, pp. 953-954. Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10506-024-09413-x.pdf
Pub Date: 2024-07-15, DOI: 10.1007/s10506-024-09411-z
Aniket Deroy, Kripabandhu Ghosh, Saptarshi Ghosh
Automatic summarization of legal case judgements, which are known to be long and complex, has traditionally been attempted via extractive summarization models. In recent years, generative models, including abstractive summarization models and large language models (LLMs), have gained huge popularity. In this paper, we explore the applicability of such models for legal case judgement summarization. We applied various domain-specific abstractive summarization models and general-domain LLMs as well as extractive summarization models over two sets of legal case judgements – from the United Kingdom (UK) Supreme Court and the Indian Supreme Court – and evaluated the quality of the generated summaries. We also perform experiments on a third dataset of legal documents of a different type – government reports from the United States. Results show that abstractive summarization models and LLMs generally perform better than the extractive methods as per traditional metrics for evaluating summary quality. However, detailed investigation shows the presence of inconsistencies and hallucinations in the outputs of the generative models, and we explore ways to reduce the hallucinations and inconsistencies in the summaries. Overall, the investigation suggests that further improvements are needed to enhance the reliability of abstractive models and LLMs for legal case judgement summarization. At present, a human-in-the-loop technique is more suitable for performing manual checks to identify inconsistencies in the generated summaries.
Title: Applicability of large language models and generative models for legal case judgement summarization. Journal: Artificial Intelligence and Law, vol. 33, no. 4, pp. 1007-1050.
Pub Date: 2024-07-13, DOI: 10.1007/s10506-024-09410-0
Qiang Ge, Jing Zhang, Xiaoding Guo
In recent years, the analysis of legal judgments and the prediction of outcomes based on case factual descriptions have become hot research topics in the field of judiciary. Among them, the task of charge prediction aims to predict the applicable charges of a judicial case based on its factual description, making it an important research area in the intelligent judiciary. While significant progress has been made in machine learning and deep learning, traditional methods are limited to handling data in Euclidean space and cannot effectively capture the semantic information in the text. To overcome the limitations of traditional learning approaches, many studies have started exploring the use of graphs to represent rich relationships between entities in text and employing graph convolutional neural networks to learn text representations. In this paper, we propose a charge prediction method based on graph convolutional neural networks. By constructing a similarity graph between cases and utilizing graph convolutional neural networks to learn case feature representations, we can better capture the relational information between cases and improve the accuracy of charge prediction. Experimental results on multiple benchmark datasets demonstrate that our proposed model outperforms traditional methods in charge prediction tasks.
Title: SIM-GCN: similarity graph convolutional networks for charges prediction. Journal: Artificial Intelligence and Law, vol. 33, no. 4, pp. 983-1005.
Pub Date: 2024-07-12, DOI: 10.1007/s10506-024-09409-7
Jung-Mei Chu, Hao-Cheng Lo, Jieh Hsiang, Chun-Chieh Cho
In patent prosecution, timely and effective responses to Office Actions (OAs) are crucial for securing patents. However, past automation and artificial intelligence research has largely overlooked this aspect. To bridge this gap, our study introduces the Patent Office Action Response Intelligence System (PARIS) and its advanced version, the Large Language Model (LLM) Enhanced PARIS (LE-PARIS). These systems are designed to enhance the efficiency of patent attorneys in handling OA responses through collaboration with AI. The systems’ key features include the construction of an OA Topics Database, development of Response Templates, and implementation of Recommender Systems and LLM-based Response Generation. To validate the effectiveness of the systems, we have employed a multi-paradigm analysis using the USPTO Office Action database and longitudinal data based on attorney interactions with our systems over six years. Through five studies, we have examined the constructiveness of OA topics (studies 1 and 2) using topic modeling and our proposed Delphi process, the efficacy of our proposed hybrid LLM-based recommender system tailored for OA responses (study 3), the quality of generated responses (study 4), and the systems’ practical value in real-world scenarios through user studies (study 5). The results indicate that both PARIS and LE-PARIS significantly achieve key metrics and have a positive impact on attorney performance.
Title: From PARIS to LE-PARIS: toward patent response automation with recommender systems and collaborative large language models. Journal: Artificial Intelligence and Law, vol. 33, no. 4, pp. 955-981.
Pub Date: 2024-07-01, DOI: 10.1007/s10506-024-09406-w
Sascha Schweitzer, Markus Conrads
In the evolving landscape of legal information systems, ChatGPT-4 and other advanced conversational agents (CAs) offer the potential to disruptively transform the law industry. This study evaluates commercially available CAs within the German legal context, thereby assessing the generalizability of previous U.S.-based findings. Employing a unique corpus of 200 distinct legal tasks, ChatGPT-4 was benchmarked against Google Bard, Google Gemini, and its predecessor, ChatGPT-3.5. Human-expert and automated assessments of 4000 CA-generated responses reveal ChatGPT-4 to be the first CA to surpass the threshold of solving realistic legal tasks and passing a German business law exam. While ChatGPT-4 outperforms ChatGPT-3.5, Google Bard, and Google Gemini in both consistency and quality, the results demonstrate a considerable degree of variability, especially in complex cases with no predefined response options. Based on these findings, legal professionals should manually verify all texts produced by CAs before use. Novices must exercise caution with CA-generated legal advice, given the expertise needed for its assessment.
Title: The digital transformation of jurisprudence: an evaluation of ChatGPT-4’s applicability to solve cases in business law. Journal: Artificial Intelligence and Law, vol. 33, no. 3, pp. 847-871. Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10506-024-09406-w.pdf