Bringing order into the realm of Transformer-based language models for artificial intelligence and law
Pub Date: 2023-11-20 | DOI: 10.1007/s10506-023-09374-7 | Artificial Intelligence and Law, 32(4): 863–1010 | Open access PDF: https://link.springer.com/content/pdf/10.1007/s10506-023-09374-7.pdf
Candida M. Greco, Andrea Tagarelli
Transformer-based language models (TLMs) are widely recognized as a cutting-edge technology for the successful development of deep-learning-based solutions to problems and applications that require natural language processing and understanding. As in other textual domains, TLMs have pushed the state of the art of AI approaches for many tasks of interest in the legal domain. Although the first Transformer model was proposed only about six years ago, the technology has progressed at an unprecedented rate, with BERT and related models serving as a major reference, including in the legal domain. This article provides the first systematic overview of TLM-based methods for AI-driven problems and tasks in the legal sphere. A major goal is to highlight research advances in this field so as to understand, on the one hand, how Transformers have contributed to the success of AI in supporting legal processes and, on the other hand, what the current limitations and opportunities for further research are.
{"title":"Bringing order into the realm of Transformer-based language models for artificial intelligence and law","authors":"Candida M. Greco, Andrea Tagarelli","doi":"10.1007/s10506-023-09374-7","DOIUrl":"10.1007/s10506-023-09374-7","url":null,"abstract":"<div><p>Transformer-based language models (TLMs) have widely been recognized to be a cutting-edge technology for the successful development of deep-learning-based solutions to problems and applications that require natural language processing and understanding. Like for other textual domains, TLMs have indeed pushed the state-of-the-art of AI approaches for many tasks of interest in the legal domain. Despite the first Transformer model being proposed about six years ago, there has been a rapid progress of this technology at an unprecedented rate, whereby BERT and related models represent a major reference, also in the legal domain. This article provides the first systematic overview of TLM-based methods for AI-driven problems and tasks in the legal sphere. A major goal is to highlight research advances in this field so as to understand, on the one hand, how the Transformers have contributed to the success of AI in supporting legal processes, and on the other hand, what are the current limitations and opportunities for further research development.</p></div>","PeriodicalId":51336,"journal":{"name":"Artificial Intelligence and Law","volume":"32 4","pages":"863 - 1010"},"PeriodicalIF":3.1,"publicationDate":"2023-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10506-023-09374-7.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142519046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Natural language processing for legal document review: categorising deontic modalities in contracts
Pub Date: 2023-11-11 | DOI: 10.1007/s10506-023-09379-2 | Artificial Intelligence and Law, 33(1): 79–100
S. Georgette Graham, Hamidreza Soltani, Olufemi Isiaq
The contract review process can be a costly and time-consuming task for lawyers and clients alike, requiring significant effort to identify and evaluate the legal implications of individual clauses. To address this challenge, we propose the use of natural language processing techniques, specifically text classification based on deontic tags, to streamline the process. Our research question is whether natural language processing techniques, specifically dense vector embeddings, can help semi-automate the contract review process and reduce time and costs for legal professionals reviewing deontic modalities in contracts. In this study, we create a domain-specific dataset and train both baseline and neural network models for contract sentence classification. This approach offers a more efficient and cost-effective solution for contract review, mimicking the work of a lawyer. Our approach achieves an accuracy of 0.90, showcasing its effectiveness in identifying and evaluating individual contract sentences.
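As a concrete illustration of the kind of sentence-level classification the study describes, the following is a minimal sketch of a baseline contract-sentence classifier. The deontic label set (obligation, permission, prohibition) and the toy training sentences are assumptions made for this example; the paper's dataset, features, and models are not reproduced here.

```python
# Minimal baseline sketch: classify contract sentences by deontic modality.
# The labels and example sentences are illustrative assumptions, not the
# authors' dataset; TF-IDF + logistic regression stands in for the
# embedding-based and neural models evaluated in the paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_sentences = [
    "The Supplier shall deliver the goods within thirty days.",
    "The Licensee may sublicense the software to its affiliates.",
    "The Contractor must not disclose confidential information.",
]
train_labels = ["obligation", "permission", "prohibition"]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),   # word and bigram features
    LogisticRegression(max_iter=1000),
)
clf.fit(train_sentences, train_labels)

# Classify a new contract sentence.
print(clf.predict(["The Customer shall pay all invoices within 14 days."]))
```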
{"title":"Natural language processing for legal document review: categorising deontic modalities in contracts","authors":"S. Georgette Graham, Hamidreza Soltani, Olufemi Isiaq","doi":"10.1007/s10506-023-09379-2","DOIUrl":"10.1007/s10506-023-09379-2","url":null,"abstract":"<div><p>The contract review process can be a costly and time-consuming task for lawyers and clients alike, requiring significant effort to identify and evaluate the legal implications of individual clauses. To address this challenge, we propose the use of natural language processing techniques, specifically text classification based on deontic tags, to streamline the process. Our research question is whether natural language processing techniques, specifically dense vector embeddings, can help semi-automate the contract review process and reduce time and costs for legal professionals reviewing deontic modalities in contracts. In this study, we create a domain-specific dataset and train both baseline and neural network models for contract sentence classification. This approach offers a more efficient and cost-effective solution for contract review, mimicking the work of a lawyer. Our approach achieves an accuracy of 0.90, showcasing its effectiveness in identifying and evaluating individual contract sentences.</p></div>","PeriodicalId":51336,"journal":{"name":"Artificial Intelligence and Law","volume":"33 1","pages":"79 - 100"},"PeriodicalIF":3.1,"publicationDate":"2023-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135043189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A large scale benchmark for session-based recommendations on the legal domain
Pub Date: 2023-10-25 | DOI: 10.1007/s10506-023-09378-3 | Artificial Intelligence and Law, 33(1): 43–78
Marcos Aurélio Domingues, Edleno Silva de Moura, Leandro Balby Marinho, Altigran da Silva
The proliferation of legal documents in various formats and their dispersion across multiple courts present a significant challenge for users seeking precise matches to their information requirements. Despite notable advancements in legal information retrieval systems, research into legal recommender systems remains limited. A plausible factor contributing to this scarcity is the absence of extensive publicly accessible datasets or benchmarks. While a few studies have emerged in this field, a comprehensive analysis of the distinct attributes of legal data that influence the design of effective legal recommenders is notably absent from the current literature. This paper addresses this gap by first assembling a comprehensive session-based dataset from Jusbrasil, one of Brazil’s largest online legal platforms. We then scrutinize and discuss key facets of legal session-based recommendation data, including session duration, types of recommendable legal artifacts, coverage, and popularity. Furthermore, we introduce the first session-based recommendation benchmark tailored to the legal domain, shedding light on the performance and constraints of several well-known session-based recommendation approaches. These evaluations are based on real-world data sourced from Jusbrasil.
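To make the notion of a session-based recommendation baseline concrete, here is a minimal sketch of a co-occurrence recommender of the kind such benchmarks typically include. The session data and item identifiers are invented for illustration; they are not drawn from the Jusbrasil dataset.

```python
# Minimal sketch of a co-occurrence baseline for session-based
# recommendation: recommend the items that most often followed the
# current item in past sessions. The sessions below are invented.
from collections import Counter, defaultdict

sessions = [
    ["law_123", "case_7", "case_9"],
    ["case_7", "case_9", "article_4"],
    ["law_123", "article_4"],
]

# Count how often item b appears after item a within a session.
co_counts = defaultdict(Counter)
for session in sessions:
    for i, item in enumerate(session):
        for nxt in session[i + 1:]:
            co_counts[item][nxt] += 1

def recommend(current_item, k=2):
    """Return the k items most often seen after `current_item`."""
    return [item for item, _ in co_counts[current_item].most_common(k)]

print(recommend("case_7"))  # ['case_9', 'article_4']
```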
{"title":"A large scale benchmark for session-based recommendations on the legal domain","authors":"Marcos Aurélio Domingues, Edleno Silva de Moura, Leandro Balby Marinho, Altigran da Silva","doi":"10.1007/s10506-023-09378-3","DOIUrl":"10.1007/s10506-023-09378-3","url":null,"abstract":"<div><p>The proliferation of legal documents in various formats and their dispersion across multiple courts present a significant challenge for users seeking precise matches to their information requirements. Despite notable advancements in legal information retrieval systems, research into legal recommender systems remains limited. A plausible factor contributing to this scarcity could be the absence of extensive publicly accessible datasets or benchmarks. While a few studies have emerged in this field, a comprehensive analysis of the distinct attributes of legal data that influence the design of effective legal recommenders is notably absent in the current literature. This paper addresses this gap by initially amassing a comprehensive session-based dataset from Jusbrasil, one of Brazil’s largest online legal platforms. Subsequently, we scrutinize and discourse key facets of legal session-based recommendation data, including session duration, types of recommendable legal artifacts, coverage, and popularity. Furthermore, we introduce the first session-based recommendation benchmark tailored to the legal domain, shedding light on the performance and constraints of several renowned session-based recommendation approaches. These evaluations are based on real-world data sourced from Jusbrasil.</p></div>","PeriodicalId":51336,"journal":{"name":"Artificial Intelligence and Law","volume":"33 1","pages":"43 - 78"},"PeriodicalIF":3.1,"publicationDate":"2023-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135113045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Integrating legal event and context information for Chinese similar case analysis
Pub Date: 2023-10-25 | DOI: 10.1007/s10506-023-09377-4 | Artificial Intelligence and Law, 33(1): 1–42
Jingpei Dan, Lanlin Xu, Yuming Wang
Similar case analysis (SCA) is an essential topic in legal artificial intelligence, serving as a reference for legal professionals. Most existing works treat SCA as a traditional text classification task and ignore important legal elements that affect the verdict and case similarity, such as legal events, and are thus easily misled by semantic structure. To address this issue, we propose a Legal Event-Context Model, named LECM, to improve the accuracy and interpretability of SCA on a Chinese legal corpus. The event-context integration mechanism, an essential component of LECM, integrates legal event and context information through attention, enabling legal events to be associated with their relevant contexts. We introduce an event detection module to obtain legal event information; it is pre-trained on a legal event detection dataset to avoid labeling events manually. We conduct extensive experiments on two SCA tasks, i.e., similar case matching (SCM) and similar case retrieval (SCR). Compared with baseline models, LECM achieves average improvements of about 13% in mean average precision on SCR and about 11% in accuracy on SCM. These results indicate that LECM effectively exploits event-context knowledge to enhance SCA performance and suggest its potential for application in various legal document analysis tasks.
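The event-context integration described above is attention-based; the following minimal sketch illustrates the general pattern of attending from an event representation over context representations. The dimensions, the fusion-by-concatenation step, and all values are assumptions made for illustration, not the authors' LECM implementation.

```python
# Minimal sketch of attention-based integration of a legal-event vector
# with its surrounding context vectors. All sizes and values are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
d = 8                                   # embedding dimension (assumption)
event_vec = rng.normal(size=(1, d))     # embedding of a detected legal event
context_vecs = rng.normal(size=(5, d))  # embeddings of 5 context segments

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Scaled dot-product attention: the event acts as the query, the context
# provides keys and values, yielding an event-aware context summary.
scores = event_vec @ context_vecs.T / np.sqrt(d)   # shape (1, 5)
weights = softmax(scores)                          # attention over context
event_context = weights @ context_vecs             # shape (1, d)

# Simple fusion of the event and its attended context (here: concatenation).
fused = np.concatenate([event_vec, event_context], axis=-1)
print(weights.round(3), fused.shape)
```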
{"title":"Integrating legal event and context information for Chinese similar case analysis","authors":"Jingpei Dan, Lanlin Xu, Yuming Wang","doi":"10.1007/s10506-023-09377-4","DOIUrl":"10.1007/s10506-023-09377-4","url":null,"abstract":"<div><p>Similar case analysis (SCA) is an essential topic in legal artificial intelligence, serving as a reference for legal professionals. Most existing works treat SCA as a traditional text classification task and ignore some important legal elements that affect the verdict and case similarity, like legal events, and thus are easily misled by semantic structure. To address this issue, we propose a Legal Event-Context Model named LECM to improve the accuracy and interpretability of SCA based on Chinese legal corpus. The event-context integration mechanism, which is an essential component of the LECM, is proposed to integrate the legal event and context information based on the attention mechanism, enabling legal events to be associated with their corresponding relevant contexts. We introduce an event detection module to obtain the legal event information, which is pre-trained on a legal event detection dataset to avoid labeling events manually. We conduct extensive experiments on two SCA tasks, i.e., similar case matching (SCM) and similar case retrieval (SCR). Compared with baseline models, LECM is validated by about 13% and 11% average improvement in terms of mean average precision and accuracy respectively, for SCR and SCM tasks. These results indicate that LECM effectively utilizes event-context knowledge to enhance SCA performance and its potential application in various legal document analysis tasks.</p></div>","PeriodicalId":51336,"journal":{"name":"Artificial Intelligence and Law","volume":"33 1","pages":"1 - 42"},"PeriodicalIF":3.1,"publicationDate":"2023-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135217653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A novel network-based paragraph filtering technique for legal document similarity analysis
Pub Date: 2023-10-19 | DOI: 10.1007/s10506-023-09375-6 | Artificial Intelligence and Law
Mayur Makawana, Rupa G. Mehta
{"title":"A novel network-based paragraph filtering technique for legal document similarity analysis","authors":"Mayur Makawana, Rupa G. Mehta","doi":"10.1007/s10506-023-09375-6","DOIUrl":"https://doi.org/10.1007/s10506-023-09375-6","url":null,"abstract":"","PeriodicalId":51336,"journal":{"name":"Artificial Intelligence and Law","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135779227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-language transfer learning for low-resource legal case summarization
Pub Date: 2023-09-25 | DOI: 10.1007/s10506-023-09373-8 | Artificial Intelligence and Law, 32(4): 1111–1139 | Open access PDF: https://link.springer.com/content/pdf/10.1007/s10506-023-09373-8.pdf
Gianluca Moro, Nicola Piscaglia, Luca Ragazzi, Paolo Italiani
Analyzing and evaluating legal case reports are labor-intensive tasks for judges and lawyers, who usually base their decisions on report abstracts, legal principles, and commonsense reasoning. Summarizing legal documents is therefore time-consuming and requires considerable human expertise. Moreover, public legal corpora are almost unavailable for many languages. This paper proposes a transfer learning approach with extractive and abstractive techniques to cope with the lack of labeled legal summarization datasets, namely a low-resource scenario. In particular, we conducted extensive multi- and cross-language experiments. The proposed work outperforms the state-of-the-art extractive summarization results on the Australian Legal Case Reports dataset and sets a new baseline for abstractive summarization. Finally, assessments with syntactic and semantic metrics were carried out to evaluate the accuracy and factual consistency of the machine-generated legal summaries.
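As a point of reference for the extractive side of the approach, here is a minimal sketch of a centroid-based extractive summarizer: sentences are ranked by similarity to the document centroid and the top-ranked ones are kept. The example sentences are invented, and TF-IDF stands in for the pre-trained representations used in the paper's transfer-learning setup.

```python
# Minimal sketch of a purely extractive summarization baseline:
# rank sentences by similarity to the document centroid and keep the best.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "The appellant sought leave to appeal the costs order.",
    "The primary judge found no error in the exercise of discretion.",
    "Background facts concerning the contractual dispute were summarised.",
    "Leave to appeal was refused with costs.",
]

tfidf = TfidfVectorizer().fit_transform(sentences)         # sentence vectors
centroid = np.asarray(tfidf.mean(axis=0)).reshape(1, -1)   # document centroid
scores = cosine_similarity(tfidf, centroid).ravel()        # relevance scores

top = sorted(np.argsort(scores)[::-1][:2])                 # keep 2 best, in order
print(" ".join(sentences[i] for i in top))
```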
{"title":"Multi-language transfer learning for low-resource legal case summarization","authors":"Gianluca Moro, Nicola Piscaglia, Luca Ragazzi, Paolo Italiani","doi":"10.1007/s10506-023-09373-8","DOIUrl":"10.1007/s10506-023-09373-8","url":null,"abstract":"<div><p>Analyzing and evaluating legal case reports are labor-intensive tasks for judges and lawyers, who usually base their decisions on report abstracts, legal principles, and commonsense reasoning. Thus, summarizing legal documents is time-consuming and requires excellent human expertise. Moreover, public legal corpora of specific languages are almost unavailable. This paper proposes a transfer learning approach with extractive and abstractive techniques to cope with the lack of labeled legal summarization datasets, namely a low-resource scenario. In particular, we conducted extensive multi- and cross-language experiments. The proposed work outperforms the state-of-the-art results of extractive summarization on the Australian Legal Case Reports dataset and sets a new baseline for abstractive summarization. Finally, syntactic and semantic metrics assessments have been carried out to evaluate the accuracy and the factual consistency of the machine-generated legal summaries.</p></div>","PeriodicalId":51336,"journal":{"name":"Artificial Intelligence and Law","volume":"32 4","pages":"1111 - 1139"},"PeriodicalIF":3.1,"publicationDate":"2023-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10506-023-09373-8.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135768568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ant: a process aware annotation software for regulatory compliance
Pub Date: 2023-08-09 | DOI: 10.1007/s10506-023-09372-9 | Artificial Intelligence and Law, 32(4): 1075–1110 | Open access PDF: https://link.springer.com/content/pdf/10.1007/s10506-023-09372-9.pdf
Raphaël Gyory, David Restrepo Amariles, Gregory Lewkowicz, Hugues Bersini
Accurate data annotation is essential to successfully implementing machine learning (ML) for regulatory compliance. Annotations allow organizations to train supervised ML algorithms and to adapt and audit the software they buy. The lack of annotation tools focused on regulatory data is slowing the adoption of established ML methodologies and process models, such as CRISP-DM, in various legal domains, including regulatory compliance. This article introduces Ant, an open-source annotation software for regulatory compliance. Ant is designed to adapt to complex organizational processes and to keep compliance experts in control of ML projects. Drawing on Business Process Modeling (BPM), we show that Ant can help lift major technical bottlenecks to effectively implementing regulatory compliance through software, such as access to multiple sources of heterogeneous data and the integration of process complexities into the ML pipeline. We provide empirical data to validate the performance of Ant, illustrate its potential to speed up the adoption of ML in regulatory compliance, and highlight its limitations.
{"title":"Ant: a process aware annotation software for regulatory compliance","authors":"Raphaël Gyory, David Restrepo Amariles, Gregory Lewkowicz, Hugues Bersini","doi":"10.1007/s10506-023-09372-9","DOIUrl":"10.1007/s10506-023-09372-9","url":null,"abstract":"<div><p>Accurate data annotation is essential to successfully implementing machine learning (ML) for regulatory compliance. Annotations allow organizations to train supervised ML algorithms and to adapt and audit the software they buy. The lack of annotation tools focused on regulatory data is slowing the adoption of established ML methodologies and process models, such as CRISP-DM, in various legal domains, including in regulatory compliance. This article introduces Ant, an open-source annotation software for regulatory compliance. Ant is designed to adapt to complex organizational processes and enable compliance experts to be in control of ML projects. By drawing on Business Process Modeling (BPM), we show that Ant can contribute to lift major technical bottlenecks to effectively implement regulatory compliance through software, such as the access to multiple sources of heterogeneous data and the integration of process complexities in the ML pipeline. We provide empirical data to validate the performance of Ant, illustrate its potential to speed up the adoption of ML in regulatory compliance, and highlight its limitations.</p></div>","PeriodicalId":51336,"journal":{"name":"Artificial Intelligence and Law","volume":"32 4","pages":"1075 - 1110"},"PeriodicalIF":3.1,"publicationDate":"2023-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10506-023-09372-9.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42450021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lessons learned building a legal inference dataset
Pub Date: 2023-07-31 | DOI: 10.1007/s10506-023-09370-x | Artificial Intelligence and Law, 32(4): 1011–1044
Sungmi Park, Joshua I. James
Legal inference is fundamental for building and verifying hypotheses in police investigations. In this study, we build a Natural Language Inference dataset in Korean for the legal domain, focusing on criminal court verdicts. We developed an adversarial hypothesis collection tool that challenges the annotators and gives us a deep understanding of the data, and a hypothesis network construction tool with visualized graphs to show a use-case scenario of the developed model. The data is augmented using a combination of Easy Data Augmentation approaches and round-trip translation, since crowd-sourcing might not be an option for datasets containing sensitive data. We extensively discuss challenges we have encountered, such as the annotators' limited domain knowledge, issues in the data augmentation process, and problems with handling long contexts, and we suggest possible solutions to these issues. Our work shows that creating legal inference datasets with limited resources is feasible and proposes further research in this area.
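To illustrate the Easy Data Augmentation component mentioned above, here is a minimal sketch of two standard EDA operations, random swap and random deletion. The example sentence is an English placeholder rather than Korean legal text, and round-trip translation is omitted since it requires an external translation service.

```python
# Minimal sketch of two Easy Data Augmentation (EDA) operations.
# Real augmentation of legal text needs care not to change the label.
import random

random.seed(0)

def random_swap(tokens, n_swaps=1):
    """Swap two randomly chosen tokens, n_swaps times."""
    tokens = tokens[:]
    for _ in range(n_swaps):
        i, j = random.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens, p=0.2):
    """Drop each token with probability p, keeping at least one token."""
    kept = [t for t in tokens if random.random() > p]
    return kept or [random.choice(tokens)]

sentence = "the defendant was found guilty of fraud by the district court".split()
print(" ".join(random_swap(sentence)))
print(" ".join(random_deletion(sentence)))
```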
{"title":"Lessons learned building a legal inference dataset","authors":"Sungmi Park, Joshua I. James","doi":"10.1007/s10506-023-09370-x","DOIUrl":"10.1007/s10506-023-09370-x","url":null,"abstract":"<div><p>Legal inference is fundamental for building and verifying hypotheses in police investigations. In this study, we build a Natural Language Inference dataset in Korean for the legal domain, focusing on criminal court verdicts. We developed an adversarial hypothesis collection tool that can challenge the annotators and give us a deep understanding of the data, and a hypothesis network construction tool with visualized graphs to show a use case scenario of the developed model. The data is augmented using a combination of Easy Data Augmentation approaches and round-trip translation, as crowd-sourcing might not be an option for datasets with sensible data. We extensively discuss challenges we have encountered, such as the annotator’s limited domain knowledge, issues in the data augmentation process, problems with handling long contexts and suggest possible solutions to the issues. Our work shows that creating legal inference datasets with limited resources is feasible and proposes further research in this area.</p></div>","PeriodicalId":51336,"journal":{"name":"Artificial Intelligence and Law","volume":"32 4","pages":"1011 - 1044"},"PeriodicalIF":3.1,"publicationDate":"2023-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48589915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A topic discovery approach for unsupervised organization of legal document collections
Pub Date: 2023-07-19 | DOI: 10.1007/s10506-023-09371-w | Artificial Intelligence and Law, 32(4): 1045–1074
Daniela Vianna, Edleno Silva de Moura, Altigran Soares da Silva
Technology has substantially transformed the way legal services operate in many countries. With large and complex collections of digitized legal documents, judiciary systems worldwide present a promising scenario for the development of intelligent tools. In this work, we tackle the challenging task of organizing and summarizing a constantly growing collection of legal documents by uncovering hidden topics, or themes, that can later support tasks such as legal case retrieval and legal judgment prediction. Our approach relies on topic discovery techniques combined with a variety of preprocessing techniques and learning-based vector representations, such as Doc2Vec and BERT-like models. The proposed method was validated on four datasets composed of short and long legal documents in Brazilian Portuguese, ranging from legal decisions to chapters of legal books. Analysis conducted by a team of legal specialists showed that the approach uncovers unique and relevant topics from large collections of legal documents, serving many purposes, such as supporting legal case retrieval tools and providing legal specialists with a tool that can accelerate their work of labeling and tagging legal documents.
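A minimal sketch of the general recipe, clustering document vectors to surface topics, is shown below. TF-IDF vectors and K-Means stand in for the Doc2Vec/BERT-based representations and the specific topic discovery techniques used in the paper, and the documents are invented English placeholders rather than Brazilian Portuguese legal texts.

```python
# Minimal sketch of unsupervised topic discovery by clustering document
# vectors. TF-IDF + K-Means stand in for the embeddings and topic models
# evaluated in the paper; the documents are invented placeholders.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "appeal against conviction for tax fraud dismissed",
    "tax assessment annulled for procedural error",
    "custody of the minor granted to the mother",
    "visitation rights of the father extended",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Show the most characteristic terms of each discovered "topic".
terms = vectorizer.get_feature_names_out()
for c in range(km.n_clusters):
    top = km.cluster_centers_[c].argsort()[::-1][:3]
    print(f"topic {c}:", [terms[i] for i in top])
```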
{"title":"A topic discovery approach for unsupervised organization of legal document collections","authors":"Daniela Vianna, Edleno Silva de Moura, Altigran Soares da Silva","doi":"10.1007/s10506-023-09371-w","DOIUrl":"10.1007/s10506-023-09371-w","url":null,"abstract":"<div><p>Technology has substantially transformed the way legal services operate in many different countries. With a large and complex collection of digitized legal documents, the judiciary system worldwide presents a promising scenario for the development of intelligent tools. In this work, we tackle the challenging task of organizing and summarizing the constantly growing collection of legal documents, uncovering hidden topics, or themes that later can support tasks such as legal case retrieval and legal judgment prediction. Our approach to this problem relies on topic discovery techniques combined with a variety of preprocessing techniques and learning-based vector representations of words, such as Doc2Vec and BERT-like models. The proposed method was validated using four different datasets composed of short and long legal documents in Brazilian Portuguese, from legal decisions to chapters in legal books. Analysis conducted by a team of legal specialists revealed the effectiveness of the proposed approach to uncover unique and relevant topics from large collections of legal documents, serving many purposes, such as giving support to legal case retrieval tools and also providing the team of legal specialists with a tool that can accelerate their work of labeling/tagging legal documents.</p></div>","PeriodicalId":51336,"journal":{"name":"Artificial Intelligence and Law","volume":"32 4","pages":"1045 - 1074"},"PeriodicalIF":3.1,"publicationDate":"2023-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46420377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bringing legal knowledge to the public by constructing a legal question bank using large-scale pre-trained language model
Pub Date: 2023-07-06 | DOI: 10.1007/s10506-023-09367-6 | Artificial Intelligence and Law, 32(3): 769–805
Mingruo Yuan, Ben Kao, Tien-Hsuan Wu, Michael M. K. Cheung, Henry W. H. Chan, Anne S. Y. Cheung, Felix W. H. Chan, Yongxi Chen
Access to legal information is fundamental to access to justice. Yet accessibility refers not only to making legal documents available to the public, but also to rendering legal information comprehensible to them. A vexing problem in bringing legal information to the public is how to turn formal legal documents such as legislation and judgments, which are often highly technical, into knowledge that is easily navigable and comprehensible to those without legal education. In this study, we formulate a three-step approach for bringing legal knowledge to laypersons, tackling the issues of navigability and comprehensibility. First, we translate selected sections of the law into snippets (called CLIC-pages), each a short article that explains a particular technical legal concept in layperson's terms. Second, we construct a Legal Question Bank (LQB), a collection of legal questions whose answers can be found in the CLIC-pages. Third, we design an interactive CLIC Recommender (CRec). Given a user's verbal description of a legal situation that requires a legal solution, CRec interprets the user's input, shortlists questions from the question bank that are most likely relevant to the given situation, and recommends their corresponding CLIC-pages, where the relevant legal knowledge can be found. In this paper we focus on the technical aspects of creating an LQB. We show how large-scale pre-trained language models, such as GPT-3, can be used to generate legal questions. We compare machine-generated questions (MGQs) against human-composed questions (HCQs) and find that MGQs are more scalable, cost-effective, and diversified, while HCQs are more precise. We also show a prototype of CRec and illustrate through an example how our three-step approach effectively brings relevant legal knowledge to the public.
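The retrieval step behind a recommender like CRec can be illustrated with a minimal similarity-search sketch: embed the user's description, score it against the question bank, and shortlist the closest questions. TF-IDF cosine similarity stands in for the pre-trained language models discussed in the paper, and the questions and user input are invented.

```python
# Minimal sketch of matching a layperson's description against a question
# bank by similarity. TF-IDF stands in for learned embeddings; the
# questions and the user input are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

question_bank = [
    "Can my landlord increase the rent during a fixed-term tenancy?",
    "What can I do if my employer has not paid my wages?",
    "How do I apply for legal aid in a criminal case?",
]

user_input = "My boss hasn't paid me for two months, what are my options?"

vectorizer = TfidfVectorizer().fit(question_bank + [user_input])
q_vecs = vectorizer.transform(question_bank)
u_vec = vectorizer.transform([user_input])

scores = cosine_similarity(u_vec, q_vecs).ravel()
for i in scores.argsort()[::-1][:2]:   # shortlist the top questions
    print(f"{scores[i]:.2f}  {question_bank[i]}")
```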
{"title":"Bringing legal knowledge to the public by constructing a legal question bank using large-scale pre-trained language model","authors":"Mingruo Yuan, Ben Kao, Tien-Hsuan Wu, Michael M. K. Cheung, Henry W. H. Chan, Anne S. Y. Cheung, Felix W. H. Chan, Yongxi Chen","doi":"10.1007/s10506-023-09367-6","DOIUrl":"10.1007/s10506-023-09367-6","url":null,"abstract":"<div><p>Access to legal information is fundamental to access to justice. Yet accessibility refers not only to making legal documents available to the public, but also rendering legal information comprehensible to them. A vexing problem in bringing legal information to the public is how to turn formal legal documents such as legislation and judgments, which are often highly technical, to easily navigable and comprehensible knowledge to those without legal education. In this study, we formulate a three-step approach for bringing legal knowledge to laypersons, tackling the issues of navigability and comprehensibility. First, we translate selected sections of the law into snippets (called CLIC-pages), each being a small piece of article that focuses on explaining certain technical legal concept in layperson’s terms. Second, we construct a <i>Legal Question Bank</i>, which is a collection of legal questions whose answers can be found in the CLIC-pages. Third, we design an interactive <i>CLIC Recommender</i>. Given a user’s verbal description of a legal situation that requires a legal solution, CRec interprets the user’s input and shortlists questions from the question bank that are most likely relevant to the given legal situation and recommends their corresponding CLIC pages where relevant legal knowledge can be found. In this paper we focus on the technical aspects of creating an LQB. We show how large-scale pre-trained language models, such as GPT-3, can be used to generate legal questions. We compare machine-generated questions against human-composed questions and find that MGQs are more scalable, cost-effective, and more diversified, while HCQs are more precise. We also show a prototype of CRec and illustrate through an example how our 3-step approach effectively brings relevant legal knowledge to the public.</p></div>","PeriodicalId":51336,"journal":{"name":"Artificial Intelligence and Law","volume":"32 3","pages":"769 - 805"},"PeriodicalIF":3.1,"publicationDate":"2023-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42058228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}