Retrieve–Revise–Refine: A novel framework for retrieval of concise entailing legal article set

Information Processing & Management (IF 6.9; CAS Zone 1, Management; JCR Q1, Computer Science, Information Systems). Pub Date: 2025-01-01. Epub Date: 2024-11-07. DOI: 10.1016/j.ipm.2024.103949

Chau Nguyen, Phuong Nguyen, Le-Minh Nguyen

Information Processing & Management, volume 62, issue 1, Article 103949. https://www.sciencedirect.com/science/article/pii/S030645732400308X

Abstract

The retrieval of entailing legal article sets aims to identify a concise set of legal articles that holds an entailment relationship with a legal query or its negation. Unlike traditional information retrieval that focuses on relevance ranking, this task demands conciseness. However, prior research has inadequately addressed this need by employing traditional methods. To bridge this gap, we propose a three-stage Retrieve–Revise–Refine framework which explicitly addresses the need for conciseness by utilizing both small and large language models (LMs) in distinct yet complementary roles. Empirical evaluations on the COLIEE 2022 and 2023 datasets demonstrate that our framework significantly enhances performance, achieving absolute increases in the macro F2 score by 3.17% and 4.24% over previous state-of-the-art methods, respectively. Specifically, our Retrieve stage, employing various tailored fine-tuning strategies for small LMs, achieved a recall rate exceeding 0.90 in the top-5 results alone—ensuring comprehensive coverage of entailing articles. In the subsequent Revise stage, large LMs narrow this set, improving precision while sacrificing minimal coverage. The Refine stage further enhances precision by leveraging specialized insights from small LMs, resulting in a relative improvement of up to 19.15% in the number of concise article sets retrieved compared to previous methods. Our framework offers a promising direction for further research on specialized methods for retrieving concise sets of entailing legal articles, thereby more effectively meeting the task’s demands.
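
The abstract reports results in macro F2 but does not define it. The sketch below assumes the common reading: F2 is computed per query with beta = 2 (recall weighted more heavily than precision) and then averaged over queries. The function names and article identifiers are hypothetical and are not taken from the paper. It illustrates why a top-5 list with full recall still scores well below a concise set.

```python
from typing import Iterable, Sequence, Set, Tuple


def f_beta(retrieved: Set[str], relevant: Set[str], beta: float = 2.0) -> float:
    """F-beta for a single query; beta=2 weights recall above precision."""
    hits = len(retrieved & relevant)
    if hits == 0:
        # Covers empty retrieved/relevant sets and zero overlap.
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def macro_f_beta(
    pairs: Sequence[Tuple[Iterable[str], Iterable[str]]], beta: float = 2.0
) -> float:
    """Mean of per-query F-beta scores (assumed meaning of 'macro F2')."""
    scores = [f_beta(set(ret), set(rel), beta) for ret, rel in pairs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical query with exactly one entailing article.
gold = {"Art. 96"}
top5 = {"Art. 94", "Art. 95", "Art. 96", "Art. 97", "Art. 98"}  # recall 1.0, precision 0.2
concise = {"Art. 96"}  # recall 1.0, precision 1.0

broad_score = f_beta(top5, gold)        # 5/9, roughly 0.556
concise_score = f_beta(concise, gold)   # 1.0
print(f"top-5: {broad_score:.3f}  concise: {concise_score:.3f}")
```

Under these assumptions, both sets have perfect recall, yet the five-article list scores about 0.556 against 1.0 for the single-article set. This is the gap that narrowing a high-recall candidate list, as the Revise and Refine stages are described as doing, would aim to close.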

Source journal

Information Processing & Management (Engineering & Technology: Computer Science, Information Systems)

CiteScore: 17.00

Self-citation rate: 11.60%

Articles published: 276

Review time: 39 days

Journal description: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.