Lawformer: A pre-trained language model for Chinese legal long documents

AI Open. Pub Date: 2021-01-01. DOI: 10.1016/j.aiopen.2021.06.003

Chaojun Xiao, Xueyu Hu, Zhiyuan Liu, Cunchao Tu, Maosong Sun

Volume 2, Pages 79-84

Citations: 88

Abstract

Legal artificial intelligence (LegalAI) aims to benefit legal systems through artificial intelligence technology, especially natural language processing (NLP). Inspired by the success of pre-trained language models (PLMs) in the generic domain, many LegalAI researchers have recently devoted their efforts to applying PLMs to legal tasks. However, using PLMs for legal tasks remains challenging, because legal documents usually consist of thousands of tokens, far more than mainstream PLMs can process. In this paper, we release a Longformer-based pre-trained language model, named Lawformer, for understanding long Chinese legal documents. We evaluate Lawformer on a variety of LegalAI tasks, including judgment prediction, similar case retrieval, legal reading comprehension, and legal question answering. The experimental results demonstrate that our model achieves promising improvements on tasks that take long documents as input. The code and parameters are available at https://github.com/thunlp/LegalPLMs.
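
To make the length problem concrete, the following minimal sketch compares how many query-key pairs full self-attention scores against a sliding-window pattern of the kind associated with Longformer. The abstract does not describe the attention mechanism, so the sliding-window pattern here is an assumption, global attention is omitted, and the sequence length (4096) and window size (512) are illustrative values rather than figures taken from this page. The function names are hypothetical.

```python
def full_attention_pairs(n: int) -> int:
    """Query-key pairs scored by full self-attention over n tokens."""
    return n * n


def sliding_window_pairs(n: int, window: int) -> int:
    """Query-key pairs when each token attends only to tokens within
    window // 2 positions on either side of it (itself included)."""
    half = window // 2
    total = 0
    for i in range(n):
        lo = max(0, i - half)        # clip the window at the document start
        hi = min(n - 1, i + half)    # clip the window at the document end
        total += hi - lo + 1
    return total


if __name__ == "__main__":
    # Assumed, illustrative values: a 4096-token document, 512-token window.
    n, window = 4096, 512
    full = full_attention_pairs(n)
    local = sliding_window_pairs(n, window)
    print(f"full: {full}, sliding window: {local}, ratio: {full / local:.1f}x")
```

Full attention grows quadratically with document length, while the windowed count grows roughly linearly for a fixed window, which is why a windowed design suits documents of thousands of tokens.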
