An Extraction-based Approach for Vietnamese Legal Text Summarization

2023 International Conference on System Science and Engineering (ICSSE). Pub Date: 2023-07-27. DOI: 10.1109/ICSSE58758.2023.10227172

Dang Le Binh, H. Minh, Quynh Ngo Diem, Duy Tran Ngoc Bao



Abstract

Deep learning has advanced extractive text summarization and opened the way for a growing number of proposed methods. Legal text, however, remains a major challenge. While general text summarization has been studied extensively, few studies address legal text summarization. The main difficulty likely stems from the complicated structure, long length, and specialized vocabulary of the sentences in a legal document. Specifically, unlike general text, legal text must follow a document format that contains redundant formal sentences, while the main ideas are carried by only a few sentences that are widely distributed across the document rather than concentrated in one place. Moreover, legal sentences are usually structured as imperative clauses rather than plain statements. For the Vietnamese language in particular, this topic appears to be entirely new to researchers. In this paper, we use a framework that combines a pre-trained model with a multi-layer classification approach and different ranking methods. We also compare different pre-trained model versions on a Vietnamese legal text dataset to find the best approach for the summarization task.
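
The abstract does not give implementation details, so the following is only a minimal sketch of the general extractive idea it describes: score every sentence, rank the scores, and keep the top few in document order. The names `extract_summary` and `toy_score`, the cue-word list, and the sample sentences are all hypothetical. In particular, `toy_score` is a simple stand-in for the pre-trained model and classifier, not the authors' method.

```python
from typing import Callable, List, Sequence


def extract_summary(
    sentences: Sequence[str],
    score_fn: Callable[[str], float],
    k: int = 3,
) -> List[str]:
    """Return the k highest-scoring sentences, in their original order."""
    if k <= 0:
        return []
    scored = [(score_fn(s), i) for i, s in enumerate(sentences)]
    # Rank by score (descending); ties go to the earlier sentence.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:k]
    # Restore document order so the summary reads naturally.
    keep = sorted(i for _, i in top)
    return [sentences[i] for i in keep]


def toy_score(sentence: str) -> float:
    """Hypothetical stand-in for a trained classifier (assumption):
    the fraction of words that are imperative-style cue words."""
    cues = {"must", "shall", "prohibited"}
    words = sentence.lower().replace(".", "").split()
    return sum(w in cues for w in words) / max(len(words), 1)


if __name__ == "__main__":
    doc = [
        "This decree is issued by the Government.",
        "Organizations must register before operating.",
        "The form is attached.",
        "Unlicensed trading is prohibited.",
    ]
    for line in extract_summary(doc, toy_score, k=2):
        print(line)
```

In a real system, `score_fn` would presumably wrap a pre-trained encoder followed by a classification head, and different ranking methods would correspond to different ways of ordering or thresholding those scores. Keeping the scorer as a parameter makes such variants easy to compare on the same selection logic.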
