An Extraction-based Approach for Vietnamese Legal Text Summarization
Dang Le Binh, H. Minh, Quynh Ngo Diem, Duy Tran Ngoc Bao
2023 International Conference on System Science and Engineering (ICSSE), published 2023-07-27
DOI: 10.1109/ICSSE58758.2023.10227172 (https://doi.org/10.1109/ICSSE58758.2023.10227172)
Abstract
Deep learning has driven rapid progress in extractive text summarization, and new methods continue to be proposed. Legal text, however, remains a considerable challenge. While there is a large body of research on general text summarization, comparatively little addresses legal text summarization. The main difficulty may be due to the complicated structure, long length, and specialized vocabulary of the sentences in a legal document. Specifically, unlike general text, legal text follows a document format that contains redundant formal sentences, while the main ideas are carried by only a few sentences that are widely distributed across the document rather than concentrated in one place. Moreover, legal sentences are usually structured as imperative clauses rather than ordinary statements. For the Vietnamese language in particular, this topic seems to be entirely new to researchers. In this paper, we use a framework that combines a pre-trained model with a multi-layer classification approach and different ranking methods. We also compare different pre-trained model versions on the Vietnamese legal text dataset in order to find the best configuration for the summarization task.
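
The pipeline named in the abstract (a sentence encoder, a multi-layer classifier, and a ranking step) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, since the abstract does not give these details: `toy_encoder` is a hypothetical stand-in for a pre-trained model, the two-layer network and its weights are arbitrary, and the two ranking functions (top-k and threshold) are generic choices, not necessarily the ranking methods the authors compare.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def toy_encoder(sentence: str, dim: int = 8) -> Vector:
    """Hypothetical stand-in for a pre-trained sentence encoder (assumption)."""
    vec = [0.0] * dim
    for token in sentence.lower().split():
        # Sum of character codes gives a deterministic bucket per token.
        vec[sum(map(ord, token)) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def mlp_score(x: Vector, w1: Sequence[Vector], b1: Vector,
              w2: Vector, b2: float) -> float:
    """Two-layer classifier: ReLU hidden layer, then a sigmoid output.

    The output is read as the probability that the sentence belongs
    in the summary. The layer sizes and weights are illustrative only.
    """
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))


def rank_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Keep the k highest-scoring sentences, returned in document order."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


def rank_threshold(scores: Sequence[float], threshold: float) -> List[int]:
    """Keep every sentence whose score reaches the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


def summarize(sentences: Sequence[str],
              score_fn: Callable[[str], float],
              select_fn: Callable[[Sequence[float]], List[int]]) -> List[str]:
    """Score each sentence, select indices, and return the chosen sentences."""
    scores = [score_fn(s) for s in sentences]
    return [sentences[i] for i in select_fn(scores)]
```

Because the selected sentences are returned in their original order, the extracted summary keeps the document's flow even when the key sentences are spread far apart, which is the property of legal text that the abstract highlights. In a real system, `score_fn` would wrap a trained encoder and classifier rather than the toy components above.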