BERT-VBD: Vietnamese Multi-Document Summarization Framework

Tuan-Cuong Vuong, Trang Mai Xuan, Thien Van Luong
Venue: arXiv - CS - Computation and Language
Publication date: 2024-09-18
Identifier: arxiv-2409.12134 (https://doi.org/arxiv-2409.12134)
Citations: 0

Abstract

Numerous methods have been proposed for Multi-Document Summarization (MDS), spanning both extractive and abstractive techniques. However, each approach has its own limitations, making it less effective to rely solely on either one. An emerging and promising strategy is to combine extractive and abstractive summarization. Despite the many studies in this domain, research on the combined methodology remains scarce, particularly for Vietnamese language processing. This paper presents a novel Vietnamese MDS framework built on a two-component pipeline architecture that integrates extractive and abstractive techniques. The first component employs an extractive approach to identify key sentences within each document. This is achieved by a modification of the pre-trained BERT network, which derives semantically meaningful phrase embeddings using siamese and triplet network structures. The second component uses the VBD-LLaMA2-7B-50b model for abstractive summarization, generating the final summary document. The proposed framework attains a ROUGE-2 score of 39.6% on the VN-MDS dataset, outperforming state-of-the-art baselines.
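The two-component pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how key sentences are scored, so ranking by similarity to the document centroid is an assumption, `embed` is a toy bag-of-words stand-in for the modified BERT encoder, and `abstractive` is a hypothetical callable standing in for a model such as VBD-LLaMA2-7B-50b. All function names are illustrative.

```python
import math
from collections import Counter
from typing import Callable, List


def embed(sentence: str) -> Counter:
    """Toy stand-in for a sentence encoder: lowercase bag-of-words counts.

    Assumption: a real system would use the BERT-based encoder here.
    """
    return Counter(sentence.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def extract_key_sentences(document: List[str], k: int = 2) -> List[str]:
    """Extractive step: keep the k sentences most similar to the document
    centroid (an assumed scoring rule), returned in their original order."""
    vectors = [embed(s) for s in document]
    centroid: Counter = Counter()
    for v in vectors:
        centroid.update(v)
    ranked = sorted(
        range(len(document)),
        key=lambda i: cosine(vectors[i], centroid),
        reverse=True,
    )
    keep = sorted(ranked[:k])
    return [document[i] for i in keep]


def summarize(
    documents: List[List[str]],
    abstractive: Callable[[str], str],
    k: int = 2,
) -> str:
    """Run the extractive step on each document, concatenate the selected
    sentences, and hand them to the abstractive model."""
    extracted = [s for doc in documents for s in extract_key_sentences(doc, k)]
    return abstractive(" ".join(extracted))


if __name__ == "__main__":
    docs = [
        ["the flood hit the city", "the city flood damaged roads",
         "officials met on tuesday"],
        ["rescue teams reached the city", "the weather was mild last year"],
    ]
    # Identity function used as a placeholder for the abstractive model.
    print(summarize(docs, abstractive=lambda text: text, k=1))
```

Passing the abstractive model as a callable keeps the extractive step testable on its own; swapping in a real encoder or language model would only change `embed` and the `abstractive` argument.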