A Robust Approach to Fine-tune Pre-trained Transformer-based models for Text Summarization through Latent Space Compression

Ala Alam Falaki, R. Gras
DOI: 10.1109/ICMLA55696.2022.00030
Published in: 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)
Publication date: 2022-12-01
Citations: 0

Abstract

We propose a technique to reduce the number of parameters in the decoder of a sequence-to-sequence (seq2seq) architecture for automatic text summarization. The approach uses a pre-trained autoencoder (AE), trained on top of the encoder's output, to reduce its embedding dimension, which significantly shrinks the summarizer's decoder. Two experiments validate the idea: a custom seq2seq architecture with various pre-trained encoders, and incorporation of the approach into an encoder-decoder model (BART) for text summarization. Both studies show promising results in terms of ROUGE score. The most notable outcome, however, is a 54% decrease in inference time and a 57% drop in GPU memory usage during fine-tuning, with minimal quality loss (a 4.5% drop in ROUGE-1). This significantly lowers the hardware requirements for fine-tuning large-scale pre-trained models. We also show that our approach can be combined with other network-size-reduction techniques (e.g., distillation) to further reduce the parameter count of any encoder-decoder model. The implementation and checkpoints are available on GitHub.
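The core idea can be illustrated with a minimal sketch: a linear autoencoder projects the encoder's hidden states into a smaller latent space, and the decoder is then built at that reduced dimension, so its weight matrices shrink roughly quadratically. This is only an illustration under assumptions, not the paper's implementation: the AE architecture, training details, and the dimensions `D_IN=768` and `D_LATENT=256` are hypothetical choices, not values taken from the paper.

```python
import numpy as np

D_IN, D_LATENT, SEQ_LEN = 768, 256, 128  # illustrative sizes, not from the paper

rng = np.random.default_rng(0)
# Hypothetical pre-trained AE weights; in the paper's setting the AE is trained
# beforehand to reconstruct the encoder's output embeddings.
W_enc = rng.standard_normal((D_IN, D_LATENT)) / np.sqrt(D_IN)
W_dec = rng.standard_normal((D_LATENT, D_IN)) / np.sqrt(D_LATENT)

def compress(hidden_states: np.ndarray) -> np.ndarray:
    """Project encoder hidden states into the AE's latent space."""
    return hidden_states @ W_enc

def reconstruct(latent: np.ndarray) -> np.ndarray:
    """Map latent vectors back to the original embedding dimension."""
    return latent @ W_dec

# Encoder output for one sequence: (seq_len, d_model).
H = rng.standard_normal((SEQ_LEN, D_IN))
Z = compress(H)  # (128, 256): what the smaller decoder now cross-attends over
assert Z.shape == (SEQ_LEN, D_LATENT)

# A decoder rebuilt at d_model=256 has projection matrices of size 256x256
# instead of 768x768: (256/768)**2, i.e. about 11% of the original parameters
# per projection, which is the source of the memory and speed savings.
```

In practice the compression step would sit between a frozen pre-trained encoder and a newly initialized small decoder, and only the small decoder (plus, optionally, the AE) is fine-tuned.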