A Robust Approach to Fine-tune Pre-trained Transformer-based models for Text Summarization through Latent Space Compression

Ala Alam Falaki, R. Gras
DOI: 10.1109/ICMLA55696.2022.00030
Published in: 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)
Publication date: 2022-12-01
Citations: 0

Abstract

We propose a technique to reduce the number of parameters in the decoder of a sequence-to-sequence (seq2seq) architecture for automatic text summarization. The approach uses a pre-trained autoencoder (AE), trained on top of the encoder's output, to reduce its embedding dimension, which significantly shrinks the summarizer's decoder. Two experiments validate the idea: a custom seq2seq architecture with various pre-trained encoders, and incorporation of the approach into an encoder-decoder model (BART) for text summarization. Both studies show promising results in terms of ROUGE score. The most notable outcome, however, is a 54% decrease in inference time and a 57% drop in GPU memory usage during fine-tuning, with minimal quality loss (a 4.5% drop in ROUGE-1). This significantly lowers the hardware requirements for fine-tuning large-scale pre-trained models. We also show that our approach can be combined with other network-size-reduction techniques (e.g., distillation) to further reduce the parameter count of any encoder-decoder model. The implementation and checkpoints are available on GitHub.
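The core idea can be illustrated with a minimal sketch: a linear autoencoder projects the encoder's hidden states into a smaller latent space, and the decoder is then built at that reduced dimension, so its weight matrices shrink roughly quadratically. This is only an illustration under assumptions, not the paper's implementation: the AE architecture, training details, and the dimensions `D_IN=768` and `D_LATENT=256` are hypothetical choices, not values taken from the paper.

```python
import numpy as np

D_IN, D_LATENT, SEQ_LEN = 768, 256, 128  # illustrative sizes, not from the paper

rng = np.random.default_rng(0)
# Hypothetical pre-trained AE weights; in the paper's setting the AE is trained
# beforehand to reconstruct the encoder's output embeddings.
W_enc = rng.standard_normal((D_IN, D_LATENT)) / np.sqrt(D_IN)
W_dec = rng.standard_normal((D_LATENT, D_IN)) / np.sqrt(D_LATENT)

def compress(hidden_states: np.ndarray) -> np.ndarray:
    """Project encoder hidden states into the AE's latent space."""
    return hidden_states @ W_enc

def reconstruct(latent: np.ndarray) -> np.ndarray:
    """Map latent vectors back to the original embedding dimension."""
    return latent @ W_dec

# Encoder output for one sequence: (seq_len, d_model).
H = rng.standard_normal((SEQ_LEN, D_IN))
Z = compress(H)  # (128, 256): what the smaller decoder now cross-attends over
assert Z.shape == (SEQ_LEN, D_LATENT)

# A decoder rebuilt at d_model=256 has projection matrices of size 256x256
# instead of 768x768: (256/768)**2, i.e. about 11% of the original parameters
# per projection, which is the source of the memory and speed savings.
```

In practice the compression step would sit between a frozen pre-trained encoder and a newly initialized small decoder, and only the small decoder (plus, optionally, the AE) is fine-tuned.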