Differentiable Slimming for Memory-Efficient Transformers

IF 1.7 · CAS Tier 4 (Computer Science) · JCR Q3 (Computer Science, Hardware & Architecture) · IEEE Embedded Systems Letters · Pub Date: 2023-09-25 · DOI: 10.1109/LES.2023.3299638
Nikolay Penkov;Konstantinos Balaskas;Martin Rapp;Joerg Henkel
{"title":"Differentiable Slimming for Memory-Efficient Transformers","authors":"Nikolay Penkov;Konstantinos Balaskas;Martin Rapp;Joerg Henkel","doi":"10.1109/LES.2023.3299638","DOIUrl":null,"url":null,"abstract":"Transformer models are continuously achieving state-of-the-art performance on a wide range of benchmarks. To meet demanding performance targets, the number of model parameters is continuously increased. As a result, state-of-the-art Transformers require substantial computational resources prohibiting their deployment on consumer-grade hardware. In the literature, overparameterized Transformers are successfully reduced in size with the help of pruning strategies. Existing works lack the ability to optimize the full architecture, without incurring significant overheads, in a fully differentiable manner. Our work proposes a single-stage approach for training a Transformer for memory-efficient inference and various resource-constrained scenarios. Transformer blocks are extended with trainable gate parameters, which attribute importance and control information flow. Their integration into a differentiable pruning-aware training scheme allows the extraction of extremely sparse subnetworks at runtime, with minimal performance degradation. Evaluative pruning results, at the attention head and layer levels, illustrate the memory efficiency of our trained subnetworks under various memory budgets.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"15 4","pages":"186-189"},"PeriodicalIF":1.7000,"publicationDate":"2023-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Embedded Systems Letters","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10261943/","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 0

Abstract

Transformer models are continuously achieving state-of-the-art performance on a wide range of benchmarks. To meet demanding performance targets, the number of model parameters is continuously increased. As a result, state-of-the-art Transformers require substantial computational resources prohibiting their deployment on consumer-grade hardware. In the literature, overparameterized Transformers are successfully reduced in size with the help of pruning strategies. Existing works lack the ability to optimize the full architecture, without incurring significant overheads, in a fully differentiable manner. Our work proposes a single-stage approach for training a Transformer for memory-efficient inference and various resource-constrained scenarios. Transformer blocks are extended with trainable gate parameters, which attribute importance and control information flow. Their integration into a differentiable pruning-aware training scheme allows the extraction of extremely sparse subnetworks at runtime, with minimal performance degradation. Evaluative pruning results, at the attention head and layer levels, illustrate the memory efficiency of our trained subnetworks under various memory budgets.
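The full letter is behind IEEE Xplore, so the following is only a minimal, hypothetical sketch of the head-level gating idea the abstract describes, written in PyTorch. All names (GatedSelfAttention, head_gates, gate_sparsity_penalty) and hyperparameters are illustrative assumptions, not the authors' implementation; the paper additionally gates at the layer level, which is not shown here.

```python
import torch
import torch.nn as nn


class GatedSelfAttention(nn.Module):
    """Self-attention whose heads are scaled by trainable, differentiable gates."""

    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)
        self.proj = nn.Linear(embed_dim, embed_dim)
        # One trainable gate logit per attention head; sigmoid(gate) in (0, 1)
        # scales that head's output, so head importance is learned end-to-end.
        self.head_gates = nn.Parameter(torch.zeros(num_heads))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split into heads: (batch, heads, seq, head_dim).
        q, k, v = (t.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)
                   for t in (q, k, v))
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        out = attn @ v                                    # (batch, heads, seq, head_dim)
        gates = torch.sigmoid(self.head_gates)            # soft, differentiable mask
        out = out * gates.view(1, -1, 1, 1)               # gate each head before mixing
        return self.proj(out.transpose(1, 2).reshape(b, s, d))


def gate_sparsity_penalty(model: nn.Module, weight: float = 1e-3) -> torch.Tensor:
    """L1-style pressure on the gates, driving gates of unneeded heads toward zero."""
    gate_loss = torch.zeros(())
    for m in model.modules():
        if isinstance(m, GatedSelfAttention):
            gate_loss = gate_loss + torch.sigmoid(m.head_gates).sum()
    return weight * gate_loss
```

In such a scheme, training would minimize the task loss plus gate_sparsity_penalty jointly; at deployment, heads whose gates fall below a threshold (chosen to satisfy a given memory budget) can simply be dropped, yielding the sparse subnetwork the abstract refers to.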
Source journal

IEEE Embedded Systems Letters (Engineering: Control and Systems Engineering)
CiteScore: 3.30 · Self-citation rate: 0.00% · Articles per year: 65

About the journal: The IEEE Embedded Systems Letters (ESL) provides a forum for rapid dissemination of the latest technical advances in embedded systems and related areas in embedded software. The emphasis is on models, methods, and tools that ensure secure, correct, efficient, and robust design of embedded systems and their applications.