A Lightweight Transformer with Convolutional Attention

2020 11th International Conference on Awareness Science and Technology (iCAST) | Pub Date: 2020-12-07 | DOI: 10.1109/iCAST51195.2020.9319489

Kungan Zeng, Incheon Paik

Citations: 2

Abstract

Neural machine translation (NMT) has developed rapidly through the application of various deep learning techniques. In particular, how to construct a more effective NMT architecture is attracting increasing attention. The Transformer is a state-of-the-art architecture in NMT. It relies entirely on the self-attention mechanism instead of recurrent neural networks (RNNs). Multi-head attention is the crucial component that implements the self-attention mechanism, and it also strongly affects the size of the model. In this paper, we present a new multi-head attention that incorporates the convolution operation. Compared with the base Transformer, our approach effectively reduces the number of parameters. We also conduct a reasoned experiment, and the results show that the performance of the new model is similar to that of the base model.
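To make the claim about model size concrete, the following is a minimal sketch of how the parameter count of standard multi-head attention can be tallied, assuming the usual four dense projections (query, key, value, output) of size d_model x d_model. The abstract does not specify how convolution is combined with attention, so the depthwise-convolution variant below is purely a hypothetical illustration of why a convolutional projection can be smaller; it is not the authors' method. All function names and the kernel size are assumptions introduced for this example.

```python
def dense_projection_params(d_model: int, bias: bool = True) -> int:
    """Parameters in one dense d_model x d_model projection."""
    return d_model * d_model + (d_model if bias else 0)


def multi_head_attention_params(d_model: int, bias: bool = True) -> int:
    """Standard multi-head attention: Q, K, V and output projections.

    The head count does not appear here because each head works on a
    d_model / num_heads slice, so the total stays at d_model x d_model
    per projection.
    """
    return 4 * dense_projection_params(d_model, bias)


def depthwise_conv_params(d_model: int, kernel_size: int, bias: bool = True) -> int:
    """HYPOTHETICAL replacement: a depthwise 1D convolution with one
    kernel per channel (an assumption, not taken from the paper)."""
    return d_model * kernel_size + (d_model if bias else 0)


def hypothetical_conv_attention_params(d_model: int, kernel_size: int) -> int:
    """HYPOTHETICAL variant: Q, K, V use depthwise convolutions while the
    output projection stays dense."""
    return 3 * depthwise_conv_params(d_model, kernel_size) + dense_projection_params(d_model)


if __name__ == "__main__":
    d_model = 512  # hidden size commonly used for the base Transformer
    print("standard multi-head attention:", multi_head_attention_params(d_model))
    print("hypothetical conv variant:    ", hypothetical_conv_attention_params(d_model, kernel_size=3))
```

Under these assumptions the dense projections dominate the count, which is why replacing any of them with a small-kernel convolution shrinks the layer; whether translation quality is preserved is an empirical question that the paper addresses experimentally.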
