Real block-circulant matrices and DCT-DST algorithm for transformer neural network

Frontiers in Applied Mathematics and Statistics · IF 1.3 · Q3 (Mathematics, Interdisciplinary Applications) · Pub Date: 2023-12-12 · DOI: 10.3389/fams.2023.1260187
Euis Asriani, I. Muchtadi-Alamsyah, Ayu Purwarianti
{"title":"Real block-circulant matrices and DCT-DST algorithm for transformer neural network","authors":"Euis Asriani, I. Muchtadi-Alamsyah, Ayu Purwarianti","doi":"10.3389/fams.2023.1260187","DOIUrl":null,"url":null,"abstract":"In the encoding and decoding process of transformer neural networks, a weight matrix-vector multiplication occurs in each multihead attention and feed forward sublayer. Assigning the appropriate weight matrix and algorithm can improve transformer performance, especially for machine translation tasks. In this study, we investigate the use of the real block-circulant matrices and an alternative to the commonly used fast Fourier transform (FFT) algorithm, namely, the discrete cosine transform–discrete sine transform (DCT-DST) algorithm, to be implemented in a transformer. We explore three transformer models that combine the use of real block-circulant matrices with different algorithms. We start from generating two orthogonal matrices, U and Q. The matrix U is spanned by the combination of the reals and imaginary parts of eigenvectors of the real block-circulant matrix, whereas Q is defined such that the matrix multiplication QU can be represented in the shape of a DCT-DST matrix. The final step is defining the Schur form of the real block-circulant matrix. We find that the matrix-vector multiplication using the DCT-DST algorithm can be defined by assigning the Kronecker product between the DCT-DST matrix and an orthogonal matrix in the same order as the dimension of the circulant matrix that spanned the real block circulant. According to the experiment's findings, the dense-real block circulant DCT-DST model with largest matrix dimension was able to reduce the number of model parameters up to 41%. The same model of 128 matrix dimension gained 26.47 of BLEU score, higher compared to the other two models on the same matrix dimensions.","PeriodicalId":36662,"journal":{"name":"Frontiers in Applied Mathematics and Statistics","volume":"52 14","pages":""},"PeriodicalIF":1.3000,"publicationDate":"2023-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Applied Mathematics and Statistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/fams.2023.1260187","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"MATHEMATICS, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0

Abstract

In the encoding and decoding process of transformer neural networks, a weight matrix-vector multiplication occurs in each multihead attention and feed-forward sublayer. Choosing an appropriate weight matrix and algorithm can improve transformer performance, especially on machine translation tasks. In this study, we investigate the use of real block-circulant matrices, together with an alternative to the commonly used fast Fourier transform (FFT) algorithm, namely the discrete cosine transform-discrete sine transform (DCT-DST) algorithm, for implementation in a transformer. We explore three transformer models that combine real block-circulant matrices with different algorithms. We start by generating two orthogonal matrices, U and Q. The matrix U is spanned by the real and imaginary parts of the eigenvectors of the real block-circulant matrix, whereas Q is defined such that the matrix product QU can be represented as a DCT-DST matrix. The final step is defining the Schur form of the real block-circulant matrix. We find that the matrix-vector multiplication using the DCT-DST algorithm can be defined through the Kronecker product of the DCT-DST matrix and an orthogonal matrix whose order equals the dimension of the circulant matrix that spans the real block-circulant matrix. According to the experimental findings, the dense-real block-circulant DCT-DST model with the largest matrix dimension reduced the number of model parameters by up to 41%. At a matrix dimension of 128, the same model achieved a BLEU score of 26.47, higher than the other two models at the same matrix dimension.
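To make the construction concrete, here is a minimal NumPy sketch of the primitive the abstract is built around: a weight matrix assembled from circulant blocks, multiplied against a vector through a fast transform. The sketch uses the standard FFT route, i.e., the baseline that the paper's DCT-DST algorithm replaces; the function names, array shapes, and block layout are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the paper's code): a block-circulant weight
# matrix-vector product computed with the FFT, the baseline transform
# that the paper replaces with a DCT-DST algorithm. Shapes and names
# are illustrative assumptions.
import numpy as np

def circulant(c):
    """Dense circulant matrix with first column c: C[i, j] = c[(i - j) % n]."""
    n = len(c)
    return c[(np.arange(n)[:, None] - np.arange(n)[None, :]) % n]

def block_circulant_matvec(blocks, x):
    """y = W x, where W is a b x b grid of n x n circulant blocks.

    blocks: array of shape (b, b, n); blocks[i, j] is the defining
            (first) column of circulant block C_ij, so W stores b*b*n
            parameters instead of (b*n)**2 for a dense matrix.
    x:      vector of length b * n.
    """
    b, _, n = blocks.shape
    X = np.fft.fft(x.reshape(b, n), axis=1)   # spectrum of each input segment
    B = np.fft.fft(blocks, axis=2)            # spectrum of each block's column
    Y = np.einsum("ijk,jk->ik", B, X)         # Y_i = sum_j fft(c_ij) * fft(x_j)
    return np.fft.ifft(Y, axis=1).real.ravel()

rng = np.random.default_rng(0)
b, n = 4, 32                                  # 4 x 4 grid of 32 x 32 blocks
blocks = rng.standard_normal((b, b, n))
x = rng.standard_normal(b * n)

# Dense reference: assemble the full (b*n) x (b*n) matrix block by block.
W = np.block([[circulant(blocks[i, j]) for j in range(b)] for i in range(b)])
assert np.allclose(W @ x, block_circulant_matvec(blocks, x))
print("parameters:", blocks.size, "vs dense:", W.size)   # b*b*n vs (b*n)**2
```

The parameter saving is visible in the final line: a dense (b·n) × (b·n) weight stores (b·n)² values, whereas the block-circulant version stores only b²·n, one defining vector per block; the up-to-41% model-level reduction the authors report comes from applying such weights in the attention and feed-forward sublayers. The paper's contribution replaces the complex-valued FFT above with real arithmetic, expressing the product through the Kronecker product of a DCT-DST matrix and an orthogonal matrix whose order equals the circulant block dimension.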
Source journal
Frontiers in Applied Mathematics and Statistics (Mathematics: Statistics and Probability)
CiteScore: 1.90 · Self-citation rate: 7.10% · Articles published: 117 · Review time: 14 weeks
Latest articles in this journal
- Third-degree B-spline collocation method for singularly perturbed time delay parabolic problem with two parameters
- Item response theory to discriminate COVID-19 knowledge and attitudes among university students
- Editorial: Justified modeling frameworks and novel interpretations of ecological and epidemiological systems
- Pneumonia and COVID-19 co-infection modeling with optimal control analysis
- Enhanced corn seed disease classification: leveraging MobileNetV2 with feature augmentation and transfer learning