{"title":"Hadamard估计注意变压器(HEAT):利用Hadamard积的低秩投影快速逼近变压器的点积自注意","authors":"Jasper Kyle Catapang","doi":"10.1109/ISCMI56532.2022.10068484","DOIUrl":null,"url":null,"abstract":"In this paper, the author proposes a new transformer model called Hadamard Estimated Attention Transformer or HEAT, that utilizes a low-rank projection of the Hadamard product to approximate the self-attention mechanism in standard transformer architectures and thus aiming to speedup transformer training, finetuning, and inference altogether. The study shows how it is significantly better than the original transformer that uses dot product self-attention by offering a faster way to compute the original self-attention mechanism while maintaining and ultimately surpassing the quality of the original transformer architecture. It also bests Linformer and Nyströmformer in several machine translation tasks while matching and even outperforming Nyströmformer's accuracy in various text classification tasks.","PeriodicalId":340397,"journal":{"name":"2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)","volume":"316 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Hadamard Estimated Attention Transformer (HEAT): Fast Approximation of Dot Product Self-attention for Transformers Using Low-Rank Projection of Hadamard Product\",\"authors\":\"Jasper Kyle Catapang\",\"doi\":\"10.1109/ISCMI56532.2022.10068484\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, the author proposes a new transformer model called Hadamard Estimated Attention Transformer or HEAT, that utilizes a low-rank projection of the Hadamard product to approximate the self-attention mechanism in standard transformer architectures and thus aiming to speedup transformer training, finetuning, and inference altogether. The study shows how it is significantly better than the original transformer that uses dot product self-attention by offering a faster way to compute the original self-attention mechanism while maintaining and ultimately surpassing the quality of the original transformer architecture. 
It also bests Linformer and Nyströmformer in several machine translation tasks while matching and even outperforming Nyströmformer's accuracy in various text classification tasks.\",\"PeriodicalId\":340397,\"journal\":{\"name\":\"2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)\",\"volume\":\"316 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISCMI56532.2022.10068484\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISCMI56532.2022.10068484","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Hadamard Estimated Attention Transformer (HEAT): Fast Approximation of Dot Product Self-attention for Transformers Using Low-Rank Projection of Hadamard Product
In this paper, the author proposes a new transformer model, the Hadamard Estimated Attention Transformer (HEAT), which uses a low-rank projection of the Hadamard product to approximate the self-attention mechanism of standard transformer architectures, with the aim of speeding up transformer training, fine-tuning, and inference. The study shows that HEAT significantly outperforms the original transformer with dot-product self-attention: it computes an approximation of the original self-attention mechanism faster while matching and ultimately surpassing the quality of the original architecture. HEAT also bests Linformer and Nyströmformer in several machine translation tasks, while matching and even outperforming Nyströmformer's accuracy in various text classification tasks.
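
The abstract does not spell out HEAT's exact formulation, so the sketch below is only a conceptual illustration, not the paper's method. It contrasts standard O(n^2) dot-product self-attention with a generic Linformer-style low-rank projection of the keys and values (the projection matrices E and F are hypothetical stand-ins), to show where the quadratic cost arises and how a low-rank projection along the sequence axis reduces it.

# Illustrative sketch only: HEAT's Hadamard-product formulation is not given in the
# abstract, so this compares standard dot-product self-attention with a generic
# Linformer-style low-rank approximation for reference, not the paper's algorithm.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(Q, K, V):
    """Standard self-attention: O(n^2 * d) time, O(n^2) memory for the score matrix."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)             # (n, n) score matrix
    return softmax(scores) @ V                # (n, d) output

def low_rank_attention(Q, K, V, E, F):
    """Linformer-style approximation: project K and V along the sequence axis
    with (hypothetical) matrices E, F of shape (k, n), giving O(n * k * d)."""
    d = Q.shape[-1]
    K_proj = E @ K                            # (k, d) projected keys
    V_proj = F @ V                            # (k, d) projected values
    scores = Q @ K_proj.T / np.sqrt(d)        # (n, k) instead of (n, n)
    return softmax(scores) @ V_proj           # (n, d) output

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k = 512, 64, 64                     # sequence length, head dim, projected length
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    E, F = (rng.standard_normal((k, n)) / np.sqrt(n) for _ in range(2))
    full = dot_product_attention(Q, K, V)
    approx = low_rank_attention(Q, K, V, E, F)
    print(full.shape, approx.shape)           # both (n, d)

The point of the comparison is the shape of the intermediate score matrix: (n, n) in the exact case versus (n, k) with k fixed in the low-rank case, which is what makes this family of approximations, to which HEAT belongs, scale better with sequence length.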