Fast-FNet: Accelerating Transformer Encoder Models via Efficient Fourier Layers
Nurullah Sevim, Ege Ozan Özyedek, Furkan Şahinuç, Aykut Koç
arXiv:2209.12816 (26 September 2022)
Transformer-based language models rely on the attention mechanism for substantial performance improvements in almost all natural language processing (NLP) tasks, and similar attention structures are also studied extensively in several other areas. Although the attention mechanism significantly improves model performance, its quadratic complexity in sequence length prevents efficient processing of long sequences. Recent works have targeted this computational inefficiency and shown that transformer-based models can still reach competitive results without the attention layer. A pioneering study proposed FNet, which replaces the attention layer with the Fourier Transform (FT) in the transformer encoder architecture. FNet achieves performance competitive with the original transformer encoder model while accelerating training by removing the computational burden of the attention mechanism.
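
For concreteness, the sketch below (PyTorch, not taken from the paper) illustrates an FNet-style encoder block: the attention sublayer is replaced by a parameter-free 2D discrete Fourier transform over the sequence and hidden dimensions, of which only the real part is kept. The surrounding residual connections, layer norms, and feed-forward sublayer are assumed to follow the standard encoder layout; names such as FourierMixing and FourierEncoderBlock are illustrative.

```python
import torch
import torch.nn as nn


class FourierMixing(nn.Module):
    """FNet-style token mixing: replace self-attention with a 2D DFT
    applied over the sequence and hidden dimensions, keeping only the
    real part. No learnable parameters, no quadratic-in-length cost."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden). fft2 runs a 1D FFT along each of
        # the last two dimensions (hidden, then sequence).
        return torch.fft.fft2(x).real


class FourierEncoderBlock(nn.Module):
    """One encoder block with Fourier mixing in place of attention,
    otherwise following a standard post-norm transformer layout."""

    def __init__(self, hidden: int, ff_dim: int, dropout: float = 0.1):
        super().__init__()
        self.mixing = FourierMixing()
        self.norm1 = nn.LayerNorm(hidden)
        self.ff = nn.Sequential(
            nn.Linear(hidden, ff_dim),
            nn.GELU(),
            nn.Linear(ff_dim, hidden),
            nn.Dropout(dropout),
        )
        self.norm2 = nn.LayerNorm(hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.norm1(x + self.mixing(x))  # mixing sublayer + residual
        x = self.norm2(x + self.ff(x))      # feed-forward sublayer + residual
        return x
```

A quick shape check: FourierEncoderBlock(768, 3072)(torch.randn(2, 128, 768)) returns a (2, 128, 768) tensor, and the mixing step itself contributes no parameters.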
However, the FNet model ignores essential properties of the FT from classical signal processing that can be leveraged to increase model efficiency further. We propose different methods to deploy the FT efficiently in transformer encoder models. Our proposed architectures have fewer model parameters, shorter training times, lower memory usage, and some additional performance improvements. We demonstrate these improvements through extensive experiments on common benchmarks.
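
One classical FT property of the kind the abstract alludes to is conjugate symmetry: the DFT of a real-valued length-N sequence satisfies X[k] = conj(X[N-k]), so nearly half of its coefficients are redundant and can be discarded (e.g., via a real-input FFT), shrinking the representation that subsequent layers must process. The snippet below demonstrates that property; it is an illustration, not the paper's exact mechanism.

```python
import torch

torch.manual_seed(0)
x = torch.randn(2, 128, 64)  # (batch, seq_len, hidden), real-valued

# Full DFT over the sequence dimension: 128 complex coefficients.
full = torch.fft.fft(x, dim=1)

# For real input the spectrum is conjugate-symmetric:
# X[k] == conj(X[N - k]), so the upper half carries no new information.
k = 5
assert torch.allclose(full[:, k], full[:, 128 - k].conj(), atol=1e-4)

# rfft keeps only the N // 2 + 1 non-redundant coefficients, roughly
# halving the sequence length seen by any layer placed after it.
half = torch.fft.rfft(x, dim=1)
print(full.shape, half.shape)  # torch.Size([2, 128, 64]) torch.Size([2, 65, 64])
```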