{"title":"Augmenting interaction effects in convolutional networks with taylor polynomial gated units","authors":"Ligeng Zou , Qi Liu , Jianhua Dai","doi":"10.1016/j.neunet.2025.107262","DOIUrl":null,"url":null,"abstract":"<div><div>Transformer-based vision models are often assumed to have an advantage over traditional convolutional neural networks (CNNs) due to their ability to model long-range dependencies and interactions between inputs. However, the remarkable success of pure convolutional models such as ConvNeXt, which incorporates architectural elements from Vision Transformers (ViTs), challenge the prevailing assumption about the intrinsic superiority of Transformers. In this work, we aim to explore an alternative path to efficiently express interactions between inputs without an attention module by delving into the interaction effects in ConvNeXt. This exploration leads to the proposal of a new activation function, i.e., the Taylor Polynomial Gated Unit (TPGU). The TPGU substitutes the cumulative distribution function in the Gaussian Error Linear Unit (GELU) with a learnable Taylor polynomial, so that it not only can flexibly adjust the strength of each order of interactions but also does not require additional normalization or regularization of the input and output. Comprehensive experiments demonstrate that swapping out GELUs for TPGUs notably boosts model performance under identical training settings. Moreover, empirical evidence highlights the particularly favorable impact of the TPGU on pure convolutional networks, such that it enhances the performance of ConvNeXt-T by 0.7 % on ImageNet-1K. Our findings encourage revisiting the potential utility of polynomials within contemporary neural network architectures. The code for our implementation has been made publicly available at <span><span>https://github.com/LQandlq/tpgu</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"185 ","pages":"Article 107262"},"PeriodicalIF":6.3000,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Networks","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0893608025001418","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/2/8 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Transformer-based vision models are often assumed to hold an advantage over traditional convolutional neural networks (CNNs) because of their ability to model long-range dependencies and interactions between inputs. However, the remarkable success of pure convolutional models such as ConvNeXt, which incorporates architectural elements from Vision Transformers (ViTs), challenges the prevailing assumption that Transformers are intrinsically superior. In this work, we explore an alternative path to efficiently expressing interactions between inputs without an attention module by delving into the interaction effects in ConvNeXt. This exploration leads to the proposal of a new activation function, the Taylor Polynomial Gated Unit (TPGU). The TPGU substitutes the cumulative distribution function in the Gaussian Error Linear Unit (GELU) with a learnable Taylor polynomial, so that it not only flexibly adjusts the strength of each order of interactions but also requires no additional normalization or regularization of its input and output. Comprehensive experiments demonstrate that swapping out GELUs for TPGUs notably boosts model performance under identical training settings. Moreover, empirical evidence highlights the particularly favorable impact of the TPGU on pure convolutional networks: it improves the performance of ConvNeXt-T by 0.7% on ImageNet-1K. Our findings encourage revisiting the potential utility of polynomials within contemporary neural network architectures. The code for our implementation is publicly available at https://github.com/LQandlq/tpgu.
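The gating idea stated in the abstract admits a compact illustration. The sketch below is a minimal, hedged PyTorch rendering of that idea: GELU gates its input as x · Φ(x), and the TPGU replaces Φ with a learnable Taylor polynomial P, giving x · P(x). The polynomial order, coefficient parameterization, and initialization here are illustrative assumptions, not the authors' exact choices; consult https://github.com/LQandlq/tpgu for the actual implementation.

```python
import torch
import torch.nn as nn


class TPGU(nn.Module):
    """Minimal sketch of a Taylor Polynomial Gated Unit (TPGU).

    GELU computes x * Phi(x); per the abstract, the TPGU swaps Phi for a
    learnable Taylor polynomial P, yielding x * P(x). Order and init below
    are assumptions made for illustration only.
    """

    def __init__(self, order: int = 3):
        super().__init__()
        # Learnable coefficients a_0 ... a_order of the polynomial gate P.
        self.coeffs = nn.Parameter(torch.zeros(order + 1))
        with torch.no_grad():
            # Assumed initialization: P(x) = 0.5, matching Phi(0) in GELU,
            # so the unit starts close to a half-scaled linear map.
            self.coeffs[0] = 0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Evaluate P(x) = a_0 + a_1*x + ... + a_K*x^K via Horner's scheme.
        gate = torch.zeros_like(x)
        for a in reversed(self.coeffs):
            gate = gate * x + a
        return x * gate


if __name__ == "__main__":
    act = TPGU(order=3)  # hypothetical drop-in replacement for nn.GELU
    y = act(torch.randn(2, 8))
    print(y.shape)  # torch.Size([2, 8])
```

Note that, unlike GELU's fixed gate, each coefficient a_k here is trained end to end, which is what lets the unit modulate the strength of each order of interaction between inputs.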
Journal Introduction:
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.