On the Improvement of Generalization and Stability of Forward-Only Learning via Neural Polarization

Erik B. Terres-Escudero, Javier Del Ser, Pablo Garcia-Bringas

arXiv:2408.09210 (2024-08-17)
Abstract
Forward-only learning algorithms have recently gained attention as alternatives to gradient backpropagation, replacing its backward step with an additional contrastive forward pass. Among these approaches, the Forward-Forward Algorithm (FFA) has been shown to achieve competitive levels of performance in terms of generalization and complexity. Networks trained with FFA learn to contrastively maximize a layer-wise defined goodness score when presented with real data (denoted as positive samples) and to minimize it when processing synthetic data (correspondingly, negative samples). However, the algorithm still exhibits weaknesses that degrade model accuracy and training stability, primarily caused by a gradient imbalance between positive and negative samples. To overcome this issue, in this work we propose a novel implementation of the FFA, denoted as Polar-FFA, which extends the original formulation by introducing a neural division (polarization) between positive and negative instances. Neurons in each of these groups aim to maximize their goodness when presented with their respective data type, thereby yielding symmetric gradient behavior. To empirically assess the improved learning capabilities of Polar-FFA, we perform systematic experiments using different activation and goodness functions over image classification datasets. Our results show that Polar-FFA outperforms FFA in terms of accuracy and convergence speed. Furthermore, its lower reliance on hyperparameters reduces the need for hyperparameter tuning to guarantee optimal generalization, thereby allowing for a broader range of neural network configurations.
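
To make the layer-wise, forward-only training procedure concrete, the following is a minimal sketch of a single FFA layer in PyTorch. It assumes a squared-activation goodness score, a softplus loss against a fixed threshold theta, and layer-local Adam updates; all names and hyperparameters here are illustrative choices, not the authors' reference implementation.

```python
# Minimal sketch of one Forward-Forward layer (illustrative, not the paper's code).
# Assumption: goodness g(h) = sum_i h_i^2, compared against a fixed threshold theta.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FFALayer(nn.Module):
    def __init__(self, in_dim, out_dim, theta=2.0, lr=1e-3):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.theta = theta
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def goodness(self, h):
        # Layer-wise goodness: sum of squared activations per sample.
        return h.pow(2).sum(dim=1)

    def forward(self, x):
        # Normalize the input so only its direction, not the previous
        # layer's goodness, is passed forward.
        x = F.normalize(x, dim=1)
        return F.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        # Positive samples should score above theta, negatives below it.
        g_pos = self.goodness(self.forward(x_pos))
        g_neg = self.goodness(self.forward(x_neg))
        loss = F.softplus(torch.cat([self.theta - g_pos,
                                     g_neg - self.theta])).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        # Detach the outputs so no gradient flows across layers
        # (forward-only, layer-local training).
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()
```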
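The polarization idea can be sketched in the same style by splitting a layer's neurons into a positive and a negative group, each trained to maximize its own goodness on its own data type. The symmetric probability p = g+/(g+ + g-) used below is one plausible choice and is not necessarily the goodness or probability function evaluated in the paper.

```python
# Minimal sketch of neural polarization in the spirit of Polar-FFA
# (illustrative formulation, not the authors' exact one).
import torch
import torch.nn as nn
import torch.nn.functional as F


class PolarFFALayer(nn.Module):
    def __init__(self, in_dim, out_dim, lr=1e-3):
        super().__init__()
        assert out_dim % 2 == 0, "neurons are split evenly into two polarities"
        self.linear = nn.Linear(in_dim, out_dim)
        self.half = out_dim // 2
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def forward(self, x):
        x = F.normalize(x, dim=1)
        return F.relu(self.linear(x))

    def polar_goodness(self, h):
        # Goodness of the positive and negative neuron groups, per sample.
        g_pos = h[:, : self.half].pow(2).sum(dim=1)
        g_neg = h[:, self.half :].pow(2).sum(dim=1)
        return g_pos, g_neg

    def train_step(self, x_pos, x_neg, eps=1e-6):
        # Probability that a sample is positive, defined symmetrically from
        # both groups' goodness, so positive and negative samples produce
        # gradients of comparable magnitude (illustrative choice).
        gp_pos, gn_pos = self.polar_goodness(self.forward(x_pos))
        gp_neg, gn_neg = self.polar_goodness(self.forward(x_neg))
        p_pos = gp_pos / (gp_pos + gn_pos + eps)   # should approach 1
        p_neg = gp_neg / (gp_neg + gn_neg + eps)   # should approach 0
        loss = -(torch.log(p_pos + eps).mean()
                 + torch.log(1.0 - p_neg + eps).mean())
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()
```

In both sketches, layers are trained greedily: the detached outputs of one layer serve as the inputs to the next, so no gradients ever propagate backwards across layers.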