Breaking Neural Network Scaling Laws with Modularity

Akhilan Boopathy, Sunshine Jiang, William Yue, Jaedong Hwang, Abhiram Iyer, Ila Fiete

arXiv:2409.05780 [stat.ML] · 2024-09-09
Abstract
Modular neural networks outperform nonmodular neural networks on tasks ranging from visual question answering to robotics. These performance improvements are thought to be due to modular networks' superior ability to model the compositional and combinatorial structure of real-world problems. However, a theoretical explanation of how modularity improves generalizability, and of how to leverage task modularity while training networks, remains elusive. Using recent theoretical progress in explaining neural network generalization, we investigate how the amount of training data required to generalize on a task varies with the intrinsic dimensionality of the task's input. We show theoretically that on modularly structured tasks, nonmodular networks require a number of samples that grows exponentially with task dimensionality, whereas modular networks' sample complexity is independent of task dimensionality: modular networks can generalize in high dimensions. We then develop a novel learning rule for modular networks that exploits this advantage, and we empirically show the rule's improved generalization, both in- and out-of-distribution, on high-dimensional, modular tasks.
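The abstract's central contrast — exponential versus dimension-independent sample complexity — can be illustrated with a minimal sketch. The toy task below (a sum of independent per-coordinate cubics) is an illustrative assumption, not the paper's task family: a "modular" learner that fits each 1-D module separately needs a number of samples linear in the dimensionality `d`, whereas a learner treating the full `d`-dimensional input jointly would need to cover an exponentially large input space.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # task dimensionality

# Hypothetical modularly structured target: f(x) = sum_i g_i(x_i),
# where each module g_i is a 1-D cubic (illustrative choice only).
coeffs = rng.normal(size=(d, 4))

def target(X):
    return sum(np.polyval(coeffs[i], X[:, i]) for i in range(d))

# A modular learner probes one coordinate at a time: total samples
# are d * n_per_module, linear in d rather than exponential in d.
n_per_module = 20
fitted = []
for i in range(d):
    xs = rng.uniform(-1, 1, n_per_module)
    X = np.zeros((n_per_module, d))
    X[:, i] = xs
    # With other coordinates held at 0, target(X) equals g_i(x_i)
    # plus a constant offset sum_{j != i} g_j(0), which the cubic
    # fit absorbs into its constant term.
    fitted.append(np.polyfit(xs, target(X), 3))

def predict(X):
    # Summing the fitted modules double-counts the offsets:
    # each of the d fits contains f(0) - g_i(0), so subtract
    # the (d - 1) surplus copies of f(0).
    base = target(np.zeros((1, d)))[0]
    return sum(np.polyval(fitted[i], X[:, i]) for i in range(d)) - (d - 1) * base
```

On this noiseless toy task the modular fit recovers the target essentially exactly from 160 samples in 8 dimensions, which is the flavor of the dimension-independence claim; the paper's actual argument, of course, covers neural networks and a general class of modular tasks rather than this hand-decomposed example.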