How Does Promoting the Minority Fraction Affect Generalization? A Theoretical Study of One-Hidden-Layer Neural Network on Group Imbalance

IF 8.7 | Zone 1, Engineering & Technology | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | IEEE Journal of Selected Topics in Signal Processing | Pub Date: 2024-03-07 | DOI: 10.1109/JSTSP.2024.3374593

Hongkang Li;Shuai Zhang;Yihua Zhang;Meng Wang;Sijia Liu;Pin-Yu Chen

Volume 18, Issue 2, pp. 216-231. https://ieeexplore.ieee.org/document/10462147/

Citations: 0

Abstract

Group imbalance has been a known problem in empirical risk minimization (ERM), where the achieved high average accuracy is accompanied by low accuracy in a minority group. Despite algorithmic efforts to improve the minority group accuracy, a theoretical generalization analysis of ERM on individual groups remains elusive. By formulating the group imbalance problem with the Gaussian Mixture Model, this paper quantifies the impact of individual groups on the sample complexity, the convergence rate, and the average and group-level testing performance. Although our theoretical framework is centered on binary classification using a one-hidden-layer neural network, to the best of our knowledge, we provide the first theoretical analysis of the group-level generalization of ERM in addition to the commonly studied average generalization performance. Sample insights from our theoretical results include that when all group-level covariances are in the medium regime and all means are close to zero, the learning performance is most desirable in the sense of a small sample complexity, a fast training rate, and a high average and group-level testing accuracy. Moreover, we show that increasing the fraction of the minority group in the training data does not necessarily improve the generalization performance of the minority group. Our theoretical results are validated on both synthetic and empirical datasets, such as CelebA and CIFAR-10 in image classification.
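To make the gap between average and group-level accuracy concrete, the following is a minimal sketch and not the paper's code. It assumes isotropic Gaussian groups and uses hypothetical helper names (`sample_gmm`, `accuracy_report`) and illustrative parameter values. It draws features from a two-group Gaussian mixture with a 90/10 split, then reports accuracy overall and per group for a toy set of predictions.

```python
import numpy as np


def sample_gmm(n, fractions, means, stds, rng):
    """Draw n points from a mixture of isotropic Gaussians.

    Assumption for illustration: each group g has mean means[g] and
    covariance stds[g]**2 * I. Returns (features, group ids).
    """
    fractions = np.asarray(fractions, dtype=float)
    means = np.asarray(means, dtype=float)      # shape (num_groups, d)
    stds = np.asarray(stds, dtype=float)        # shape (num_groups,)
    groups = rng.choice(len(fractions), size=n, p=fractions)
    noise = rng.standard_normal((n, means.shape[1]))
    x = means[groups] + stds[groups, None] * noise
    return x, groups


def accuracy_report(y_true, y_pred, groups):
    """Return (average accuracy, {group id: accuracy within that group})."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    correct = y_true == y_pred
    per_group = {
        int(g): float(correct[groups == g].mean()) for g in np.unique(groups)
    }
    return float(correct.mean()), per_group


# Imbalanced mixture: group 0 is the majority (90%), group 1 the minority (10%).
rng = np.random.default_rng(0)
x, groups = sample_gmm(
    1000,
    fractions=[0.9, 0.1],
    means=[[0.0, 0.0], [2.0, 2.0]],
    stds=[1.0, 0.5],
    rng=rng,
)

# Toy predictions: every majority sample is right, the one minority sample is wrong.
avg_acc, group_acc = accuracy_report(
    y_true=[1, 1, 1, 1, 0],
    y_pred=[1, 1, 1, 1, 1],
    groups=[0, 0, 0, 0, 1],
)
print(avg_acc, group_acc)
```

In the toy call, four of five predictions are correct, so the average accuracy is 0.8 while the single minority sample is misclassified, giving a minority-group accuracy of 0.0. This is the pattern the abstract describes: a high average can coexist with a low group-level figure.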




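Source journal

IEEE Journal of Selected Topics in Signal Processing (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 19.00

Self-citation rate: 1.30%

Articles published: 135

Review time: 3 months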

Journal description: The IEEE Journal of Selected Topics in Signal Processing (JSTSP) focuses on the Field of Interest of the IEEE Signal Processing Society, which encompasses the theory and application of various signal processing techniques. These techniques include filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals using digital or analog devices. The term "signal" covers a wide range of data types, including audio, video, speech, image, communication, geophysical, sonar, radar, medical, musical, and others. The journal format allows for in-depth exploration of signal processing topics, enabling the Society to cover both established and emerging areas. This includes interdisciplinary fields such as biomedical engineering and language processing, as well as areas not traditionally associated with engineering.
