用通用难度度量理解基于难度的样本加权

Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference) Pub Date : 2023-01-12 DOI:10.48550/arXiv.2301.04850

Xiaoling Zhou, Ou Wu, Weiyao Zhu, Ziyang Liang

{"title":"用通用难度度量理解基于难度的样本加权","authors":"Xiaoling Zhou, Ou Wu, Weiyao Zhu, Ziyang Liang","doi":"10.48550/arXiv.2301.04850","DOIUrl":null,"url":null,"abstract":"Sample weighting is widely used in deep learning. A large number of weighting methods essentially utilize the learning difficulty of training samples to calculate their weights. In this study, this scheme is called difficulty-based weighting. Two important issues arise when explaining this scheme. First, a unified difficulty measure that can be theoretically guaranteed for training samples does not exist. The learning difficulties of the samples are determined by multiple factors including noise level, imbalance degree, margin, and uncertainty. Nevertheless, existing measures only consider a single factor or in part, but not in their entirety. Second, a comprehensive theoretical explanation is lacking with respect to demonstrating why difficulty-based weighting schemes are effective in deep learning. In this study, we theoretically prove that the generalization error of a sample can be used as a universal difficulty measure. Furthermore, we provide formal theoretical justifications on the role of difficulty-based weighting for deep learning, consequently revealing its positive influences on both the optimization dynamics and generalization performance of deep models, which is instructive to existing weighting schemes.","PeriodicalId":74091,"journal":{"name":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","volume":"24 1","pages":"68-84"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Understanding Difficulty-based Sample Weighting with a Universal Difficulty Measure\",\"authors\":\"Xiaoling Zhou, Ou Wu, Weiyao Zhu, Ziyang Liang\",\"doi\":\"10.48550/arXiv.2301.04850\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Sample weighting is widely used in deep learning. A large number of weighting methods essentially utilize the learning difficulty of training samples to calculate their weights. In this study, this scheme is called difficulty-based weighting. Two important issues arise when explaining this scheme. First, a unified difficulty measure that can be theoretically guaranteed for training samples does not exist. The learning difficulties of the samples are determined by multiple factors including noise level, imbalance degree, margin, and uncertainty. Nevertheless, existing measures only consider a single factor or in part, but not in their entirety. Second, a comprehensive theoretical explanation is lacking with respect to demonstrating why difficulty-based weighting schemes are effective in deep learning. In this study, we theoretically prove that the generalization error of a sample can be used as a universal difficulty measure. Furthermore, we provide formal theoretical justifications on the role of difficulty-based weighting for deep learning, consequently revealing its positive influences on both the optimization dynamics and generalization performance of deep models, which is instructive to existing weighting schemes.\",\"PeriodicalId\":74091,\"journal\":{\"name\":\"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)\",\"volume\":\"24 1\",\"pages\":\"68-84\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2301.04850\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2301.04850","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

样本加权在深度学习中有着广泛的应用。大量的加权方法本质上是利用训练样本的学习难度来计算其权重。在本研究中，这种方案被称为基于难度的加权。在解释这个方案时，出现了两个重要问题。首先，不存在理论上可以保证训练样本的统一难度度量。样本的学习困难是由噪声水平、不平衡程度、裕度和不确定性等多种因素决定的。然而，现有的措施只考虑单一因素或部分因素，而不是全部因素。其次，缺乏一个全面的理论解释来证明为什么基于难度的加权方案在深度学习中是有效的。在本研究中，我们从理论上证明了样本的泛化误差可以作为通用的难度度量。此外，我们提供了基于困难度的权重在深度学习中的作用的形式化理论证明，从而揭示了其对深度模型的优化动力学和泛化性能的积极影响，这对现有的加权方案具有指导意义。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Understanding Difficulty-based Sample Weighting with a Universal Difficulty Measure

Sample weighting is widely used in deep learning. A large number of weighting methods essentially utilize the learning difficulty of training samples to calculate their weights. In this study, this scheme is called difficulty-based weighting. Two important issues arise when explaining this scheme. First, a unified difficulty measure that can be theoretically guaranteed for training samples does not exist. The learning difficulties of the samples are determined by multiple factors including noise level, imbalance degree, margin, and uncertainty. Nevertheless, existing measures only consider a single factor or in part, but not in their entirety. Second, a comprehensive theoretical explanation is lacking with respect to demonstrating why difficulty-based weighting schemes are effective in deep learning. In this study, we theoretically prove that the generalization error of a sample can be used as a universal difficulty measure. Furthermore, we provide formal theoretical justifications on the role of difficulty-based weighting for deep learning, consequently revealing its positive influences on both the optimization dynamics and generalization performance of deep models, which is instructive to existing weighting schemes.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)

自引率

0.00%

发文量