Stochastic Gradient Descent for Kernel-Based Maximum Correntropy Criterion.

IF 2.1 · CAS Tier 3 (Physics and Astronomy) · Q2 PHYSICS, MULTIDISCIPLINARY · Entropy · Pub Date: 2024-12-17 · DOI: 10.3390/e26121104
Tiankai Li, Baobin Wang, Chaoquan Peng, Hong Yin
Citations: 0

Abstract


The maximum correntropy criterion (MCC) has become an important method in the machine learning and signal processing communities since it was successfully applied in various non-Gaussian noise scenarios. Compared with the classical least squares (LS) method, which takes only the second-order moments of a model into consideration and leads to a convex optimization problem, MCC captures the higher-order information of models that plays a crucial role in robust learning, which usually comes at the cost of solving non-convex optimization problems. Theoretical research on convex optimization has made significant achievements, while the theoretical understanding of non-convex optimization is still far from mature. Motivated by the popularity of stochastic gradient descent (SGD) for solving non-convex problems, this paper considers SGD applied to the kernel version of MCC, which has been shown to be robust to outliers and non-Gaussian data in nonlinear structural models. As the existing theoretical results for the SGD algorithm applied to kernel MCC are not well established, we present a rigorous analysis of its convergence behavior and provide explicit convergence rates under some standard conditions. Our work fills the gap between the optimization process and convergence during the iterations: the iterates need to converge to the global minimizer, while the obtained estimator cannot guarantee global optimality during the learning process.
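For concreteness, here is a minimal sketch of the objects the abstract refers to. It assumes only the standard definition of correntropy with a Gaussian scaling kernel of bandwidth σ; the paper's exact normalizations and step-size schedule may differ. Given samples (x, y) and a hypothesis f in the RKHS H_K induced by a Mercer kernel K, the kernel MCC estimator maximizes the expected correntropy of the residual, and a one-sample stochastic gradient ascent step at (x_t, y_t) takes a reweighted, LMS-like form:

```latex
% Kernel MCC objective: maximize the expected correntropy of the residual
\max_{f \in \mathcal{H}_K} \;
  \mathbb{E}\!\left[ \exp\!\left( -\frac{(y - f(x))^2}{2\sigma^2} \right) \right]

% One-sample SGD step at (x_t, y_t) with step size \eta_t; since
% f(x_t) = \langle f, K(x_t,\cdot)\rangle, the gradient is a multiple of
% K(x_t,\cdot), and the Gaussian factor down-weights large residuals:
f_{t+1} \;=\; f_t \;+\; \frac{\eta_t}{\sigma^2}\,
  \exp\!\left( -\frac{(y_t - f_t(x_t))^2}{2\sigma^2} \right)
  \bigl( y_t - f_t(x_t) \bigr)\, K(x_t, \cdot)
```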

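And a minimal runnable sketch of this update in Python. The class name `OnlineKernelMCC`, the RBF kernel width `gamma`, and the polynomially decaying step size eta_t = eta0 / t**theta are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def rbf_kernel(X, x, gamma=1.0):
    """Gaussian (RBF) kernel values K(x_i, x) = exp(-gamma * ||x_i - x||^2)."""
    X = np.atleast_2d(X)
    return np.exp(-gamma * np.sum((X - x) ** 2, axis=1))

class OnlineKernelMCC:
    """Online SGD for kernel-based maximum correntropy regression (sketch).

    The iterate f_t lives in the RKHS of `rbf_kernel` and is stored as a
    kernel expansion f_t(x) = sum_i alpha_i * K(x_i, x); each SGD step
    appends one center with coefficient (eta_t / sigma^2) * w_t * r_t.
    """

    def __init__(self, sigma=1.0, gamma=1.0, eta0=0.5, theta=0.5):
        self.sigma = sigma    # correntropy bandwidth (controls outlier rejection)
        self.gamma = gamma    # RBF kernel width
        self.eta0 = eta0      # initial step size
        self.theta = theta    # decay exponent: eta_t = eta0 / t**theta
        self.centers = []     # inputs x_1, ..., x_t seen so far
        self.alphas = []      # expansion coefficients

    def predict(self, x):
        if not self.centers:
            return 0.0  # start from f_1 = 0
        k = rbf_kernel(np.asarray(self.centers), x, self.gamma)
        return float(np.asarray(self.alphas) @ k)

    def partial_fit(self, x, y, t):
        """One SGD step on sample (x, y) at iteration t >= 1."""
        eta = self.eta0 / t ** self.theta
        r = y - self.predict(x)                       # residual
        w = np.exp(-r ** 2 / (2 * self.sigma ** 2))   # correntropy weight in (0, 1]
        # Gradient ascent on exp(-r^2 / (2 sigma^2)) adds one kernel term:
        self.centers.append(np.atleast_1d(x).astype(float))
        self.alphas.append(eta * w * r / self.sigma ** 2)
```

A toy usage, regressing a sine curve under heavy-tailed noise; the weight w shrinks the contribution of gross outliers, which is the robustness property the abstract refers to:

```python
rng = np.random.default_rng(0)
model = OnlineKernelMCC(sigma=1.0, gamma=5.0, eta0=1.0)
for t in range(1, 2001):
    x = rng.uniform(-1.0, 1.0, size=1)
    y = np.sin(np.pi * x[0]) + 0.1 * rng.standard_t(df=1.5)  # non-Gaussian noise
    model.partial_fit(x, y, t)
print(model.predict(np.array([0.5])))  # expected near sin(pi * 0.5) = 1.0
```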
Full text (open access): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11675914/pdf/
Source journal
Entropy (PHYSICS, MULTIDISCIPLINARY)
CiteScore: 4.90
Self-citation rate: 11.10%
Annual publications: 1580
Review time: 21.05 days
About the journal: Entropy (ISSN 1099-4300) is an international and interdisciplinary journal of entropy and information studies that publishes reviews, regular research papers, and short notes. Its aim is to encourage scientists to publish their theoretical and experimental details in as much detail as possible. There is no restriction on the length of papers. If a paper involves computation or experiment, the details must be provided so that the results can be reproduced.
Latest articles in this journal:
A Resource-Efficient Multi-Entropy Fusion Method and Its Application for EEG-Based Emotion Recognition
Discontinuous Structural Transitions in Fluids with Competing Interactions
Maximizing Free Energy Gain
Nonadditive Entropies and Nonextensive Statistical Mechanics
Novel Ensemble Approach with Incremental Information Level and Improved Evidence Theory for Attribute Reduction