Fractional Naive Bayes (FNB): non-convex optimization for a parsimonious weighted selective naive Bayes classifier

Carine Hue, Marc Boullé
arXiv - STAT - Machine Learning · Published 2024-09-17 · doi: arxiv-2409.11100

Abstract

We study supervised classification for datasets with a very large number of input variables. The naive Bayes classifier is attractive for its simplicity, scalability and effectiveness in many real-data applications. When the strong naive Bayes assumption of conditional independence of the input variables given the target variable does not hold, variable selection and model averaging are two common ways to improve performance. In the case of the naive Bayes classifier, the resulting weighting scheme on the models reduces to a weighting scheme on the variables. Here we focus on direct estimation of the variable weights in such a weighted naive Bayes classifier. We propose a sparse regularization of the model log-likelihood that takes into account prior penalization costs related to each input variable. Compared to the averaging-based classifiers used until now, our main goal is to obtain parsimonious, robust models with fewer variables and equivalent performance. Direct estimation of the variable weights amounts to a non-convex optimization problem, for which we propose and compare several two-stage algorithms. First, the criterion obtained by convex relaxation is minimized using several variants of standard gradient methods. Then, the initial non-convex optimization problem is solved using local optimization methods initialized with the result of the first stage. The proposed algorithms yield optimization-based weighted naive Bayes classifiers, which are evaluated on benchmark datasets and positioned with respect to a reference averaging-based classifier.
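To illustrate how per-variable weights enter the decision rule described above, here is a minimal sketch (not the paper's implementation; the function names, toy probabilities, and data layout are illustrative assumptions for categorical inputs). The key point is that the log-posterior is the prior plus a weighted sum of per-variable log-likelihoods, so a weight of 0 performs variable selection while a fractional weight in (0, 1) attenuates a variable's contribution:

```python
import math

def weighted_nb_log_posterior(x, priors, cond_probs, weights):
    """Unnormalized log-posterior of a weighted naive Bayes classifier.

    log P(y | x) is proportional to log P(y) + sum_i w_i * log P(x_i | y).
    w_i = 0 drops variable i entirely (selection); 0 < w_i < 1
    down-weights it (the "fractional" weighting the title refers to).
    """
    scores = {}
    for y, prior in priors.items():
        s = math.log(prior)
        for i, xi in enumerate(x):
            s += weights[i] * math.log(cond_probs[y][i][xi])
        scores[y] = s
    return scores

def predict(x, priors, cond_probs, weights):
    # Argmax over classes of the unnormalized log-posterior.
    scores = weighted_nb_log_posterior(x, priors, cond_probs, weights)
    return max(scores, key=scores.get)

# Toy example (hypothetical numbers): 2 classes, 2 binary variables.
priors = {"a": 0.5, "b": 0.5}
cond_probs = {
    "a": [{0: 0.9, 1: 0.1}, {0: 0.5, 1: 0.5}],
    "b": [{0: 0.2, 1: 0.8}, {0: 0.5, 1: 0.5}],
}
# Variable 1 is uninformative; a sparse weight vector zeroes it out.
weights = [1.0, 0.0]
print(predict([0, 1], priors, cond_probs, weights))  # -> a
```

Estimating the weight vector itself is the hard part: the paper's contribution is the sparse, prior-penalized regularization of the log-likelihood and the two-stage (convex relaxation, then local refinement) optimization used to fit it.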