An automated exact solution framework towards solving the logistic regression best subset selection problem

South African Statistical Journal | Statistics & Probability | IF 0.4 (Q4) | Pub Date: 2023-01-01 | DOI: 10.37920/sasj.2023.57.2.2
Thomas van Niekerk, Jacques V. Venter, Stephanus E. Terblanche
Citations: 0

An automated exact solution framework towards solving the logistic regression best subset selection problem
An automated logistic regression solution framework (ALRSF) is proposed to solve a mixed integer programming (MIP) formulation of the well-known logistic regression best subset selection problem. The solution framework first determines the optimal number of independent variables that should be included in the model using an automated cardinality parameter selection procedure. The cardinality parameter dictates the size of the subset of variables and can be problem-specific. A novel regression parameter fixing heuristic that utilises a Benders decomposition algorithm is applied to prune the solution search space so that the optimal regression parameter values are found faster. An optimality gap is subsequently calculated to quantify the quality of the final regression model by considering the distance between the best possible log-likelihood value and the log-likelihood value calculated using the current parameter values. Attempts are then made to reduce the optimality gap by adjusting regression parameter values. The ALRSF serves as a holistic variable selection framework that enables the user to consider larger datasets when solving the best subset selection logistic regression problem by significantly reducing the memory requirements associated with its mixed integer programming formulation. Furthermore, the automated framework requires minimal user intervention during model training and hyperparameter tuning. Improvements in the quality of the final model (when considering both the optimality gap and the computing resources required to achieve a result) are observed when the ALRSF is applied to well-known real-world UCI machine learning datasets.
Keywords: Best subset selection, Independent variable selection, Logistic regression, Mixed integer programming
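To make the problem statement in the abstract concrete, the following is a minimal sketch of cardinality-constrained best subset selection for logistic regression. It is not the ALRSF: it replaces the MIP formulation and the Benders-based heuristic with brute-force enumeration of all subsets of size k, fits each candidate by plain gradient ascent, and reports a relative gap against the full-model log-likelihood. All function names, the toy data, and the gap formula are assumptions made for illustration and do not come from the paper.

```python
import itertools
import math


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def log_likelihood(X, y, beta, cols):
    # beta[0] is the intercept; beta[1:] pairs with the selected columns.
    total = 0.0
    for row, label in zip(X, y):
        t = beta[0] + sum(b * row[j] for b, j in zip(beta[1:], cols))
        s = t if label == 1 else -t  # log P(label) = log sigmoid(s)
        total += -math.log1p(math.exp(-s)) if s >= 0 else s - math.log1p(math.exp(s))
    return total


def fit(X, y, cols, steps=2000, lr=0.1):
    # Plain gradient ascent on the log-likelihood, restricted to `cols`.
    beta = [0.0] * (len(cols) + 1)
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for row, label in zip(X, y):
            t = beta[0] + sum(b * row[j] for b, j in zip(beta[1:], cols))
            err = label - sigmoid(t)
            grad[0] += err
            for i, j in enumerate(cols, start=1):
                grad[i] += err * row[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def best_subset(X, y, k):
    # Enumerate every subset of exactly k columns and keep the best fit.
    best = None
    for cols in itertools.combinations(range(len(X[0])), k):
        beta = fit(X, y, cols)
        ll = log_likelihood(X, y, beta, cols)
        if best is None or ll > best[0]:
            best = (ll, cols, beta)
    return best


def relative_gap(bound, value):
    # Assumed gap definition: distance to a bound, relative to the value.
    return abs(bound - value) / abs(value)


# Made-up toy data: column 0 is informative, columns 1 and 2 are mostly noise.
x0 = [-2, -1.5, -1, -0.5, 0.5, -1, 1, 1.5, 2, 0.5, -0.5, 1]
x1 = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1]
x2 = [0.3, -0.2, 0.1, 0.4, -0.3, -0.1, 0.2, -0.4, 0.3, -0.1, 0.1, -0.3]
X = [list(r) for r in zip(x0, x1, x2)]
y = [0] * 6 + [1] * 6

ll, cols, beta = best_subset(X, y, k=1)

# The full model's fitted log-likelihood serves as a reference bound here,
# up to the accuracy of the gradient-ascent fit.
all_cols = tuple(range(len(X[0])))
bound = log_likelihood(X, y, fit(X, y, all_cols), all_cols)
print("selected columns:", cols)
print("log-likelihood:", round(ll, 3))
print("relative gap to full model:", round(relative_gap(bound, ll), 3))
```

Enumeration costs one fit per subset, so it only scales to a handful of variables; that combinatorial growth is the reason exact approaches such as the MIP formulation described in the abstract are of interest.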
Source journal
South African Statistical Journal (Statistics & Probability)
CiteScore: 0.30
Self-citation rate: 0.00%
Articles published: 18
About the journal: The journal will publish innovative contributions to the theory and application of statistics. Authoritative review articles on topics of general interest which are not readily accessible in a coherent form will also be considered for publication. Articles on applications or of a general nature will be published in separate sections, and an author should indicate which of these sections an article is intended for. An applications article should normally consist of the analysis of actual data and need not necessarily contain new theory. The data should be made available with the article but need not necessarily be part of it.