Journal of Machine Learning Research: Latest Articles

Surrogate Assisted Semi-supervised Inference for High Dimensional Risk Prediction.

IF 4.3 Zone 3 (Computer Science) Q1 AUTOMATION & CONTROL SYSTEMS

Journal of Machine Learning Research

Pub Date : 2023-01-01

Jue Hou, Zijian Guo, Tianxi Cai

Risk modeling with electronic health records (EHR) data is challenging because the disease outcome is not directly observed and the predictors are high-dimensional. In this paper, we develop a surrogate assisted semi-supervised learning approach, leveraging small labeled data with annotated outcomes and extensive unlabeled data of outcome surrogates and high-dimensional predictors. We propose to impute the unobserved outcomes by constructing a sparse imputation model with outcome surrogates and high-dimensional predictors. We further conduct a one-step bias correction to enable interval estimation for the risk prediction. Our inference procedure is valid even if both the imputation and risk prediction models are misspecified. Our novel way of utilizing unlabeled data enables high-dimensional statistical inference for the challenging setting with a dense risk prediction model. We present an extensive simulation study to demonstrate the superiority of our approach compared to existing supervised methods. We apply the method to genetic risk prediction of type-2 diabetes mellitus using an EHR biobank cohort.

Citations: 0

Learning Optimal Group-structured Individualized Treatment Rules with Many Treatments.

IF 6 Zone 3 (Computer Science) Q1 AUTOMATION & CONTROL SYSTEMS

Journal of Machine Learning Research

Pub Date : 2023-01-01

Haixu Ma, Donglin Zeng, Yufeng Liu

Data-driven individualized decision-making problems have received considerable attention in recent years. In particular, decision makers aim to determine the optimal Individualized Treatment Rule (ITR) so that the expected specified outcome averaging over heterogeneous patient-specific characteristics is maximized. Many existing methods deal with binary or a moderate number of treatment arms and may not take potential treatment effect structure into account. However, the effectiveness of these methods may deteriorate when the number of treatment arms becomes large. In this article, we propose GRoup Outcome Weighted Learning (GROWL) to estimate the latent structure in the treatment space and the optimal group-structured ITRs through a single optimization. In particular, for estimating group-structured ITRs, we utilize the Reinforced Angle based Multicategory Support Vector Machines (RAMSVM) to learn group-based decision rules under the weighted angle based multi-class classification framework. Fisher consistency, the excess risk bound, and the convergence rate of the value function are established to provide a theoretical guarantee for GROWL. Extensive empirical results in simulation studies and real data analysis demonstrate that GROWL enjoys better performance than several other existing methods.

Citations: 0

Conditional Distribution Function Estimation Using Neural Networks for Censored and Uncensored Data.

IF 6 Zone 3 (Computer Science) Q1 AUTOMATION & CONTROL SYSTEMS

Journal of Machine Learning Research

Pub Date : 2023-01-01

Bingqing Hu, Bin Nan

Most work in neural networks focuses on estimating the conditional mean of a continuous response variable given a set of covariates. In this article, we consider estimating the conditional distribution function using neural networks for both censored and uncensored data. The algorithm is built upon the data structure particularly constructed for the Cox regression with time-dependent covariates. Without imposing any model assumptions, we consider a loss function that is based on the full likelihood where the conditional hazard function is the only unknown nonparametric parameter, for which unconstrained optimization methods can be applied. Through simulation studies, we show that the proposed method possesses desirable performance, whereas the partial likelihood method and the traditional neural networks with $L_{2}$ loss yield biased estimates when model assumptions are violated. We further illustrate the proposed method with several real-world data sets. The implementation of the proposed methods is made available at https://github.com/bingqing0729/NNCDE.
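As a small illustration of why the conditional hazard can serve as the only unknown, the standard identity F(t | x) = 1 - exp(-cumulative hazard up to t) turns any hazard into a distribution function. The sketch below is not taken from the NNCDE repository: the function name `cdf_from_hazard`, the piecewise-constant hazard, and the numbers are assumptions chosen for illustration, and the hazard values stand in for what a fitted network would output for one covariate vector.

```python
import math


def cdf_from_hazard(hazards, widths, t):
    """Return F(t) = 1 - exp(-H(t)) for a piecewise-constant hazard.

    hazards[k] is the (hypothetical) hazard on the k-th interval and
    widths[k] is that interval's length; intervals start at time 0.
    Beyond the last interval the hazard is treated as zero.
    """
    cumulative = 0.0  # H(t), the integrated hazard up to time t
    start = 0.0       # left endpoint of the current interval
    for lam, width in zip(hazards, widths):
        if t <= start:
            break
        # Add hazard * (time spent in this interval before t).
        cumulative += lam * (min(t, start + width) - start)
        start += width
    return 1.0 - math.exp(-cumulative)


# Hazard 0.5 on [0, 2) and 1.0 on [2, 4), evaluated at t = 3.
print(cdf_from_hazard([0.5, 1.0], [2.0, 2.0], 3.0))
```

Because the hazard only needs to be nonnegative, a model can produce it through an unconstrained output passed through a positive transform, which is consistent with the abstract's remark that unconstrained optimization methods can be applied.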

Citations: 0

Consistent Second-Order Conic Integer Programming for Learning Bayesian Networks.

IF 4.3 Zone 3 (Computer Science) Q1 AUTOMATION & CONTROL SYSTEMS

Journal of Machine Learning Research

Pub Date : 2023-01-01

Simge Küçükyavuz, Ali Shojaie, Hasan Manzour, Linchuan Wei, Hao-Hsiang Wu

Bayesian Networks (BNs) represent conditional probability relations among a set of random variables (nodes) in the form of a directed acyclic graph (DAG), and have found diverse applications in knowledge discovery. We study the problem of learning the sparse DAG structure of a BN from continuous observational data. The central problem can be modeled as a mixed-integer program with an objective function composed of a convex quadratic loss function and a regularization penalty subject to linear constraints. The optimal solution to this mathematical program is known to have desirable statistical properties under certain conditions. However, the state-of-the-art optimization solvers are not able to obtain provably optimal solutions to the existing mathematical formulations for medium-size problems within reasonable computational times. To address this difficulty, we tackle the problem from both computational and statistical perspectives. On the one hand, we propose a concrete early stopping criterion to terminate the branch-and-bound process in order to obtain a near-optimal solution to the mixed-integer program, and establish the consistency of this approximate solution. On the other hand, we improve the existing formulations by replacing the linear "big-$M$" constraints that represent the relationship between the continuous and binary indicator variables with second-order conic constraints. Our numerical results demonstrate the effectiveness of the proposed approaches.

Citations: 0

Generalized Matrix Factorization: efficient algorithms for fitting generalized linear latent variable models to large data arrays.

IF 6 Zone 3 (Computer Science) Q1 AUTOMATION & CONTROL SYSTEMS

Journal of Machine Learning Research

Pub Date : 2022-11-01

Łukasz Kidziński, Francis K C Hui, David I Warton, Trevor Hastie

Unmeasured or latent variables are often the cause of correlations between multivariate measurements, which are studied in a variety of fields such as psychology, ecology, and medicine. For Gaussian measurements, there are classical tools such as factor analysis or principal component analysis with a well-established theory and fast algorithms. Generalized Linear Latent Variable models (GLLVMs) generalize such factor models to non-Gaussian responses. However, current algorithms for estimating model parameters in GLLVMs require intensive computation and do not scale to large datasets with thousands of observational units or responses. In this article, we propose a new approach for fitting GLLVMs to high-dimensional datasets, based on approximating the model using penalized quasi-likelihood and then using a Newton method and Fisher scoring to learn the model parameters. Computationally, our method is noticeably faster and more stable, enabling GLLVM fits to much larger matrices than previously possible. We apply our method on a dataset of 48,000 observational units with over 2,000 observed species in each unit and find that most of the variability can be explained with a handful of factors. We publish an easy-to-use implementation of our proposed fitting algorithm.
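For orientation, Fisher scoring, one of the ingredients named in the abstract, can be sketched on a toy one-parameter Poisson regression with log link. This is not the authors' GLLVM fitting algorithm and involves no latent variables or penalization: the function name `fisher_scoring_poisson`, the data, and the true coefficient 0.7 are assumptions made only for illustration.

```python
import math


def fisher_scoring_poisson(x, y, beta=0.0, iters=25):
    """Fit E[y_i] = exp(beta * x_i) by Fisher scoring.

    Each step updates beta by score / information, where
    score = sum_i x_i * (y_i - mu_i) and
    information = sum_i x_i**2 * mu_i  (expected Fisher information).
    With the canonical log link this coincides with Newton's method.
    """
    for _ in range(iters):
        mu = [math.exp(beta * xi) for xi in x]
        score = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        info = sum(xi * xi * mi for xi, mi in zip(x, mu))
        beta += score / info
    return beta


# Hypothetical noiseless data generated with beta = 0.7.
x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [math.exp(0.7 * xi) for xi in x]
beta_hat = fisher_scoring_poisson(x, y)
print(beta_hat)
```

In a matrix-valued setting the same score-over-information update would be applied to blocks of loadings and latent scores rather than to a single scalar; that extension is not shown here.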

Citations: 0

Tree-based Node Aggregation in Sparse Graphical Models.

IF 4.3 Zone 3 (Computer Science) Q1 AUTOMATION & CONTROL SYSTEMS

Journal of Machine Learning Research

Pub Date : 2022-09-01

Ines Wilms, Jacob Bien

High-dimensional graphical models are often estimated using regularization that is aimed at reducing the number of edges in a network. In this work, we show how even simpler networks can be produced by aggregating the nodes of the graphical model. We develop a new convex regularized method, called the tree-aggregated graphical lasso or tag-lasso, that estimates graphical models that are both edge-sparse and node-aggregated. The aggregation is performed in a data-driven fashion by leveraging side information in the form of a tree that encodes node similarity and facilitates the interpretation of the resulting aggregated nodes. We provide an efficient implementation of the tag-lasso by using the locally adaptive alternating direction method of multipliers and illustrate our proposal's practical advantages in simulation and in applications in finance and biology.

Citations: 0

Reinforcement Learning Algorithm for Mixed Mean Field Control Games

IF 6 Zone 3 (Computer Science) Q1 AUTOMATION & CONTROL SYSTEMS

Journal of Machine Learning Research

Pub Date : 2022-05-04 DOI: 10.4208/jml.220915

Andrea Angiuli, Nils Detering, J. Fouque, M. Laurière, Jimin Lin

We present a new combined mean field control game (MFCG) problem, which can be interpreted as a competitive game between collaborating groups, and its solution as a Nash equilibrium between groups. Players coordinate their strategies within each group. An example is a modification of the classical trader's problem. Groups of traders maximize their wealth. They face costs for their transactions, for their own terminal positions, and for the average holding within their group. The asset price is impacted by the trades of all agents. We propose a three-timescale reinforcement learning algorithm to approximate the solution of such MFCG problems. We test the algorithm on benchmark linear-quadratic specifications for which we provide analytic solutions.

Citations: 6

Beyond the Quadratic Approximation: The Multiscale Structure of Neural Network Loss Landscapes

IF 6 Zone 3 (Computer Science) Q1 AUTOMATION & CONTROL SYSTEMS

Journal of Machine Learning Research

Pub Date : 2022-04-24 DOI: 10.4208/jml.220404

Chao Ma, D. Kunin, Lei Wu, Lexing Ying

A quadratic approximation of neural network loss landscapes has been extensively used to study the optimization process of these networks. Although it usually holds in a very small neighborhood of the minimum, it cannot explain many phenomena observed during the optimization process. In this work, we study the structure of neural network loss functions and its implications for optimization in a region beyond the reach of a good quadratic approximation. Numerically, we observe that neural network loss functions possess a multiscale structure, manifested in two ways: (1) in a neighborhood of minima, the loss mixes a continuum of scales and grows subquadratically, and (2) in a larger region, the loss clearly shows several separate scales. Using the subquadratic growth, we are able to explain the Edge of Stability phenomenon [5] observed for the gradient descent (GD) method. Using the separate scales, we explain the working mechanism of learning rate decay by simple examples. Finally, we study the origin of the multiscale structure and propose that the non-convexity of the models and the non-uniformity of training data are among the causes. By constructing a two-layer neural network problem, we show that training data with different magnitudes give rise to different scales of the loss function, producing subquadratic growth and multiple separate scales.

Citations: 13

Tree-Values: Selective Inference for Regression Trees.

IF 4.3 Zone 3 (Computer Science) Q1 AUTOMATION & CONTROL SYSTEMS

Journal of Machine Learning Research

Pub Date : 2022-01-01

Anna C Neufeld, Lucy L Gao, Daniela M Witten

We consider conducting inference on the output of the Classification and Regression Tree (CART) (Breiman et al., 1984) algorithm. A naive approach to inference that does not account for the fact that the tree was estimated from the data will not achieve standard guarantees, such as Type 1 error rate control and nominal coverage. Thus, we propose a selective inference framework for conducting inference on a fitted CART tree. In a nutshell, we condition on the fact that the tree was estimated from the data. We propose a test for the difference in the mean response between a pair of terminal nodes that controls the selective Type 1 error rate, and a confidence interval for the mean response within a single terminal node that attains the nominal selective coverage. Efficient algorithms for computing the necessary conditioning sets are provided. We apply these methods in simulation and to a dataset involving the association between portion control interventions and caloric intake.

Citations: 0

Extensions to the Proximal Distance Method of Constrained Optimization.

IF 6 Zone 3 (Computer Science) Q1 AUTOMATION & CONTROL SYSTEMS

Journal of Machine Learning Research

Pub Date : 2022-01-01

Alfonso Landeros, Oscar Hernan Madrid Padilla, Hua Zhou, Kenneth Lange

The current paper studies the problem of minimizing a loss $f(x)$ subject to constraints of the form $Dx \in S$, where $S$ is a closed set, convex or not, and $D$ is a matrix that fuses parameters. Fusion constraints can capture smoothness, sparsity, or more general constraint patterns. To tackle this generic class of problems, we combine the Beltrami-Courant penalty method of optimization with the proximal distance principle. The latter is driven by minimization of penalized objectives $f(x) + \frac{\rho}{2} \operatorname{dist}(Dx, S)^{2}$ involving large tuning constants $\rho$ and the squared Euclidean distance of $Dx$ from $S$. The next iterate $x_{n+1}$ of the corresponding proximal distance algorithm is constructed from the current iterate $x_{n}$ by minimizing the majorizing surrogate function $f(x) + \frac{\rho}{2} \| Dx - \mathcal{P}_{S}(Dx_{n}) \|^{2}$. For fixed $\rho$, a subanalytic loss $f(x)$, and a subanalytic constraint set $S$, we prove convergence to a stationary point. Under stronger assumptions, we provide convergence rates and demonstrate linear local convergence. We also construct a steepest descent (SD) variant to avoid costly linear system solves. To benchmark our algorithms, we compare their results to those delivered by the alternating direction method of multipliers (ADMM). Our extensive numerical tests include problems on metric projection, convex regression, convex clustering, total variation image denoising, and projection of a matrix to a good condition number. These experiments demonstrate the superior speed and acceptable accuracy of our steepest descent variant on high-dimensional problems. Julia code to replicate all of our experiments can be found at https://github.com/alanderos91/ProximalDistanceAlgorithms.jl.
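The surrogate-minimization step described in the abstract can be sketched on a deliberately simple instance. The example below is not the authors' Julia code: it assumes a least-squares loss f(x) = 0.5 * ||x - y||^2, takes D to be the identity (no fusion), and takes S to be the nonnegative orthant, so that each surrogate has a closed-form minimizer. The function names and the values of `rho` and `iters` are hypothetical.

```python
def project_nonneg(v):
    """Euclidean projection onto S = {v : v_i >= 0}."""
    return [max(vi, 0.0) for vi in v]


def proximal_distance(y, rho=100.0, iters=5000):
    """Proximal distance iteration for f(x) = 0.5 * ||x - y||^2 with D = I.

    Each step minimizes the surrogate
        0.5 * ||x - y||^2 + (rho / 2) * ||x - P_S(x_n)||^2,
    whose minimizer is (y + rho * P_S(x_n)) / (1 + rho).
    """
    x = [0.0] * len(y)
    for _ in range(iters):
        p = project_nonneg(x)  # anchor point P_S(D x_n), here with D = I
        x = [(yi + rho * pi) / (1.0 + rho) for yi, pi in zip(y, p)]
    return x


x_hat = proximal_distance([1.0, -2.0, 3.0])
print(x_hat)
```

In this toy case the coordinates with y_i >= 0 converge to y_i, while a coordinate with y_i < 0 settles at y_i / (1 + rho), so the constraint violation shrinks as rho grows. That is why the penalty constant is taken large; the contraction factor rho / (1 + rho) also shows why plain iterations slow down as rho increases.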

Citations: 0
