Minimax Regret Learning for Data with Heterogeneous Subgroups

Weibin Mo, Weijing Tang, Songkai Xue, Yufeng Liu, Ji Zhu
{"title":"Minimax Regret Learning for Data with Heterogeneous Subgroups","authors":"Weibin Mo, Weijing Tang, Songkai Xue, Yufeng Liu, Ji Zhu","doi":"arxiv-2405.01709","DOIUrl":null,"url":null,"abstract":"Modern complex datasets often consist of various sub-populations. To develop\nrobust and generalizable methods in the presence of sub-population\nheterogeneity, it is important to guarantee a uniform learning performance\ninstead of an average one. In many applications, prior information is often\navailable on which sub-population or group the data points belong to. Given the\nobserved groups of data, we develop a min-max-regret (MMR) learning framework\nfor general supervised learning, which targets to minimize the worst-group\nregret. Motivated from the regret-based decision theoretic framework, the\nproposed MMR is distinguished from the value-based or risk-based robust\nlearning methods in the existing literature. The regret criterion features\nseveral robustness and invariance properties simultaneously. In terms of\ngeneralizability, we develop the theoretical guarantee for the worst-case\nregret over a super-population of the meta data, which incorporates the\nobserved sub-populations, their mixtures, as well as other unseen\nsub-populations that could be approximated by the observed ones. We demonstrate\nthe effectiveness of our method through extensive simulation studies and an\napplication to kidney transplantation data from hundreds of transplant centers.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"152 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - MATH - Statistics Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2405.01709","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Modern complex datasets often consist of various sub-populations. To develop robust and generalizable methods in the presence of sub-population heterogeneity, it is important to guarantee uniform learning performance rather than merely average performance. In many applications, prior information is available on which sub-population or group each data point belongs to. Given the observed groups of data, we develop a min-max-regret (MMR) learning framework for general supervised learning, which aims to minimize the worst-group regret. Motivated by the regret-based decision-theoretic framework, the proposed MMR is distinguished from the value-based and risk-based robust learning methods in the existing literature. The regret criterion enjoys several robustness and invariance properties simultaneously. In terms of generalizability, we develop a theoretical guarantee for the worst-case regret over a super-population of the meta data, which incorporates the observed sub-populations, their mixtures, and other unseen sub-populations that can be approximated by the observed ones. We demonstrate the effectiveness of our method through extensive simulation studies and an application to kidney transplantation data from hundreds of transplant centers.
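To make the worst-group-regret objective concrete: for groups g = 1, ..., G with group-specific risks R_g(f), the regret of a model f on group g is R_g(f) - inf_{f'} R_g(f'), and MMR solves min_f max_g regret_g(f). The following is a minimal sketch of this idea for linear least-squares regression; the simulated data, the per-group OLS "oracle" baselines, and the subgradient heuristic are our own illustrative assumptions, not the authors' estimator or theory.

```python
# Illustrative sketch of worst-group-regret minimization for squared-error
# regression with a shared linear model. All choices below (data-generating
# process, per-group OLS oracles, subgradient heuristic) are assumptions
# made for illustration, not the method proposed in the paper.
import numpy as np

rng = np.random.default_rng(0)

# Simulate G heterogeneous sub-populations with group-specific coefficients.
G, n, p = 3, 200, 5
betas = [rng.normal(size=p) for _ in range(G)]
Xs = [rng.normal(size=(n, p)) for _ in range(G)]
ys = [X @ b + rng.normal(scale=0.5, size=n) for X, b in zip(Xs, betas)]

def risk(beta, X, y):
    """Empirical squared-error risk on one group."""
    return np.mean((y - X @ beta) ** 2)

# Group-wise oracle risks: the best achievable empirical risk within each
# group (ordinary least squares fitted separately per group).
oracle = [risk(np.linalg.lstsq(X, y, rcond=None)[0], X, y)
          for X, y in zip(Xs, ys)]

def regrets(beta):
    """Per-group regret: excess risk of beta over the group's own oracle."""
    return np.array([risk(beta, X, y) - r0
                     for X, y, r0 in zip(Xs, ys, oracle)])

# Subgradient descent on the worst-group regret max_g regret_g(beta):
# at each step, take a gradient step on the risk of the currently worst group.
beta = np.zeros(p)
lr = 0.05
for t in range(2000):
    g = int(np.argmax(regrets(beta)))          # currently worst group
    X, y = Xs[g], ys[g]
    grad = 2 * X.T @ (X @ beta - y) / len(y)   # gradient of that group's risk
    beta -= lr * grad

print("per-group regrets:", np.round(regrets(beta), 3))
print("worst-group regret:", round(regrets(beta).max(), 3))
```

This sketch only controls regret over the observed groups; as described in the abstract, the paper's guarantee additionally covers mixtures of the observed sub-populations and unseen sub-populations that can be approximated by the observed ones.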