We show a deviation inequality for U-statistics of independent data taking values in a separable Banach space satisfying some smoothness assumptions. We then provide applications to rates in the law of large numbers for U-statistics, a Hölderian functional central limit theorem and a moment inequality for incomplete $U$-statistics.
{"title":"Deviation and moment inequalities for Banach-valued $U$-statistics","authors":"Davide GiraudoIRMA, UNISTRA UFR MI","doi":"arxiv-2405.01902","DOIUrl":"https://doi.org/arxiv-2405.01902","url":null,"abstract":"We show a deviation inequality for U-statistics of independent data taking\u0000values in a separable Banach space which satisfies some smoothness assumptions.\u0000We then provide applications to rates in the law of large numbers for\u0000U-statistics, a H{\"o}lderian functional central limit theorem and a moment\u0000inequality for incomplete $U$-statistics.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140889073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A new family of distributions indexed by the class of matrix variate elliptically contoured distributions is proposed as an extension of some bimatrix variate distributions. The multimatrix variate distributions, as they are termed, open new perspectives for classical distribution theory, which is usually based on probabilistically independent models and on preferred but untested fitting laws. Most of the multimatrix models derived here are invariant under the spherical family, a fact that settles the testing and prior knowledge of the underlying distributions and clarifies the statistical methodology, in contrast with some weaknesses of current approaches such as copulas. The paper also includes a number of special cases, properties and generalisations. The new joint distributions allow combinations that copulas cannot accommodate, such as scalars, vectors and matrices, all of them adjustable to the models required by the experts. The proposed joint distributions are also easily computable, so several applications are feasible. In particular, an exhaustive example in molecular docking on SARS-CoV-2 presents results on matrix-dependent samples.
{"title":"Multimatrix variate distributions","authors":"José A. Díaz-García, Francisco J. Caro-Lopera","doi":"arxiv-2405.02498","DOIUrl":"https://doi.org/arxiv-2405.02498","url":null,"abstract":"A new family of distributions indexed by the class of matrix variate\u0000contoured elliptically distribution is proposed as an extension of some\u0000bimatrix variate distributions. The termed emph{multimatrix variate\u0000distributions} open new perspectives for the classical distribution theory,\u0000usually based on probabilistic independent models and preferred untested\u0000fitting laws. Most of the multimatrix models here derived are invariant under\u0000the spherical family, a fact that solves the testing and prior knowledge of the\u0000underlying distributions and elucidates the statistical methodology in\u0000contrasts with some weakness of current studies as copulas. The paper also\u0000includes a number of diverse special cases, properties and generalisations. The\u0000new joint distributions allows several unthinkable combinations for copulas,\u0000such as scalars, vectors and matrices, all of them adjustable to the required\u0000models of the experts. The proposed joint distributions are also easily\u0000computable, then several applications are plausible. In particular, an\u0000exhaustive example in molecular docking on SARS-CoV-2 presents the results on\u0000matrix dependent samples.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140889036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Antoine Godichon-Baggioni (LPSM), Wei Lu (LMI), Bruno Portier (LMI)
A novel approach is given to overcome the computational challenges of the full-matrix Adaptive Gradient algorithm (Full AdaGrad) in stochastic optimization. By developing a recursive method that estimates the inverse of the square root of the covariance of the gradient, alongside a streaming variant for parameter updates, the study offers efficient and practical algorithms for large-scale applications. This innovative strategy significantly reduces the complexity and resource demands typically associated with full-matrix methods, enabling more effective optimization processes. Moreover, the convergence rates of the proposed estimators and their asymptotic efficiency are given. Their effectiveness is demonstrated through numerical studies.
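The abstract does not spell out the recursive estimator, so the following is only a minimal sketch of what a full-matrix AdaGrad update computes, with the inverse square root taken by an O(d^3) eigendecomposition; the function name, step size and toy problem are assumptions, and the paper's O(Nd) recursive, streaming estimate of the inverse root replaces precisely the eigendecomposition step.

```python
import numpy as np

def full_adagrad_step(x, grad, G, lr=0.1, eps=1e-8):
    """One naive full-matrix AdaGrad step.

    Costs O(d^3) per iteration via eigendecomposition; the paper's
    contribution (not reproduced here) is a recursive, streaming
    estimate of G^{-1/2} that avoids this cubic cost.
    """
    G += np.outer(grad, grad)                  # accumulate outer products of gradients
    vals, vecs = np.linalg.eigh(G)             # symmetric eigendecomposition
    G_inv_sqrt = (vecs / np.sqrt(vals + eps)) @ vecs.T
    return x - lr * G_inv_sqrt @ grad, G

# toy usage: a least-squares objective
rng = np.random.default_rng(0)
A, b = rng.normal(size=(50, 5)), rng.normal(size=50)
x, G = np.zeros(5), np.zeros((5, 5))
for _ in range(500):
    grad = A.T @ (A @ x - b) / len(b)
    x, G = full_adagrad_step(x, grad, G)
```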
{"title":"A Full Adagrad algorithm with O(Nd) operations","authors":"Antoine Godichon-BaggioniLPSM, Wei LuLMI, Bruno PortierLMI","doi":"arxiv-2405.01908","DOIUrl":"https://doi.org/arxiv-2405.01908","url":null,"abstract":"A novel approach is given to overcome the computational challenges of the\u0000full-matrix Adaptive Gradient algorithm (Full AdaGrad) in stochastic\u0000optimization. By developing a recursive method that estimates the inverse of\u0000the square root of the covariance of the gradient, alongside a streaming\u0000variant for parameter updates, the study offers efficient and practical\u0000algorithms for large-scale applications. This innovative strategy significantly\u0000reduces the complexity and resource demands typically associated with\u0000full-matrix methods, enabling more effective optimization processes. Moreover,\u0000the convergence rates of the proposed estimators and their asymptotic\u0000efficiency are given. Their effectiveness is demonstrated through numerical\u0000studies.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"165 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140889005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Blanca E. Monroy-Castillo, M. A. Jácome, Ricardo Cao
Distance correlation is a novel class of multivariate dependence measures, taking values between 0 and 1, and applicable to random vectors of arbitrary and not necessarily equal dimensions. It offers several advantages over the well-known Pearson correlation coefficient, the most important being that the distance correlation equals zero if and only if the random vectors are independent. Two different estimators of the distance correlation are available in the literature. The first one, proposed by Székely et al. (2007), is based on an asymptotically unbiased estimator of the distance covariance which turns out to be a V-statistic. The second one builds on an unbiased estimator of the distance covariance proposed in Székely et al. (2014), proved to be a U-statistic by Székely and Huo (2016). This study evaluates their efficiency (mean squared error) and compares computational times for both methods under different dependence structures. Under independence or near-independence, the V-estimates are biased, while the U-estimates frequently cannot be computed due to negative values. To address this challenge, a convex linear combination of the two estimators is proposed and studied, yielding good results regardless of the level of dependence.
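For concreteness, here is a minimal sketch of the two estimators being compared, following the double centering of Székely et al. (2007) and the U-centering of Székely et al. (2014); the function names are mine, and the convex combination proposed in the paper is omitted because its weights are not specified in the abstract.

```python
import numpy as np
from scipy.spatial.distance import cdist

def dcov2_v(x, y):
    """V-statistic (biased) estimate of squared distance covariance.

    x, y: arrays of shape (n, p) and (n, q) with the same number of rows.
    """
    a, b = cdist(x, x), cdist(y, y)                      # pairwise Euclidean distances
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()    # double centering
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    return (A * B).mean()

def dcov2_u(x, y):
    """U-statistic (unbiased) estimate; may be negative in small samples (needs n > 3)."""
    n = len(x)
    def u_center(d):
        D = (d - d.sum(0) / (n - 2) - d.sum(1)[:, None] / (n - 2)
             + d.sum() / ((n - 1) * (n - 2)))
        np.fill_diagonal(D, 0.0)                         # U-centering excludes i == j
        return D
    A, B = u_center(cdist(x, x)), u_center(cdist(y, y))
    return (A * B).sum() / (n * (n - 3))

def dcor2(x, y, dcov2=dcov2_v):
    """Squared distance correlation. With dcov2_u the radicand can be
    negative, which is exactly the failure mode described above."""
    v = dcov2(x, x) * dcov2(y, y)
    return dcov2(x, y) / np.sqrt(v) if v > 0 else np.nan
```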
{"title":"Improved distance correlation estimation","authors":"Blanca E. Monroy-Castillo, M. A, Jácome, Ricardo Cao","doi":"arxiv-2405.01958","DOIUrl":"https://doi.org/arxiv-2405.01958","url":null,"abstract":"Distance correlation is a novel class of multivariate dependence measure,\u0000taking positive values between 0 and 1, and applicable to random vectors of\u0000arbitrary dimensions, not necessarily equal. It offers several advantages over\u0000the well-known Pearson correlation coefficient, the most important is that\u0000distance correlation equals zero if and only if the random vectors are\u0000independent. There are two different estimators of the distance correlation available in\u0000the literature. The first one, proposed by Sz'ekely et al. (2007), is based on\u0000an asymptotically unbiased estimator of the distance covariance which turns out\u0000to be a V-statistic. The second one builds on an unbiased estimator of the\u0000distance covariance proposed in Sz'ekely et al. (2014), proved to be an\u0000U-statistic by Sz'ekely and Huo (2016). This study evaluates their efficiency\u0000(mean squared error) and compares computational times for both methods under\u0000different dependence structures. Under conditions of independence or\u0000near-independence, the V-estimates are biased, while the U-estimator frequently\u0000cannot be computed due to negative values. To address this challenge, a convex\u0000linear combination of the former estimators is proposed and studied, yielding\u0000good results regardless of the level of dependence.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140889007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent studies show that transformer-based architectures emulate gradient descent during a forward pass, contributing to in-context learning capabilities: the ability of a model to adapt to new tasks from a sequence of prompt examples without being explicitly trained or fine-tuned to do so. This work investigates the generalization properties of a single step of gradient descent in the context of linear regression with well-specified models. A random design setting is considered, and analytical expressions are derived for the statistical properties of the generalization error in a non-asymptotic (finite sample) setting. These expressions are notable for avoiding arbitrary constants, and thus offer robust quantitative information and scaling relationships. The results are contrasted with those from classical least squares regression (for which analogous finite sample bounds are also derived), shedding light on systematic and noise components, as well as optimal step sizes. Additionally, identities involving high-order products of Gaussian random matrices are presented as a byproduct of the analysis.
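As a quick illustration of the setting (not of the paper's closed-form expressions, which the abstract does not reproduce), one can simulate a single gradient-descent step on a well-specified linear model under a Gaussian random design and estimate the generalization error by Monte Carlo; all constants below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, sigma, trials = 10, 40, 0.5, 5000
step = 1.0 / d                                    # assumed step size; the paper derives optimal ones

excess = []
for _ in range(trials):
    beta = rng.normal(size=d) / np.sqrt(d)        # well-specified linear model
    X = rng.normal(size=(n, d))                   # random (Gaussian) design
    y = X @ beta + sigma * rng.normal(size=n)
    w = step * X.T @ y / n                        # one GD step on squared loss from w = 0
    x_new = rng.normal(size=d)                    # fresh test point
    excess.append((x_new @ (w - beta)) ** 2)
print("Monte Carlo excess generalization error:", np.mean(excess))
```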
{"title":"Finite Sample Analysis and Bounds of Generalization Error of Gradient Descent in In-Context Linear Regression","authors":"Karthik Duraisamy","doi":"arxiv-2405.02462","DOIUrl":"https://doi.org/arxiv-2405.02462","url":null,"abstract":"Recent studies show that transformer-based architectures emulate gradient\u0000descent during a forward pass, contributing to in-context learning capabilities\u0000- an ability where the model adapts to new tasks based on a sequence of prompt\u0000examples without being explicitly trained or fine tuned to do so. This work\u0000investigates the generalization properties of a single step of gradient descent\u0000in the context of linear regression with well-specified models. A random design\u0000setting is considered and analytical expressions are derived for the\u0000statistical properties of generalization error in a non-asymptotic (finite\u0000sample) setting. These expressions are notable for avoiding arbitrary\u0000constants, and thus offer robust quantitative information and scaling\u0000relationships. These results are contrasted with those from classical least\u0000squares regression (for which analogous finite sample bounds are also derived),\u0000shedding light on systematic and noise components, as well as optimal step\u0000sizes. Additionally, identities involving high-order products of Gaussian\u0000random matrices are presented as a byproduct of the analysis.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"16 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140889038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This thesis studies some of the mathematical challenges that arise in the analysis of statistical sequential decision-making algorithms for postoperative patient follow-up. Stochastic bandits (multi-armed, contextual) model the learning of a sequence of actions (a policy) by an agent in an uncertain environment in order to maximise observed rewards. To learn optimal policies, bandit algorithms have to balance the exploitation of current knowledge against the exploration of uncertain actions. Such algorithms have largely been studied and deployed in industrial applications with large datasets, low-risk decisions and clear modelling assumptions, such as click-through rate maximisation in online advertising. By contrast, digital health recommendations call for a whole new paradigm of small samples, risk-averse agents and complex, nonparametric modelling. To this end, we developed new safe, anytime-valid concentration bounds (Bregman, empirical Chernoff), introduced a new framework for risk-aware contextual bandits (with elicitable risk measures) and analysed a novel class of nonparametric bandit algorithms under weak assumptions (Dirichlet sampling). In addition to the theoretical guarantees, these results are supported by in-depth empirical evidence. Finally, as a first step towards personalised postoperative follow-up recommendations, we developed, together with medical doctors and surgeons, an interpretable machine learning model to predict the long-term weight trajectories of patients after bariatric surgery.
{"title":"Mathematics of statistical sequential decision-making: concentration, risk-awareness and modelling in stochastic bandits, with applications to bariatric surgery","authors":"Patrick Saux","doi":"arxiv-2405.01994","DOIUrl":"https://doi.org/arxiv-2405.01994","url":null,"abstract":"This thesis aims to study some of the mathematical challenges that arise in\u0000the analysis of statistical sequential decision-making algorithms for\u0000postoperative patients follow-up. Stochastic bandits (multiarmed, contextual)\u0000model the learning of a sequence of actions (policy) by an agent in an\u0000uncertain environment in order to maximise observed rewards. To learn optimal\u0000policies, bandit algorithms have to balance the exploitation of current\u0000knowledge and the exploration of uncertain actions. Such algorithms have\u0000largely been studied and deployed in industrial applications with large\u0000datasets, low-risk decisions and clear modelling assumptions, such as\u0000clickthrough rate maximisation in online advertising. By contrast, digital\u0000health recommendations call for a whole new paradigm of small samples,\u0000risk-averse agents and complex, nonparametric modelling. To this end, we\u0000developed new safe, anytime-valid concentration bounds, (Bregman, empirical\u0000Chernoff), introduced a new framework for risk-aware contextual bandits (with\u0000elicitable risk measures) and analysed a novel class of nonparametric bandit\u0000algorithms under weak assumptions (Dirichlet sampling). In addition to the\u0000theoretical guarantees, these results are supported by in-depth empirical\u0000evidence. Finally, as a first step towards personalised postoperative follow-up\u0000recommendations, we developed with medical doctors and surgeons an\u0000interpretable machine learning model to predict the long-term weight\u0000trajectories of patients after bariatric surgery.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"28 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140889066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the past decades, most work in the area of data analysis and machine learning was focused on optimizing predictive models and obtaining better results than what was possible with existing models. Whether the metrics by which such improvements were measured accurately captured the intended goal, whether the numerical differences in the resulting values were significant, and whether uncertainty played a role and should have been taken into account, was of secondary importance. Whereas probability theory, be it frequentist or Bayesian, used to be the gold standard in science before the advent of the supercomputer, it was quickly set aside in favor of black box models and sheer computing power because of their ability to handle large data sets. This evolution sadly happened at the expense of interpretability and trustworthiness. However, while people are still trying to improve the predictive power of their models, the community is starting to realize that for many applications it is not so much the exact prediction that matters, but rather the variability or uncertainty.

The work in this dissertation furthers the quest for a world where everyone is aware of uncertainty, of how important it is and of how to embrace it instead of fearing it. A specific, though general, framework that allows anyone to obtain accurate uncertainty estimates is singled out and analysed. Certain aspects and applications of this framework, dubbed 'conformal prediction', are studied in detail. Whereas many approaches to uncertainty quantification make strong assumptions about the data, conformal prediction is, at the time of writing, the only framework that deserves the title 'distribution-free': no parametric assumptions have to be made, and the nonparametric results hold without having to resort to the law of large numbers in the asymptotic regime.
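Since split conformal prediction is the best-known instance of the framework studied here, a minimal sketch may help fix ideas; the function name and the scikit-learn-style model interface are assumptions.

```python
import numpy as np

def split_conformal(model, X_fit, y_fit, X_cal, y_cal, X_new, alpha=0.1):
    """Split conformal prediction intervals for regression.

    Guarantees marginal coverage >= 1 - alpha under exchangeability,
    with no distributional assumptions on the data.
    """
    model.fit(X_fit, y_fit)                          # any regressor with fit/predict
    scores = np.abs(y_cal - model.predict(X_cal))    # absolute-residual nonconformity scores
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))          # conformal quantile rank
    if k > n:                                        # too few calibration points
        return np.full(len(X_new), -np.inf), np.full(len(X_new), np.inf)
    q = np.sort(scores)[k - 1]
    pred = model.predict(X_new)
    return pred - q, pred + q
```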
{"title":"A comparative study of conformal prediction methods for valid uncertainty quantification in machine learning","authors":"Nicolas Dewolf","doi":"arxiv-2405.02082","DOIUrl":"https://doi.org/arxiv-2405.02082","url":null,"abstract":"In the past decades, most work in the area of data analysis and machine\u0000learning was focused on optimizing predictive models and getting better results\u0000than what was possible with existing models. To what extent the metrics with\u0000which such improvements were measured were accurately capturing the intended\u0000goal, whether the numerical differences in the resulting values were\u0000significant, or whether uncertainty played a role in this study and if it\u0000should have been taken into account, was of secondary importance. Whereas\u0000probability theory, be it frequentist or Bayesian, used to be the gold standard\u0000in science before the advent of the supercomputer, it was quickly replaced in\u0000favor of black box models and sheer computing power because of their ability to\u0000handle large data sets. This evolution sadly happened at the expense of\u0000interpretability and trustworthiness. However, while people are still trying to\u0000improve the predictive power of their models, the community is starting to\u0000realize that for many applications it is not so much the exact prediction that\u0000is of importance, but rather the variability or uncertainty. The work in this dissertation tries to further the quest for a world where\u0000everyone is aware of uncertainty, of how important it is and how to embrace it\u0000instead of fearing it. A specific, though general, framework that allows anyone\u0000to obtain accurate uncertainty estimates is singled out and analysed. Certain\u0000aspects and applications of the framework -- dubbed `conformal prediction' --\u0000are studied in detail. Whereas many approaches to uncertainty quantification\u0000make strong assumptions about the data, conformal prediction is, at the time of\u0000writing, the only framework that deserves the title `distribution-free'. No\u0000parametric assumptions have to be made and the nonparametric results also hold\u0000without having to resort to the law of large numbers in the asymptotic regime.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"16 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140889068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Gapeev-Shiryaev conjecture (originating in Gapeev and Shiryaev (2011) and Gapeev and Shiryaev (2013)) can be broadly stated as follows: monotonicity of the signal-to-noise ratio implies monotonicity of the optimal stopping boundaries. The conjecture was originally formulated both within (i) sequential testing problems for diffusion processes (where one needs to decide which of two drifts is being indirectly observed) and (ii) quickest detection problems for diffusion processes (where one needs to detect when the initial drift changes to a new drift). In this paper we present proofs of the Gapeev-Shiryaev conjecture both in (i) the sequential testing setting (under Lipschitz/Hölder coefficients of the underlying SDEs) and (ii) the quickest detection setting (under analytic coefficients of the underlying SDEs). The method of proof in the sequential testing setting relies upon a stochastic time change and pathwise comparison arguments. Both arguments break down in the quickest detection setting and are replaced by arguments arising from a stochastic maximum principle for hypoelliptic equations (satisfying Hörmander's condition) that is of independent interest. Verification of the Gapeev-Shiryaev conjecture establishes that sequential testing and quickest detection problems with monotone signal-to-noise ratios are amenable to known methods of solution.
{"title":"The Gapeev-Shiryaev Conjecture","authors":"Philip A. Ernst, Goran Peskir","doi":"arxiv-2405.01685","DOIUrl":"https://doi.org/arxiv-2405.01685","url":null,"abstract":"The Gapeev-Shiryaev conjecture (originating in Gapeev and Shiryaev (2011) and\u0000Gapeev and Shiryaev (2013)) can be broadly stated as follows: Monotonicity of\u0000the signal-to-noise ratio implies monotonicity of the optimal stopping\u0000boundaries. The conjecture was originally formulated both within (i) sequential\u0000testing problems for diffusion processes (where one needs to decide which of\u0000the two drifts is being indirectly observed) and (ii) quickest detection\u0000problems for diffusion processes (where one needs to detect when the initial\u0000drift changes to a new drift). In this paper we present proofs of the\u0000Gapeev-Shiryaev conjecture both in (i) the sequential testing setting (under\u0000Lipschitz/Holder coefficients of the underlying SDEs) and (ii) the quickest\u0000detection setting (under analytic coefficients of the underlying SDEs). The\u0000method of proof in the sequential testing setting relies upon a stochastic time\u0000change and pathwise comparison arguments. Both arguments break down in the\u0000quickest detection setting and get replaced by arguments arising from a\u0000stochastic maximum principle for hypoelliptic equations (satisfying Hormander's\u0000condition) that is of independent interest. Verification of the Gapeev-Shiryaev\u0000conjecture establishes the fact that sequential testing and quickest detection\u0000problems with monotone signal-to-noise ratios are amenable to known methods of\u0000solution.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140889071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Weibin Mo, Weijing Tang, Songkai Xue, Yufeng Liu, Ji Zhu
Modern complex datasets often consist of various sub-populations. To develop robust and generalizable methods in the presence of sub-population heterogeneity, it is important to guarantee a uniform learning performance rather than an average one. In many applications, prior information is available on which sub-population or group the data points belong to. Given the observed groups of data, we develop a min-max-regret (MMR) learning framework for general supervised learning, which aims to minimize the worst-group regret. Motivated by the regret-based decision-theoretic framework, the proposed MMR is distinguished from the value-based or risk-based robust learning methods in the existing literature. The regret criterion features several robustness and invariance properties simultaneously. In terms of generalizability, we develop a theoretical guarantee for the worst-case regret over a super-population of the meta data, which incorporates the observed sub-populations, their mixtures, and other unseen sub-populations that could be approximated by the observed ones. We demonstrate the effectiveness of our method through extensive simulation studies and an application to kidney transplantation data from hundreds of transplant centers.
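The abstract does not give the objective explicitly; a plausible formalization of worst-group regret, in notation of my own choosing rather than the paper's, is:

```latex
% Regret of a rule f on group g, relative to the group-wise best rule in class F:
R_g(f) \;=\; \mathbb{E}_{(X,Y)\sim P_g}\!\left[\ell\bigl(f(X),Y\bigr)\right]
        \;-\; \inf_{f'\in\mathcal{F}} \mathbb{E}_{(X,Y)\sim P_g}\!\left[\ell\bigl(f'(X),Y\bigr)\right]

% The min-max-regret (MMR) rule minimizes the worst-group regret:
\hat{f} \;\in\; \arg\min_{f\in\mathcal{F}} \;\max_{g\in\mathcal{G}}\; R_g(f)
```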
{"title":"Minimax Regret Learning for Data with Heterogeneous Subgroups","authors":"Weibin Mo, Weijing Tang, Songkai Xue, Yufeng Liu, Ji Zhu","doi":"arxiv-2405.01709","DOIUrl":"https://doi.org/arxiv-2405.01709","url":null,"abstract":"Modern complex datasets often consist of various sub-populations. To develop\u0000robust and generalizable methods in the presence of sub-population\u0000heterogeneity, it is important to guarantee a uniform learning performance\u0000instead of an average one. In many applications, prior information is often\u0000available on which sub-population or group the data points belong to. Given the\u0000observed groups of data, we develop a min-max-regret (MMR) learning framework\u0000for general supervised learning, which targets to minimize the worst-group\u0000regret. Motivated from the regret-based decision theoretic framework, the\u0000proposed MMR is distinguished from the value-based or risk-based robust\u0000learning methods in the existing literature. The regret criterion features\u0000several robustness and invariance properties simultaneously. In terms of\u0000generalizability, we develop the theoretical guarantee for the worst-case\u0000regret over a super-population of the meta data, which incorporates the\u0000observed sub-populations, their mixtures, as well as other unseen\u0000sub-populations that could be approximated by the observed ones. We demonstrate\u0000the effectiveness of our method through extensive simulation studies and an\u0000application to kidney transplantation data from hundreds of transplant centers.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"152 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140889725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper introduces a Bayesian inference framework for two-dimensional steady-state heat conduction, focusing on the estimation of unknown distributed heat sources in a thermally-conducting medium with uniform conductivity. The goal is to infer heater locations, strengths, and shapes using temperature assimilation in the Euclidean space, employing a Fourier series to represent each heater's shape. The Markov chain Monte Carlo (MCMC) method, incorporating the random-walk Metropolis-Hastings algorithm and parallel tempering, is utilized for posterior distribution exploration in both unbounded and wall-bounded domains. Strong correlations between heat strength and heater area prompt caution against simultaneously estimating these two quantities. It is found that multiple solutions arise in cases where the number of temperature sensors is less than the number of unknown states. Moreover, smaller heaters introduce greater uncertainty in the estimated strength. The diffusive nature of heat conduction smooths out any deformations in the temperature contours, especially in the presence of multiple heaters positioned near each other, impacting convergence. In wall-bounded domains with Neumann boundary conditions, the inference of heater parameters tends to be more accurate than in unbounded domains.
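As a toy illustration of the sampler (a single chain with a point source in free space instead of the paper's Fourier-parametrized heaters; every constant below is an assumption), here is a random-walk Metropolis sketch:

```python
import numpy as np

rng = np.random.default_rng(2)

def forward(src, sensors, strength=1.0):
    """Temperatures from a point source in free space: T ~ -strength * ln(r)."""
    r = np.linalg.norm(sensors - src, axis=1)
    return -strength * np.log(r)

# synthetic data: true source at (0.3, -0.2), 12 sensors, Gaussian noise
sensors = rng.uniform(-1.0, 1.0, size=(12, 2))
T_obs = forward(np.array([0.3, -0.2]), sensors) + 0.01 * rng.normal(size=12)

def log_post(src, noise=0.01):
    """Gaussian likelihood with a flat prior on [-2, 2]^2."""
    if np.abs(src).max() > 2.0:
        return -np.inf
    resid = T_obs - forward(src, sensors)
    return -0.5 * np.sum(resid**2) / noise**2

# random-walk Metropolis (the paper additionally uses parallel tempering)
src, lp, samples = np.zeros(2), log_post(np.zeros(2)), []
for _ in range(20000):
    prop = src + 0.05 * rng.normal(size=2)     # Gaussian random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject
        src, lp = prop, lp_prop
    samples.append(src)
print("posterior mean source location:", np.mean(samples[5000:], axis=0))
```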
{"title":"Bayesian Inference for Estimating Heat Sources through Temperature Assimilation","authors":"Hanieh Mousavi, Jeff D. Eldredge","doi":"arxiv-2405.02319","DOIUrl":"https://doi.org/arxiv-2405.02319","url":null,"abstract":"This paper introduces a Bayesian inference framework for two-dimensional\u0000steady-state heat conduction, focusing on the estimation of unknown distributed\u0000heat sources in a thermally-conducting medium with uniform conductivity. The\u0000goal is to infer heater locations, strengths, and shapes using temperature\u0000assimilation in the Euclidean space, employing a Fourier series to represent\u0000each heater's shape. The Markov Chain Monte Carlo (MCMC) method, incorporating\u0000the random-walk Metropolis-Hasting algorithm and parallel tempering, is\u0000utilized for posterior distribution exploration in both unbounded and\u0000wall-bounded domains. Strong correlations between heat strength and heater area\u0000prompt caution against simultaneously estimating these two quantities. It is\u0000found that multiple solutions arise in cases where the number of temperature\u0000sensors is less than the number of unknown states. Moreover, smaller heaters\u0000introduce greater uncertainty in estimated strength. The diffusive nature of\u0000heat conduction smooths out any deformations in the temperature contours,\u0000especially in the presence of multiple heaters positioned near each other,\u0000impacting convergence. In wall-bounded domains with Neumann boundary\u0000conditions, the inference of heater parameters tends to be more accurate than\u0000in unbounded domains.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"44 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140889070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}