In this paper, we prove strong consistency of an estimator based on the truncated singular value decomposition (SVD) for a multivariate errors-in-variables linear regression model with collinearity. This result extends Gleser's proof of the strong consistency of total least squares (TLS) solutions to the case with modern rank constraints. Whereas the usual treatment of consistency in the absence of solution uniqueness concerns the minimal-norm solution, the contribution of this study is a theory establishing the strong consistency of the entire set of solutions. The proof rests on properties of orthogonal projections, specifically properties of the Rayleigh-Ritz procedure for computing eigenvalues, which makes it well suited to problems in which some row vectors of the matrices are noise-free. Accordingly, this paper proves consistency for the regression model under this condition on the row vectors, yielding a natural generalization of the strong consistency of the standard TLS estimator.
{"title":"Strong consistency of an estimator by the truncated singular value decomposition for an errors-in-variables regression model with collinearity","authors":"Kensuke Aishima","doi":"arxiv-2311.17407","DOIUrl":"https://doi.org/arxiv-2311.17407","url":null,"abstract":"In this paper, we prove strong consistency of an estimator by the truncated\u0000singular value decomposition for a multivariate errors-in-variables linear\u0000regression model with collinearity. This result is an extension of Gleser's\u0000proof of the strong consistency of total least squares solutions to the case\u0000with modern rank constraints. While the usual discussion of consistency in the\u0000absence of solution uniqueness deals with the minimal norm solution, the\u0000contribution of this study is to develop a theory that shows the strong\u0000consistency of a set of solutions. The proof is based on properties of\u0000orthogonal projections, specifically properties of the Rayleigh-Ritz procedure\u0000for computing eigenvalues. This makes it suitable for targeting problems where\u0000some row vectors of the matrices do not contain noise. Therefore, this paper\u0000gives a proof for the regression model with the above condition on the row\u0000vectors, resulting in a natural generalization of the strong consistency for\u0000the standard TLS estimator.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"93 5","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Our companion paper \cite{Stojnicnflgscompyx23} introduced a powerful \emph{fully lifted} (fl) statistical interpolating/comparison mechanism for bilinearly indexed random processes. Here, we present a particular realization of this fl mechanism that relies on the concept of stationarization along the interpolating path. A collection of fundamental relations among the interpolating parameters is uncovered, contextualized, and presented. As a bonus, we show that in particular special cases the introduced machinery admits simplifications to forms readily usable in practice. Given how many well-known random structures and optimization problems critically rely on results of the type considered here, the range of applications is practically unlimited; we briefly point to some of these opportunities as well.
{"title":"Bilinearly indexed random processes -- emph{stationarization} of fully lifted interpolation","authors":"Mihailo Stojnic","doi":"arxiv-2311.18097","DOIUrl":"https://doi.org/arxiv-2311.18097","url":null,"abstract":"Our companion paper cite{Stojnicnflgscompyx23} introduced a very powerful\u0000emph{fully lifted} (fl) statistical interpolating/comparison mechanism for\u0000bilinearly indexed random processes. Here, we present a particular realization\u0000of such fl mechanism that relies on a stationarization along the interpolating\u0000path concept. A collection of very fundamental relations among the\u0000interpolating parameters is uncovered, contextualized, and presented. As a nice\u0000bonus, in particular special cases, we show that the introduced machinery\u0000allows various simplifications to forms readily usable in practice. Given how\u0000many well known random structures and optimization problems critically rely on\u0000the results of the type considered here, the range of applications is pretty\u0000much unlimited. We briefly point to some of these opportunities as well.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"86 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Researchers often hold the belief that random forests are "the cure to the world's ills" (Bickel, 2010). But how exactly do they achieve this? Focusing on the recently introduced causal forests (Athey and Imbens, 2016; Wager and Athey, 2018), this manuscript contributes to an ongoing research trend toward answering this question by proving that causal forests can adapt to the unknown covariate manifold structure. In particular, our analysis shows that a causal forest estimator can achieve the optimal rate of convergence for estimating the conditional average treatment effect, with the covariate dimension automatically replaced by the manifold dimension. These findings align with analogous observations in the realm of deep learning and resonate with the insights presented in Peter Bickel's 2004 Rietz lecture.
{"title":"On the adaptation of causal forests to manifold data","authors":"Yiyi Huo, Yingying Fan, Fang Han","doi":"arxiv-2311.16486","DOIUrl":"https://doi.org/arxiv-2311.16486","url":null,"abstract":"Researchers often hold the belief that random forests are \"the cure to the\u0000world's ills\" (Bickel, 2010). But how exactly do they achieve this? Focused on\u0000the recently introduced causal forests (Athey and Imbens, 2016; Wager and\u0000Athey, 2018), this manuscript aims to contribute to an ongoing research trend\u0000towards answering this question, proving that causal forests can adapt to the\u0000unknown covariate manifold structure. In particular, our analysis shows that a\u0000causal forest estimator can achieve the optimal rate of convergence for\u0000estimating the conditional average treatment effect, with the covariate\u0000dimension automatically replaced by the manifold dimension. These findings\u0000align with analogous observations in the realm of deep learning and resonate\u0000with the insights presented in Peter Bickel's 2004 Rietz lecture.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"92 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nonparametric estimation of nonlocal interaction kernels is crucial in various applications involving interacting particle systems. The inference challenge, situated at the nexus of statistical learning and inverse problems, stems from the nonlocal dependency. A central question is whether the optimal minimax rate of convergence for this problem matches the rate $M^{-\frac{2\beta}{2\beta+1}}$ of classical nonparametric regression, where $M$ is the sample size and $\beta$ is the smoothness exponent of the radial kernel. Our study confirms this alignment for systems with a finite number of particles. We introduce a tamed least squares estimator (tLSE) that attains the optimal convergence rate for a broad class of exchangeable distributions. The tLSE bridges the smallest eigenvalue of random matrices and Sobolev embedding, relying on nonasymptotic estimates for the left-tail probability of the smallest eigenvalue of the normal matrix. The minimax lower rate is derived using the Fano-Tsybakov hypothesis-testing method. Our findings reveal that, provided the inverse problem in the large-sample limit satisfies a coercivity condition, the left-tail probability does not alter the bias-variance tradeoff, and the optimal minimax rate remains intact. The tLSE method offers a straightforward approach to establishing the optimal minimax rate for models with either local or nonlocal dependency.
{"title":"Optimal minimax rate of learning interaction kernels","authors":"Xiong Wang, Inbar Seroussi, Fei Lu","doi":"arxiv-2311.16852","DOIUrl":"https://doi.org/arxiv-2311.16852","url":null,"abstract":"Nonparametric estimation of nonlocal interaction kernels is crucial in\u0000various applications involving interacting particle systems. The inference\u0000challenge, situated at the nexus of statistical learning and inverse problems,\u0000comes from the nonlocal dependency. A central question is whether the optimal\u0000minimax rate of convergence for this problem aligns with the rate of\u0000$M^{-frac{2beta}{2beta+1}}$ in classical nonparametric regression, where $M$\u0000is the sample size and $beta$ represents the smoothness exponent of the radial\u0000kernel. Our study confirms this alignment for systems with a finite number of\u0000particles. We introduce a tamed least squares estimator (tLSE) that attains the optimal\u0000convergence rate for a broad class of exchangeable distributions. The tLSE\u0000bridges the smallest eigenvalue of random matrices and Sobolev embedding. This\u0000estimator relies on nonasymptotic estimates for the left tail probability of\u0000the smallest eigenvalue of the normal matrix. The lower minimax rate is derived\u0000using the Fano-Tsybakov hypothesis testing method. Our findings reveal that\u0000provided the inverse problem in the large sample limit satisfies a coercivity\u0000condition, the left tail probability does not alter the bias-variance tradeoff,\u0000and the optimal minimax rate remains intact. Our tLSE method offers a\u0000straightforward approach for establishing the optimal minimax rate for models\u0000with either local or nonlocal dependency.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"91 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this study, variable acceptance sampling plans under Type I hybrid censoring are designed for a lot of independent and identical units with exponential lifetimes, using a Bayesian estimate of the parameter $\vartheta$. This approach differs from the conventional methods in acceptance sampling, which rely on the maximum likelihood estimate and on minimising the Bayes risk. The Bayesian estimate is obtained under the squared error and Linex loss functions. An optimisation problem is solved to minimise the testing cost under each method, and optimal values of the plan parameters $n$, $t_1$, and $t_2$ are calculated. The proposed plans are illustrated with various examples, and a real-life case study is also conducted. The expected testing cost of the sampling plan obtained under the squared error loss function is much lower than the cost of existing plans based on the maximum likelihood estimate.
{"title":"Optimal variable acceptance sampling plan for exponential distribution using Bayesian estimate under Type I hybrid censoring","authors":"Ashlyn Maria Mathai, Mahesh Kumar","doi":"arxiv-2311.16693","DOIUrl":"https://doi.org/arxiv-2311.16693","url":null,"abstract":"In this study, variable acceptance sampling plans under Type I hybrid\u0000censoring is designed for a lot of independent and identical units with\u0000exponential lifetimes using Bayesian estimate of the parameter $vartheta$.\u0000This approach is new from the conventional methods in acceptance sampling plan\u0000which relay on maximum likelihood estimate and minimising of Bayes risk.\u0000Bayesian estimate is obtained using squared error loss and Linex loss\u0000functions. Optimisation problem is solved for minimising the testing cost under\u0000each methods and optimal values of the plan parameters $n, t_1$ and $t_2$ are\u0000calculated. The proposed plans are illustrated using various examples and a\u0000real life case study is also conducted. Expected testing cost of the sampling\u0000plan obtained using squared error loss function is much lower than the cost of\u0000existing plans using maximum likelihood estimate.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"85 5","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We study a multi-server queueing system with a periodic arrival rate and customers whose joining decision is based on their patience and a delay proxy. Specifically, each customer has a patience level sampled from a common distribution. Upon arrival, they receive an estimate of their delay before joining service and join the system only if this delay does not exceed their patience; otherwise they balk. The main objective is to estimate the parameters of the arrival rate and the patience distribution. The complicating factor is that this inference must be performed from the observed process only; that is, balking customers remain unobserved. We set up a likelihood function for the state-dependent effective arrival process (i.e., the process of customers who join), establish strong consistency of the MLE, and derive the asymptotic distribution of the estimation error. Due to the intrinsic non-stationarity of the Poisson arrival process, the proof techniques used in previous work become inapplicable. The novelty of our proof lies in constructing i.i.d. objects from dependent samples by decomposing the sample path into i.i.d. regeneration cycles. The feasibility of the MLE approach is demonstrated through a sequence of numerical experiments for multiple choices of the delay-estimate function. In particular, we observe that the arrival rate is best estimated at high service capacities, while the patience distribution is best estimated at lower service capacities.
{"title":"Statistical inference for a service system with non-stationary arrivals and unobserved balking","authors":"Shreehari Anand Bodas, Michel Mandjes, Liron Ravner","doi":"arxiv-2311.16884","DOIUrl":"https://doi.org/arxiv-2311.16884","url":null,"abstract":"We study a multi-server queueing system with a periodic arrival rate and\u0000customers whose joining decision is based on their patience and a delay proxy.\u0000Specifically, each customer has a patience level sampled from a common\u0000distribution. Upon arrival, they receive an estimate of their delay before\u0000joining service and then join the system only if this delay is not more than\u0000their patience, otherwise they balk. The main objective is to estimate the\u0000parameters pertaining to the arrival rate and patience distribution. Here the\u0000complication factor is that this inference should be performed based on the\u0000observed process only, i.e., balking customers remain unobserved. We set up a\u0000likelihood function of the state dependent effective arrival process (i.e.,\u0000corresponding to the customers who join), establish strong consistency of the\u0000MLE, and derive the asymptotic distribution of the estimation error. Due to the\u0000intrinsic non-stationarity of the Poisson arrival process, the proof techniques\u0000used in previous work become inapplicable. The novelty of the proving mechanism\u0000in this paper lies in the procedure of constructing i.i.d. objects from\u0000dependent samples by decomposing the sample path into i.i.d. regeneration\u0000cycles. The feasibility of the MLE-approach is discussed via a sequence of\u0000numerical experiments, for multiple choices of functions which provide delay\u0000estimates. In particular, it is observed that the arrival rate is best\u0000estimated at high service capacities, and the patience distribution is best\u0000estimated at lower service capacities.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"82 6","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In an acceptance monitoring system, acceptance sampling techniques are used to increase production, enhance control, and deliver higher-quality products at a lower cost. It may not always be possible to specify the acceptance sampling plan parameters as exact values, especially when the data are subject to uncertainty. In this work, acceptance sampling plans for a large number of identical units with exponential lifetimes are obtained by treating the acceptable quality life, rejectable quality life, consumer's risk, and producer's risk as fuzzy parameters. To obtain the plan parameters of sequential sampling plans and repetitive group sampling plans, a fuzzy hypothesis test is employed. Some examples are presented to validate the sampling plans obtained in this work, and our results are compared with existing results in the literature. Finally, a real-life case study is presented to demonstrate the application of the resulting sampling plans.
{"title":"Design of variable acceptance sampling plan for exponential distribution under uncertainty","authors":"Mahesh Kumar, Ashlyn Maria Mathai","doi":"arxiv-2311.17111","DOIUrl":"https://doi.org/arxiv-2311.17111","url":null,"abstract":"In an acceptance monitoring system, acceptance sampling techniques are used\u0000to increase production, enhance control, and deliver higher-quality products at\u0000a lesser cost. It might not always be possible to define the acceptance\u0000sampling plan parameters as exact values, especially, when data has\u0000uncertainty. In this work, acceptance sampling plans for a large number of\u0000identical units with exponential lifetimes are obtained by treating acceptable\u0000quality life, rejectable quality life, consumer's risk, and producer's risk as\u0000fuzzy parameters. To obtain plan parameters of sequential sampling plans and\u0000repetitive group sampling plans, fuzzy hypothesis test is considered. To\u0000validate the sampling plans obtained in this work, some examples are presented.\u0000Our results are compared with existing results in the literature. Finally, to\u0000demonstrate the application of the resulting sampling plans, a real-life case\u0000study is presented.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"82 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138521426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the growing digital transformation of the worldwide economy, cyber risk has become a major issue. As 1% of the world's GDP (around $1,000 billion) is allegedly lost to cybercrime every year, IT systems continue to become increasingly interconnected, making them vulnerable to accumulation phenomena that undermine the pooling mechanism of insurance. As highlighted in the literature, Hawkes processes are suitable models for capturing the contagion and clustering features of cyber events. This paper extends the standard Hawkes modeling of cyber risk frequency by adding external shocks, modeled by the publication of cyber vulnerabilities that are deemed to increase the likelihood of attacks in the short term. The aim of the proposed model is a better quantification of contagion effects: while the standard Hawkes model attributes all clustering to self-excitation, our model captures the external common factors that may explain part of the systemic pattern. We propose a Hawkes model with two kernels, one for the endogenous factor (contagion from other cyber events) and one for the exogenous component (cyber vulnerability publications). We use parametric exponential specifications for both the endogenous and exogenous intensity kernels, and we compare different inference methods on public datasets containing features of cyber attacks from the Hackmageddon database and cyber vulnerabilities from the Known Exploited Vulnerabilities database and the National Vulnerability Database. By refining the selection of the external excitation database, the degree of endogeneity of the model is nearly halved. We illustrate our model with simulations and discuss the impact of taking into account the external factor driven by vulnerabilities. Once an attack has occurred, response measures are implemented to limit its effects, including patching vulnerabilities and reducing the attack's contagion. We therefore augment the model with a second phase that models the reduction in contagion resulting from these remediation measures. Based on this model, we explore various scenarios and quantify the effect of mitigation measures taken by an insurance company aiming to mitigate the effects of a cyber pandemic in its insured portfolio.
{"title":"Cyber risk modeling using a two-phase Hawkes process with external excitation","authors":"Alexandre BoumezouedCREST, Yousra CherkaouiCREST, Caroline HillairetCREST","doi":"arxiv-2311.15701","DOIUrl":"https://doi.org/arxiv-2311.15701","url":null,"abstract":"With the growing digital transformation of the worldwide economy, cyber risk\u0000has become a major issue. As 1 % of the world's GDP (around $1,000 billion) is\u0000allegedly lost to cybercrime every year, IT systems continue to get\u0000increasingly interconnected, making them vulnerable to accumulation phenomena\u0000that undermine the pooling mechanism of insurance. As highlighted in the\u0000literature, Hawkes processes appear to be suitable models to capture contagion\u0000phenomena and clustering features of cyber events. This paper extends the\u0000standard Hawkes modeling of cyber risk frequency by adding external shocks,\u0000modelled by the publication of cyber vulnerabilities that are deemed to\u0000increase the likelihood of attacks in the short term. The aim of the proposed\u0000model is to provide a better quantification of contagion effects since, while\u0000the standard Hawkes model allocates all the clustering phenomena to\u0000self-excitation, our model allows to capture the external common factors that\u0000may explain part of the systemic pattern. We propose a Hawkes model with two\u0000kernels, one for the endogenous factor (the contagion from other cyber events)\u0000and one for the exogenous component (cyber vulnerability publications). We use\u0000parametric exponential specifications for both the internal and exogenous\u0000intensity kernels, and we compare different methods to tackle the inference\u0000problem based on public datasets containing features of cyber attacks found in\u0000the Hackmageddon database and cyber vulnerabilities from the Known Exploited\u0000Vulnerability database and the National Vulnerability Dataset. By refining the\u0000external excitation database selection, the degree of endogeneity of the model\u0000is nearly halved. We illustrate our model with simulations and discuss the\u0000impact of taking into account the external factor driven by vulnerabilities.\u0000Once an attack has occurred, response measures are implemented to limit the\u0000effects of an attack. These measures include patching vulnerabilities and\u0000reducing the attack's contagion. We use an augmented version of the model by\u0000adding a second phase modeling a reduction in the contagion pattern from the\u0000remediation measures. Based on this model, we explore various scenarios and\u0000quantify the effect of mitigation measures of an insurance company that aims to\u0000mitigate the effects of a cyber pandemic in its insured portfolio.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"63 10","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A key challenge for modern machine learning systems is to achieve Out-of-Distribution (OOD) generalization -- generalizing to target data whose distribution differs from that of the source data. Despite its significant importance, the fundamental question of ``what are the most effective algorithms for OOD generalization'' remains open even under the standard setting of covariate shift. This paper addresses this fundamental question by proving that, surprisingly, classical Maximum Likelihood Estimation (MLE) using source data alone (without any modification) achieves minimax optimality for covariate shift under the well-specified setting. That is, no algorithm performs better than MLE in this setting (up to a constant factor), justifying that MLE is all you need. Our result holds for a very rich class of parametric models and does not require any boundedness condition on the density ratio. We illustrate the wide applicability of our framework by instantiating it in three concrete examples -- linear regression, logistic regression, and phase retrieval. This paper further complements the study by proving that, under the misspecified setting, MLE is no longer the optimal choice, whereas the Maximum Weighted Likelihood Estimator (MWLE) emerges as minimax optimal in certain scenarios.
{"title":"Maximum Likelihood Estimation is All You Need for Well-Specified Covariate Shift","authors":"Jiawei Ge, Shange Tang, Jianqing Fan, Cong Ma, Chi Jin","doi":"arxiv-2311.15961","DOIUrl":"https://doi.org/arxiv-2311.15961","url":null,"abstract":"A key challenge of modern machine learning systems is to achieve\u0000Out-of-Distribution (OOD) generalization -- generalizing to target data whose\u0000distribution differs from that of source data. Despite its significant\u0000importance, the fundamental question of ``what are the most effective\u0000algorithms for OOD generalization'' remains open even under the standard\u0000setting of covariate shift. This paper addresses this fundamental question by\u0000proving that, surprisingly, classical Maximum Likelihood Estimation (MLE)\u0000purely using source data (without any modification) achieves the minimax\u0000optimality for covariate shift under the well-specified setting. That is, no\u0000algorithm performs better than MLE in this setting (up to a constant factor),\u0000justifying MLE is all you need. Our result holds for a very rich class of\u0000parametric models, and does not require any boundedness condition on the\u0000density ratio. We illustrate the wide applicability of our framework by\u0000instantiating it to three concrete examples -- linear regression, logistic\u0000regression, and phase retrieval. This paper further complement the study by\u0000proving that, under the misspecified setting, MLE is no longer the optimal\u0000choice, whereas Maximum Weighted Likelihood Estimator (MWLE) emerges as minimax\u0000optimal in certain scenarios.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"41 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The recently proposed fixed-X knockoff is a powerful variable selection procedure that controls the false discovery rate (FDR) in any finite-sample setting, yet its theoretical guarantees are difficult to establish beyond Gaussian linear models. In this paper, we make the first attempt to extend the fixed-X knockoff to partially linear models by using generalized knockoff features, and we propose a new stability generalized knockoff (Stab-GKnock) procedure that incorporates the selection probability as the feature importance score. We provide FDR control and a power guarantee under some regularity conditions. In addition, we propose a two-stage method for the high-dimensional setting by introducing a new joint feature screening procedure with a guaranteed sure screening property. Extensive simulation studies are conducted to evaluate the finite-sample performance of the proposed method. A real data example is also provided for illustration.
{"title":"Stab-GKnock: Controlled variable selection for partially linear models using generalized knockoffs","authors":"Han Su, Panxu Yuan, Qingyang Sun, Mengxi Yi, Gaorong Li","doi":"arxiv-2311.15982","DOIUrl":"https://doi.org/arxiv-2311.15982","url":null,"abstract":"The recently proposed fixed-X knockoff is a powerful variable selection\u0000procedure that controls the false discovery rate (FDR) in any finite-sample\u0000setting, yet its theoretical insights are difficult to show beyond Gaussian\u0000linear models. In this paper, we make the first attempt to extend the fixed-X\u0000knockoff to partially linear models by using generalized knockoff features, and\u0000propose a new stability generalized knockoff (Stab-GKnock) procedure by\u0000incorporating selection probability as feature importance score. We provide FDR\u0000control and power guarantee under some regularity conditions. In addition, we\u0000propose a two-stage method under high dimensionality by introducing a new joint\u0000feature screening procedure, with guaranteed sure screening property. Extensive\u0000simulation studies are conducted to evaluate the finite-sample performance of\u0000the proposed method. A real data example is also provided for illustration.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"45 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}