Maximum correntropy criterion regression models with tending-to-zero scale parameters
Journal of Statistical Planning and Inference, Pub Date: 2023-12-09, DOI: 10.1016/j.jspi.2023.106134
Lianqiang Yang, Ying Jing, Teng Li
Maximum correntropy criterion regression (MCCR) models have been well studied within the theoretical framework of statistical learning when the scale parameters take fixed values or go to infinity. This paper studies MCCR models with tending-to-zero scale parameters. It is shown that the optimal learning rate of MCCR models is O(n^{-1}) in the asymptotic sense when the sample size n goes to infinity. In the case of finite samples, the performance and robustness of MCCR, Huber and least squares regression models are compared. The applications of these three methods to real data are also demonstrated.
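The three regression methods compared in the abstract above differ mainly in how they penalize a residual. The sketch below is only an illustration under stated assumptions: the correntropy-induced loss is written in one common form, sigma^2 * (1 - exp(-r^2 / sigma^2)), and the Huber threshold `delta` and scale `sigma` are hypothetical defaults; the paper's exact scaling may differ.

```python
import math

def squared_loss(r):
    # Least squares: grows quadratically, so large residuals dominate.
    return r * r

def huber_loss(r, delta=1.0):
    # Huber: quadratic near zero, linear beyond the threshold delta.
    a = abs(r)
    if a <= delta:
        return 0.5 * r * r
    return delta * (a - 0.5 * delta)

def correntropy_loss(r, sigma=1.0):
    # Correntropy-induced loss (assumed form): bounded above by sigma**2,
    # so a single outlier can contribute at most sigma**2.
    return sigma ** 2 * (1.0 - math.exp(-(r * r) / sigma ** 2))

for r in (0.1, 1.0, 10.0):
    print(r, squared_loss(r), huber_loss(r), correntropy_loss(r))
```

For small residuals all three losses behave quadratically; for a residual of 10 the squared loss is 100 and the Huber loss is 9.5, while the correntropy-induced loss stays at or below sigma^2, which is the usual intuition for its robustness.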
Resampling techniques for a class of smooth, possibly data-adaptive empirical copulas
Journal of Statistical Planning and Inference, Pub Date: 2023-12-07, DOI: 10.1016/j.jspi.2023.106132
Ivan Kojadinovic, Bingqing Yi
We investigate the validity of two resampling techniques when carrying out inference on the underlying unknown copula using a recently proposed class of smooth, possibly data-adaptive nonparametric estimators that contains empirical Bernstein copulas (and thus the empirical beta copula). Following Kiriliouk et al. (2021), the first resampling technique is based on drawing samples from the smooth estimator and can only be used in the case of independent observations. The second technique is a smooth extension of the so-called sequential dependent multiplier bootstrap and can thus be used in a time series setting and, possibly, for change-point analysis. The two studied resampling schemes are applied to confidence interval construction and the offline detection of changes in the cross-sectional dependence of multivariate time series, respectively. Monte Carlo experiments confirm the possible advantages of such smooth inference procedures over their non-smooth counterparts. A by-product of this work is the study of the weak consistency and finite-sample performance of two classes of smooth estimators of the first-order partial derivatives of a copula, which can have applications in mean and quantile regression.
A global test for heteroscedastic one-way FMANOVA with applications
Journal of Statistical Planning and Inference, Pub Date: 2023-12-05, DOI: 10.1016/j.jspi.2023.106133
Tianming Zhu, Jin-Ting Zhang, Ming-Yen Cheng
Multivariate functional data are prevalent in various fields such as biology, climatology, and finance. Motivated by the World Health Data applications, in this study, we propose and examine a global test for assessing the equality of multiple mean functions in multivariate functional data. This test addresses the one-way Functional Multivariate Analysis of Variance (FMANOVA) problem, which is a fundamental issue in the analysis of multivariate functional data. While numerous analysis of variance tests have been proposed and studied for univariate functional data, only a limited number of methods have been developed for the one-way FMANOVA problem. Furthermore, our global test can handle heteroscedasticity in the unknown covariance function matrices that underlie the multivariate functional data, which is not possible with existing methods. We establish the asymptotic null distribution of the test statistic as a chi-squared-type mixture, which depends on the eigenvalues of the covariance function matrices. To approximate the null distribution, we introduce a Welch–Satterthwaite type chi-squared-approximation with consistent parameter estimation. The proposed test exhibits root-n consistency, meaning it possesses nontrivial power against a local alternative. Additionally, it offers superior computational efficiency compared to several permutation-based tests. Through simulation studies and applications to the World Health Data, we highlight the advantages of our global test.
Inference on regression model with misclassified binary response
Journal of Statistical Planning and Inference, Pub Date: 2023-11-29, DOI: 10.1016/j.jspi.2023.106121
Arindam Chatterjee, Tathagata Bandyopadhyay, Ayoushman Bhattacharya
Misclassification of binary responses, if ignored, may severely bias the maximum likelihood estimators (MLEs) of regression parameters. For such data, a binary regression model incorporating non-differential classification errors is extensively used by researchers in different application contexts. We strongly caution against indiscriminate use of this model, because it suffers from a serious estimation problem due to confounding of the unknown misclassification probabilities with the regression parameters and may thus lead to a highly biased estimate. To overcome this problem, we propose here the use of an internal validation sample in addition to the main sample. Assuming differential classification errors, we consider MLEs of the regression parameters based on the joint likelihood of the main sample and the internal validation sample. We then develop a rigorous asymptotic theory for the joint MLEs under standard assumptions. To facilitate its easy implementation for inference, we propose a bootstrap approximation to the asymptotic distribution and prove its consistency. The results of the simulation studies suggest that even an extremely small validation sample may lead to vastly improved inference. Finally, the methodology is illustrated with real-life survey data.
Construction of mixed-level screening designs using Hadamard matrices
Journal of Statistical Planning and Inference, Pub Date: 2023-11-23, DOI: 10.1016/j.jspi.2023.106131
Bo Hu, Dongying Wang, Fasheng Sun
Modern experiments typically involve a very large number of variables. Screening designs allow experimenters to identify active factors in a minimum number of trials. To save costs, only low-level factorial designs are considered for screening experiments, especially two- and three-level designs. In this article, we provide a systematic method to construct screening designs that contain both two- and three-level factors based on Hadamard matrices with the fold-over structure. The proposed designs perform well in terms of the D- and A-optimality criteria, and the estimates of the main effects are unbiased by the second-order effects, making them very suitable for screening experiments. In addition, some theoretical results on D- and A-optimality are obtained as a by-product.
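The role of the fold-over structure mentioned in the abstract above can be seen in a small two-level case. The sketch below is not the paper's mixed-level construction; it assumes a Sylvester-type Hadamard matrix and the usual fold-over (stacking a design on top of its sign-reversed copy), and the helper names are hypothetical. It checks that, after folding over, every main-effect column is orthogonal to every two-factor interaction column.

```python
from itertools import combinations

def sylvester_hadamard(k):
    # Sylvester construction: H_{2m} = [[H, H], [H, -H]], starting from [[1]].
    H = [[1]]
    for _ in range(k):
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H

def fold_over(D):
    # Fold-over: append the sign-reversed copy of every run.
    return D + [[-x for x in row] for row in D]

def max_alias(D):
    # Largest |inner product| between a main-effect column and a
    # two-factor interaction column (elementwise product of two columns).
    n_factors = len(D[0])
    worst = 0
    for i in range(n_factors):
        for j, k in combinations(range(n_factors), 2):
            s = sum(row[i] * row[j] * row[k] for row in D)
            worst = max(worst, abs(s))
    return worst

H = sylvester_hadamard(2)       # 4 x 4 Hadamard matrix
D = [row[1:] for row in H]      # drop the all-ones column: 4 runs, 3 factors
F = fold_over(D)                # 8 runs, 3 factors

print(max_alias(D), max_alias(F))
```

In the 4-run design each main effect is fully aliased with a two-factor interaction, whereas in the folded-over 8-run design all such inner products vanish, because the product of three columns changes sign in the mirrored half.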
Mallows model averaging based on kernel regression imputation with responses missing at random
Journal of Statistical Planning and Inference, Pub Date: 2023-11-22, DOI: 10.1016/j.jspi.2023.106130
Hengkun Zhu, Guohua Zou
Missing data is a common problem in real data analysis. In this paper, a Mallows model averaging method based on kernel regression imputation is proposed for linear regression models with responses missing at random. We prove that our method asymptotically achieves the lowest possible squared error. Compared with existing model averaging methods, the new method does not require a parametric model to characterize the missingness mechanism. A Monte Carlo simulation and a practical application demonstrate the usefulness of the proposed method.
Jackknife empirical likelihood confidence intervals for the categorical Gini correlation
Journal of Statistical Planning and Inference, Pub Date: 2023-11-20, DOI: 10.1016/j.jspi.2023.106123
Sameera Hewage, Yongli Sang
The categorical Gini correlation, ρ_g, was proposed by Dang et al. (2021) to measure the dependence between a categorical variable, Y, and a numerical variable, X. It has been shown that ρ_g has more appealing properties than existing dependence measures. In this paper, we develop the jackknife empirical likelihood (JEL) method for ρ_g. Confidence intervals for the Gini correlation are constructed without estimating the asymptotic variance. Adjusted and weighted JEL are explored to improve the performance of the standard JEL. Simulation studies show that our methods are competitive with existing methods in terms of coverage accuracy and shortness of confidence intervals. The proposed methods are illustrated in an application on two real datasets.
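The jackknife step behind JEL can be illustrated on a toy statistic. This is a minimal sketch, assuming the usual JEL construction in which empirical likelihood is applied to jackknife pseudo-values; the statistic used here is a plain sample mean standing in for the Gini correlation estimator, and the function names are hypothetical.

```python
def jackknife_pseudo_values(data, statistic):
    # Pseudo-value i: n * T(all data) - (n - 1) * T(data without point i).
    n = len(data)
    full = statistic(data)
    pseudo = []
    for i in range(n):
        leave_one_out = data[:i] + data[i + 1:]
        pseudo.append(n * full - (n - 1) * statistic(leave_one_out))
    return pseudo

def mean(xs):
    return sum(xs) / len(xs)

data = [2.0, 4.0, 4.0, 7.0, 13.0]
pseudo = jackknife_pseudo_values(data, mean)
print(pseudo)
```

For the sample mean the pseudo-values reproduce the original observations, which makes a convenient sanity check; for a nonlinear statistic they differ from the data, and it is these pseudo-values that would be treated as approximately independent inputs to an empirical likelihood.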
A multidimensional objective prior distribution from a scoring rule
Journal of Statistical Planning and Inference, Pub Date: 2023-11-18, DOI: 10.1016/j.jspi.2023.106122
Isadora Antoniano-Villalobos, Cristiano Villa, Stephen G. Walker
The construction of objective priors is, at best, challenging for multidimensional parameter spaces. A common practice is to assume independence and set up the joint prior as the product of marginal distributions obtained via “standard” objective methods, such as Jeffreys or reference priors. However, the assumption of independence a priori is not always reasonable, and whether it can be viewed as strictly objective is still open to discussion. In this paper, by extending a previously proposed objective approach based on scoring rules for the one-dimensional case, we propose a novel objective prior for multidimensional parameter spaces which yields a dependence structure. The proposed prior has the appealing property of being proper, and it does not depend on the chosen model, only on the parameter space considered.
Subgroup analysis for the functional linear model
Journal of Statistical Planning and Inference, Pub Date: 2023-11-15, DOI: 10.1016/j.jspi.2023.106120
Yifan Sun, Ziyi Liu, Wu Wang
Classical functional linear regression models the relationship between a scalar response and a functional covariate, where the coefficient function is assumed to be identical for all subjects. In this paper, the classical model is extended to allow heterogeneous coefficient functions across different subgroups of subjects. The greatest challenge is that the subgroup structure is usually unknown. To this end, we develop a penalization-based approach which applies the penalized fusion technique to simultaneously determine the number and structure of subgroups and the coefficient functions within each subgroup. An effective computational algorithm is derived. We also establish the oracle properties and estimation consistency. Extensive numerical simulations demonstrate its superiority compared to several competing methods. The analysis of an air quality dataset leads to interesting findings and improved predictions.
Variable selection with the knockoffs: Composite null hypotheses
Journal of Statistical Planning and Inference, Pub Date: 2023-11-13, DOI: 10.1016/j.jspi.2023.106119
Mehrdad Pournaderi, Yu Xiang
The fixed-X knockoff filter is a flexible framework for variable selection with false discovery rate (FDR) control in linear models with arbitrary design matrices (of full column rank) and it allows for finite-sample selective inference via the Lasso estimates. In this paper, we extend the theory of the knockoff procedure to tests with composite null hypotheses, which are usually more relevant to real-world problems. The main technical challenge lies in handling composite nulls in tandem with dependent features from arbitrary designs. We develop two methods for composite inference with the knockoffs, namely, shifted ordinary least-squares (S-OLS) and feature-response product perturbation (FRPP), building on new structural properties of test statistics under composite nulls. We also propose two heuristic variants of the S-OLS method that outperform the celebrated Benjamini–Hochberg (BH) procedure for composite nulls, which serves as a heuristic baseline under dependent test statistics. Finally, we analyze the loss in FDR when the original knockoff procedure is naively applied to composite tests.
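For readers unfamiliar with the BH baseline named in the abstract above, the standard step-up rule can be sketched as follows. This is the textbook procedure applied to a list of p-values, not the composite-null variant or the knockoff filter studied in the paper; the function name and the target level `q` are illustrative assumptions.

```python
def benjamini_hochberg(pvals, q=0.1):
    # Step-up rule: find the largest rank k with p_(k) <= k * q / m,
    # then reject the hypotheses with the k smallest p-values.
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank
    # Return the indices of rejected hypotheses in their original order.
    return sorted(order[:k])

pvals = [0.01, 0.04, 0.03, 0.5]
rejected = benjamini_hochberg(pvals, q=0.1)
print(rejected)
```

With four hypotheses and q = 0.1 the thresholds are 0.025, 0.05, 0.075 and 0.1, so the three smallest p-values are rejected and the hypothesis with p = 0.5 is retained.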