
Journal of Statistical Planning and Inference: Latest Articles

Adaptively robust high-dimensional matrix factor analysis under Huber loss function
IF 0.9 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2023-12-20 | DOI: 10.1016/j.jspi.2023.106137
Yinzhi Wang, Yingqiu Zhu, Qiang Sun, Lei Qin

The explosion of data volume and the expansion in data dimensionality have made the analysis of high-dimensional matrix time series a critical challenge in big-data applications. In this regard, factor models for matrix-valued high-dimensional time series provide a powerful tool that reduces the dimensionality of the variables by exploiting low-rank structures. However, existing high-dimensional matrix factor models suffer from two limitations in complex scenarios. First, it is difficult to make robust inferences for datasets with heavy-tailed distributions. Second, existing models require additional parameters to be fine-tuned to guarantee performance. To tackle these challenges, we propose an adaptively robust high-dimensional matrix factor model based on a specified Huber loss function. An efficient iterative algorithm is provided to consistently determine the additional parameters of our model for robust estimation. Incorporating the Huber loss greatly improves the robustness of the model estimation. Furthermore, we investigate the proposed method theoretically and derive the convergence rates of the robust estimators to examine its performance. Simulations show that the proposed method outperforms previous models in estimation with heavy-tailed data. A real-world analysis of a financial portfolio dataset illustrates that the method can extract useful knowledge from high-dimensional matrix time series.
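
The abstract gives no formulas, so the following is only a minimal sketch and not the authors' matrix factor algorithm. It shows the standard Huber loss with a threshold `tau` (our own name) and a one-dimensional Huber M-estimate of location computed by iteratively reweighted averaging, which is the basic mechanism by which a Huber loss limits the influence of heavy-tailed observations. The authors' adaptive rule for choosing the threshold is not reproduced; here `tau` is fixed by hand.

```python
def huber_loss(r: float, tau: float) -> float:
    """Standard Huber loss: quadratic for |r| <= tau, linear beyond."""
    a = abs(r)
    return 0.5 * r * r if a <= tau else tau * (a - 0.5 * tau)


def huber_location(xs: list[float], tau: float, iters: int = 100) -> float:
    """Huber M-estimate of location via iteratively reweighted averaging.

    Points with |x - mu| > tau get weight tau / |x - mu| < 1, so a few
    extreme values cannot drag the estimate arbitrarily far (assumes tau > 0).
    """
    mu = sorted(xs)[len(xs) // 2]  # start from a median
    for _ in range(iters):
        weights = [1.0 if abs(x - mu) <= tau else tau / abs(x - mu) for x in xs]
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        converged = abs(new_mu - mu) < 1e-12
        mu = new_mu
        if converged:
            break
    return mu


xs = [1.0, 2.0, 3.0, 4.0, 1000.0]  # toy sample with one gross outlier
print(sum(xs) / len(xs), huber_location(xs, tau=1.5))
```

On this toy sample the mean is 202, whereas the Huber estimate stays close to 3, because the outlier's weight shrinks to `tau / 997`.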

Vol. 231, Article 106137
Citations: 0
Optimal subsampling for the Cox proportional hazards model with massive survival data
IF 0.9 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2023-12-19 | DOI: 10.1016/j.jspi.2023.106136
Nan Qiao, Wangcheng Li, Feng Xiao, Cunjie Lin

Massive survival data have become common in survival analysis. In this study, a subsampling algorithm is proposed for the Cox proportional hazards model with time-dependent covariates when the sample size is extraordinarily large but computing resources are relatively limited. A subsample estimator is developed by maximizing a weighted partial likelihood and is shown to be consistent and asymptotically normal. By minimizing the asymptotic mean squared error of the subsample estimator, the optimal subsampling probabilities are derived in explicit form. Simulation studies show that the proposed method performs satisfactorily in approximating the full-data estimator. The proposed method is applied to corporate loan data and breast cancer data with different censoring rates, and the results confirm its practical advantages.
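
As a hedged sketch of "maximizing a weighted partial likelihood" on a subsample, the code below draws rows with replacement using user-supplied probabilities and maximizes an inverse-probability-weighted Cox log partial likelihood. Several simplifications are our own assumptions: a single time-fixed covariate (the paper allows time-dependent ones), probabilities passed in by the caller (the paper's explicit optimal probabilities are not reproduced), a common IPW form in which each sampled subject carries weight `1 / pi` both as an event and inside the risk sets, and Breslow-style handling of tied or duplicated times. All function names are hypothetical.

```python
import math
import random


def weighted_partial_loglik(beta, times, events, x, pi):
    """IPW Cox log partial likelihood for one fixed covariate (Breslow ties)."""
    total = 0.0
    for i in range(len(times)):
        if events[i]:
            risk = sum(math.exp(beta * x[j]) / pi[j]
                       for j in range(len(times)) if times[j] >= times[i])
            total += (beta * x[i] - math.log(risk)) / pi[i]
    return total


def score_and_info(beta, times, events, x, pi):
    """First derivative and negative second derivative of the objective."""
    score = info = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue
        s0 = s1 = s2 = 0.0
        for j in range(len(times)):
            if times[j] >= times[i]:  # j is in the risk set of event i
                w = math.exp(beta * x[j]) / pi[j]
                s0 += w
                s1 += w * x[j]
                s2 += w * x[j] * x[j]
        mean = s1 / s0
        score += (x[i] - mean) / pi[i]
        info += (s2 / s0 - mean * mean) / pi[i]
    return score, info


def fit_beta(times, events, x, pi, iters=50):
    """Damped Newton-Raphson maximiser of the weighted log partial likelihood."""
    beta = 0.0
    for _ in range(iters):
        score, info = score_and_info(beta, times, events, x, pi)
        if info <= 0:
            break  # degenerate risk sets: no curvature to use
        step = score / info
        base = weighted_partial_loglik(beta, times, events, x, pi)
        # Halve the step until the concave objective does not decrease.
        while (abs(step) > 1e-12 and
               weighted_partial_loglik(beta + step, times, events, x, pi) < base):
            step /= 2
        beta += step
        if abs(step) < 1e-10:
            break
    return beta


def subsample_fit(times, events, x, probs, r, seed=0):
    """Sample r rows with replacement using `probs`, then fit with weights 1 / probs."""
    rng = random.Random(seed)
    idx = rng.choices(range(len(times)), weights=probs, k=r)

    def pick(v):
        return [v[k] for k in idx]

    return fit_beta(pick(times), pick(events), pick(x), pick(probs))
```

Multiplying every weight by a constant (for example `1 / r`) does not change the maximiser, which is why the sketch omits that factor; with uniform probabilities on the full data it reproduces the ordinary Cox estimate.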

Vol. 231, Article 106136
Citations: 0
Regression analysis of longitudinal data with mixed synchronous and asynchronous longitudinal covariates
IF 0.9 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2023-12-09 | DOI: 10.1016/j.jspi.2023.106135
Zhuowei Sun, Hongyuan Cao, Li Chen, Jason P. Fine

In linear models, omitting a covariate that is orthogonal to the covariates in the model does not bias the coefficient estimates. This generally does not hold for longitudinal data, where, beyond orthogonality between the omitted longitudinal covariates and the longitudinal covariates in the model, additional assumptions are needed to obtain unbiased coefficient estimates. We propose methods to mitigate omitted-variable bias under weaker assumptions. A two-step estimation procedure is proposed to infer the asynchronous longitudinal covariates when such covariates are observed. For mixed synchronous and asynchronous longitudinal covariates, the two-step method attains a parametric convergence rate for estimating the coefficients of the synchronous longitudinal covariates. Extensive simulation studies provide numerical support for the theoretical findings. We illustrate the performance of our method on a dataset from the Alzheimer’s Disease Neuroimaging Initiative study.
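
To make the first sentence of the abstract concrete, the toy example below (noise-free cross-sectional data, not the authors' two-step longitudinal estimator) regresses a response on one covariate while omitting a second one. When the omitted covariate is orthogonal to the included one, the slope is unchanged; when the two are correlated, the slope absorbs part of the omitted effect. The helper names are our own.

```python
def centre(v):
    """Subtract the mean so that no intercept is needed."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def ols_slope(x, y):
    """OLS slope of y on a single centred regressor x."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


x1 = centre([1.0, 2.0, 3.0, 4.0])
x2 = centre([1.0, -1.0, -1.0, 1.0])  # orthogonal to x1: sum(x1 * x2) == 0
x3 = centre([1.0, 2.0, 2.0, 5.0])    # correlated with x1

# True model in both cases: y = 2 * x1 + 3 * (second covariate).
y_orth = [2 * a + 3 * b for a, b in zip(x1, x2)]
y_corr = [2 * a + 3 * b for a, b in zip(x1, x3)]

# Regress on x1 alone, omitting the second covariate.
slope_orth = ols_slope(x1, y_orth)  # recovers the true coefficient 2
slope_corr = ols_slope(x1, y_corr)  # 2 + 3 * sum(x1*x3) / sum(x1**2), about 5.6
```

The bias term `3 * sum(x1*x3) / sum(x1**2)` vanishes exactly when the omitted covariate is orthogonal to the included one, which is the cross-sectional fact the abstract contrasts with the longitudinal case.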

Vol. 231, Article 106135
Citations: 0
Maximum correntropy criterion regression models with tending-to-zero scale parameters
IF 0.9 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2023-12-09 | DOI: 10.1016/j.jspi.2023.106134
Lianqiang Yang, Ying Jing, Teng Li

Maximum correntropy criterion regression (MCCR) models have been well studied within the theoretical framework of statistical learning when the scale parameters take fixed values or tend to infinity. This paper studies MCCR models with scale parameters tending to zero. It is shown that the optimal learning rate of MCCR models is $O(n^{-1})$ in the asymptotic sense as the sample size $n$ goes to infinity. For finite samples, the performance and robustness of MCCR, Huber and least squares regression models are compared. Applications of these three methods to real data are also demonstrated.
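
The three losses compared in the abstract can be put side by side. This sketch assumes one common parameterization of the correntropy-induced loss, $\ell_\sigma(r) = \sigma^2\,(1 - e^{-r^2/\sigma^2})$ (other papers use $2\sigma^2$ or $\sigma^2/2$), and a Huber loss scaled to coincide with $r^2$ near zero; the paper's exact conventions may differ.

```python
import math


def squared_loss(r):
    return r * r


def huber_loss(r, tau):
    """Huber loss scaled to agree with r**2 on |r| <= tau (continuous at tau)."""
    a = abs(r)
    return r * r if a <= tau else 2 * tau * a - tau * tau


def correntropy_loss(r, sigma):
    """Assumed correntropy-induced loss sigma^2 * (1 - exp(-r^2 / sigma^2))."""
    return sigma * sigma * (1.0 - math.exp(-(r / sigma) ** 2))


for r in (0.1, 1.0, 10.0):
    print(r, squared_loss(r), huber_loss(r, 1.0),
          [round(correntropy_loss(r, s), 4) for s in (10.0, 1.0, 0.1)])
```

As the scale grows, the correntropy loss approaches the squared loss; as it shrinks toward zero, every residual's loss is capped at $\sigma^2$, which is where the robustness to large residuals comes from.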

Vol. 231, Article 106134
Citations: 0
Resampling techniques for a class of smooth, possibly data-adaptive empirical copulas
IF 0.9 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2023-12-07 | DOI: 10.1016/j.jspi.2023.106132
Ivan Kojadinovic, Bingqing Yi

We investigate the validity of two resampling techniques for inference on the underlying unknown copula using a recently proposed class of smooth, possibly data-adaptive nonparametric estimators that contains empirical Bernstein copulas (and thus the empirical beta copula). Following Kiriliouk et al. (2021), the first resampling technique draws samples from the smooth estimator and can only be used with independent observations. The second technique is a smooth extension of the so-called sequential dependent multiplier bootstrap and can thus be used in a time series setting and, possibly, for change-point analysis. The two resampling schemes are applied to confidence interval construction and to the offline detection of changes in the cross-sectional dependence of multivariate time series, respectively. Monte Carlo experiments confirm the possible advantages of such smooth inference procedures over their non-smooth counterparts. A by-product of this work is a study of the weak consistency and finite-sample performance of two classes of smooth estimators of the first-order partial derivatives of a copula, which have potential applications in mean and quantile regression.
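
The empirical beta copula, the simplest member of the class mentioned above, can be written with ranks alone: $C_n^\beta(\mathbf u) = n^{-1}\sum_i \prod_j F_{n,R_{ij}}(u_j)$, where $F_{n,r}$ is the Beta$(r, n+1-r)$ CDF. Because it is an equal-weight mixture of products of beta distributions, drawing from it (the idea behind the first resampling technique for independent observations) amounts to picking a row at random and drawing independent beta variates. The sketch below covers only this member of the class, assumes continuous data without ties, and uses hypothetical function names.

```python
import math
import random


def ranks(v):
    """Ranks 1..n of the entries of v (assumes no ties)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r


def beta_cdf_int(u, r, n):
    """CDF at u of Beta(r, n + 1 - r), computed as P(Binomial(n, u) >= r)."""
    return sum(math.comb(n, s) * u**s * (1 - u) ** (n - s) for s in range(r, n + 1))


def empirical_beta_copula(data, u):
    """Evaluate the empirical beta copula of the rows of `data` at point u."""
    n, d = len(data), len(u)
    cols = [ranks([row[j] for row in data]) for j in range(d)]
    total = 0.0
    for i in range(n):
        prod = 1.0
        for j in range(d):
            prod *= beta_cdf_int(u[j], cols[j][i], n)
        total += prod
    return total / n


def sample_empirical_beta_copula(data, size, seed=0):
    """Smooth resampling: pick a row uniformly, then draw independent Beta margins."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    cols = [ranks([row[j] for row in data]) for j in range(d)]
    out = []
    for _ in range(size):
        i = rng.randrange(n)
        out.append([rng.betavariate(cols[j][i], n + 1 - cols[j][i]) for j in range(d)])
    return out
```

A quick sanity check: since each column's ranks form a permutation of $1,\dots,n$, the margins are exactly uniform, so evaluating at $(1, v)$ returns $v$.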

Vol. 231, Article 106132
Citations: 0
A global test for heteroscedastic one-way FMANOVA with applications
IF 0.9 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2023-12-05 | DOI: 10.1016/j.jspi.2023.106133
Tianming Zhu, Jin-Ting Zhang, Ming-Yen Cheng

Multivariate functional data are prevalent in fields such as biology, climatology, and finance. Motivated by applications to the World Health Data, we propose and examine a global test for the equality of multiple mean functions in multivariate functional data. This test addresses the one-way Functional Multivariate Analysis of Variance (FMANOVA) problem, a fundamental issue in the analysis of multivariate functional data. While numerous analysis of variance tests have been proposed and studied for univariate functional data, only a limited number of methods have been developed for the one-way FMANOVA problem. Furthermore, our global test can handle heteroscedasticity in the unknown covariance function matrices underlying the multivariate functional data, which existing methods cannot. We show that the asymptotic null distribution of the test statistic is a chi-squared-type mixture that depends on the eigenvalues of the covariance function matrices. To approximate the null distribution, we introduce a Welch–Satterthwaite type chi-squared approximation with consistent parameter estimation. The proposed test is root-$n$ consistent, meaning it has nontrivial power against local alternatives. Additionally, it is computationally more efficient than several permutation-based tests. Simulation studies and applications to the World Health Data highlight the advantages of our global test.
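
The Welch–Satterthwaite idea can be shown on its own. Assuming the null distribution is $\sum_r \lambda_r A_r$ with $A_r$ i.i.d. $\chi^2_1$, one approximates it by $\beta\chi^2_d$ with $\beta$ and $d$ chosen to match the mean $\sum_r\lambda_r$ and variance $2\sum_r\lambda_r^2$. In practice the eigenvalue sums are replaced by consistent estimates (the paper's estimators are not reproduced here), and the Wilson-Hilferty formula for the $\chi^2_d$ quantile is our own extra approximation to avoid a statistics library; `scipy.stats.chi2.ppf` could be used instead.

```python
import math
from statistics import NormalDist


def welch_satterthwaite(eigenvalues):
    """Parameters (beta, d) of beta * chi2_d matching sum_r lam_r * chi2_1.

    Mean:     beta * d         = sum(lam)
    Variance: 2 * beta**2 * d  = 2 * sum(lam**2)
    """
    s1 = sum(eigenvalues)
    s2 = sum(lam * lam for lam in eigenvalues)
    return s2 / s1, s1 * s1 / s2


def approx_critical_value(eigenvalues, alpha=0.05):
    """Upper-alpha critical value of beta * chi2_d (Wilson-Hilferty for chi2_d)."""
    beta, d = welch_satterthwaite(eigenvalues)
    z = NormalDist().inv_cdf(1 - alpha)
    c = 2.0 / (9.0 * d)
    return beta * d * (1.0 - c + z * math.sqrt(c)) ** 3
```

With equal eigenvalues the approximation is exact (`beta` equals the common eigenvalue and `d` their count); unequal eigenvalues give a non-integer `d`, which is why a chi-squared quantile routine accepting real degrees of freedom is needed.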

Vol. 231, Article 106133
Citations: 1
Inference on regression model with misclassified binary response
IF 0.9 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2023-11-29 | DOI: 10.1016/j.jspi.2023.106121
Arindam Chatterjee, Tathagata Bandyopadhyay, Ayoushman Bhattacharya

Misclassification of binary responses, if ignored, may severely bias the maximum likelihood estimators (MLEs) of regression parameters. For such data, a binary regression model incorporating non-differential classification errors is used extensively by researchers in different application contexts. We strongly caution against indiscriminate use of this model, because it suffers from a serious estimation problem: the unknown misclassification probabilities are confounded with the regression parameters, which may lead to highly biased estimates. To overcome this problem, we propose using an internal validation sample in addition to the main sample. Assuming differential classification errors, we consider MLEs of the regression parameters based on the joint likelihood of the main sample and the internal validation sample. We then develop a rigorous asymptotic theory for the joint MLEs under standard assumptions. To make inference easy to implement, we propose a bootstrap approximation to the asymptotic distribution and prove its consistency. Simulation results suggest that even an extremely small validation sample may vastly improve inference. Finally, the methodology is illustrated with real-life survey data.
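
A small sketch helps show why the widely used model is hard to estimate and how a validation sample helps. It assumes a logistic model for the true response and, for brevity, non-differential (constant) misclassification rates `alpha0 = P(Y* = 1 | Y = 0)` and `alpha1 = P(Y* = 0 | Y = 1)`; the estimator in the abstract assumes differential errors, in which case both rates would become functions of `x`. Main-sample units contribute only `P(Y* | x)`, while validation units, where both `Y` and `Y*` are seen, contribute `P(Y | x) P(Y* | Y)` and so pin down the rates directly. All names and toy values are hypothetical.

```python
import math


def expit(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def observed_prob(eta, alpha0, alpha1):
    """P(Y* = 1 | x) when P(Y = 1 | x) = expit(eta); confined to (alpha0, 1 - alpha1)."""
    return alpha0 + (1.0 - alpha0 - alpha1) * expit(eta)


def joint_loglik(beta0, beta1, alpha0, alpha1, main, validation):
    """Log-likelihood of main (x, y*) pairs plus validation (x, y, y*) triples."""
    ll = 0.0
    for x, ys in main:
        p = observed_prob(beta0 + beta1 * x, alpha0, alpha1)
        ll += math.log(p if ys else 1.0 - p)
    for x, y, ys in validation:
        py = expit(beta0 + beta1 * x)
        ll += math.log(py if y else 1.0 - py)
        flip = alpha1 if y else alpha0  # probability that Y* differs from Y
        ll += math.log(1.0 - flip if ys == y else flip)
    return ll


main = [(0.5, 1), (-1.0, 0), (2.0, 1), (0.0, 0)]       # toy (x, y*)
validation = [(1.0, 1, 1), (-0.5, 0, 1), (0.3, 1, 0)]  # toy (x, y, y*)
print(joint_loglik(0.2, 0.8, 0.1, 0.15, main, validation))
```

Because `observed_prob` squeezes the regression curve into `(alpha0, 1 - alpha1)`, changes in the rates and in the slope can produce similar curves from main-sample data alone, which is the confounding the abstract warns about.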

Vol. 231, Article 106121
Citations: 0
Construction of mixed-level screening designs using Hadamard matrices
IF 0.9 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2023-11-23 | DOI: 10.1016/j.jspi.2023.106131
Bo Hu, Dongying Wang, Fasheng Sun

Modern experiments typically involve a very large number of variables. Screening designs allow experimenters to identify active factors in a minimum number of trials. To save costs, only low-level factorial designs, especially two- and three-level designs, are considered for screening experiments. In this article, we provide a systematic method, based on Hadamard matrices with a fold-over structure, to construct screening designs that contain both two- and three-level factors. The proposed designs perform well under the D- and A-optimality criteria, and the estimates of the main effects are not biased by second-order effects, making the designs well suited to screening experiments. In addition, some theoretical results on D- and A-optimality are obtained as a by-product.
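
The fold-over mechanism behind "main effects not biased by second-order effects" can be checked numerically for two-level columns. The sketch builds a Hadamard matrix by Sylvester's construction, uses its non-constant columns as factors, and stacks the design on its sign-reversed copy; each product $x_k x_i x_j$ then changes sign between the two halves, so main-effect columns become orthogonal to every two-factor interaction. The paper's actual mixed two- and three-level construction is not reproduced, and the function names are our own.

```python
from itertools import combinations


def sylvester_hadamard(k):
    """Hadamard matrix of order 2**k: H_2n = [[H, H], [H, -H]]."""
    h = [[1]]
    for _ in range(k):
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def fold_over(design):
    """Stack the design on top of its mirror image (all signs switched)."""
    return design + [[-v for v in row] for row in design]


def main_effects_clear_of_2fi(design):
    """True if every main-effect column is orthogonal to every two-factor interaction."""
    m = len(design[0])
    for k in range(m):
        for i, j in combinations(range(m), 2):
            if sum(row[k] * row[i] * row[j] for row in design) != 0:
                return False
    return True


h8 = sylvester_hadamard(3)
base = [row[1:] for row in h8]  # 8 runs, 7 two-level factors (drop the all-ones column)
print(main_effects_clear_of_2fi(base), main_effects_clear_of_2fi(fold_over(base)))
```

In the unfolded 8-run design each interaction of two columns equals another column, so some main effects are fully aliased with two-factor interactions; doubling the runs by folding over removes this aliasing.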

Vol. 231, Article 106131
Citations: 0
Mallows model averaging based on kernel regression imputation with responses missing at random
IF 0.9 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2023-11-22 | DOI: 10.1016/j.jspi.2023.106130
Hengkun Zhu, Guohua Zou

Missing data is a common problem in real data analysis. In this paper, a Mallows model averaging method based on kernel regression imputation is proposed for linear regression models with responses missing at random. We prove that our method asymptotically achieves the lowest possible squared error. Compared with existing model averaging methods, the new method does not require a parametric model to characterize the missing-data mechanism. A Monte Carlo simulation and a practical application demonstrate the usefulness of the proposed method.
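
The two ingredients named in the abstract can be sketched together under heavy simplifications that are our own assumptions: one covariate, two nested candidate models (constant mean and simple linear), a Gaussian-kernel Nadaraya-Watson fit with a hand-picked bandwidth to fill in missing responses, a known error variance `sigma2`, and a grid search over the weight instead of quadratic programming. The paper's actual criterion may treat imputed responses differently; all names and toy data are hypothetical.

```python
import math


def nw_impute(x, y, observed, bandwidth):
    """Fill missing y values with a Gaussian-kernel Nadaraya-Watson fit on observed pairs."""
    pairs = [(xi, yi) for xi, yi, o in zip(x, y, observed) if o]
    filled = []
    for xi, yi, o in zip(x, y, observed):
        if o:
            filled.append(yi)
            continue
        w = [math.exp(-0.5 * ((xi - xj) / bandwidth) ** 2) for xj, _ in pairs]
        filled.append(sum(wk * yj for wk, (_, yj) in zip(w, pairs)) / sum(w))
    return filled


def fit_intercept_only(x, y):
    """Candidate model 1: constant mean (k = 1 parameter)."""
    m = sum(y) / len(y)
    return [m] * len(y)


def fit_simple_linear(x, y):
    """Candidate model 2: intercept plus slope (k = 2 parameters)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = (sum((a - mx) * (c - my) for a, c in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
    return [my + b * (a - mx) for a in x]


def mallows_weight(y, fits, ks, sigma2, grid=101):
    """Weight w on model 2 minimising ||y - mu(w)||^2 + 2 * sigma2 * (averaged k)."""
    best_crit, best_w = math.inf, 0.0
    for g in range(grid):
        w = g / (grid - 1)
        mu = [(1 - w) * a + w * b for a, b in zip(fits[0], fits[1])]
        rss = sum((yi - mi) ** 2 for yi, mi in zip(y, mu))
        crit = rss + 2.0 * sigma2 * ((1 - w) * ks[0] + w * ks[1])
        if crit < best_crit:
            best_crit, best_w = crit, w
    return best_w


x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi for xi in x]
y[3] = y[7] = None  # two responses missing (toy data)
observed = [v is not None for v in y]
y_imp = nw_impute(x, y, observed, bandwidth=1.0)
fits = [fit_intercept_only(x, y_imp), fit_simple_linear(x, y_imp)]
w = mallows_weight(y_imp, fits, ks=(1, 2), sigma2=1.0)
```

Since the kernel fit uses only the observed pairs, no model for why responses are missing is needed, which mirrors the advantage claimed in the abstract.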

Vol. 231, Article 106130
Citations: 0
Jackknife empirical likelihood confidence intervals for the categorical Gini correlation
IF 0.9 | Zone 4, Mathematics | Q3 STATISTICS & PROBABILITY | Pub Date: 2023-11-20 | DOI: 10.1016/j.jspi.2023.106123
Sameera Hewage, Yongli Sang

The categorical Gini correlation, ρ_g, was proposed by Dang et al. (2021) to measure the dependence between a categorical variable, Y, and a numerical variable, X. It has been shown that ρ_g has more appealing properties than existing dependence measures. In this paper, we develop the jackknife empirical likelihood (JEL) method for ρ_g. Confidence intervals for the Gini correlation are constructed without estimating the asymptotic variance. Adjusted and weighted JEL are explored to improve the performance of the standard JEL. Simulation studies show that our methods are competitive with existing methods in terms of coverage accuracy and confidence interval length. The proposed methods are illustrated with applications to two real datasets.
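The sketch below illustrates how a standard JEL interval for ρ_g can be built without a variance estimate. It is a rough illustration only; the adjusted and weighted variants from the paper are not shown.

It rests on several assumptions:

- **Definition of ρ_g.** Following our reading of Dang et al. (2021), ρ_g = (Δ − Σ_k p_k Δ_k) / Δ. Here Δ = E|X − X'| is the Gini mean difference and Δ_k is its within-class counterpart.
- **Estimator.** Δ and the Δ_k are estimated by U-statistics, and p_k by class proportions.
- **Scope.** X is taken to be univariate.

The steps are:

1. Compute jackknife pseudo-values V_i = n T_n − (n − 1) T_{n−1}^{(−i)}.
2. Apply Owen's empirical likelihood for a mean to the V_i.
3. Keep every θ with −2 log R(θ) ≤ 3.841, the 0.95 quantile of χ²₁.

All function names and the simulated data are hypothetical.

```python
import numpy as np


def gini_mean_diff(x):
    """U-statistic estimate of the Gini mean difference E|X - X'| (needs len(x) >= 2)."""
    n = len(x)
    return np.abs(x[:, None] - x[None, :]).sum() / (n * (n - 1))


def gini_corr(x, y):
    """Plug-in categorical Gini correlation (Delta - sum_k p_k Delta_k) / Delta."""
    delta = gini_mean_diff(x)
    within = sum(np.mean(y == k) * gini_mean_diff(x[y == k]) for k in np.unique(y))
    return (delta - within) / delta


def pseudo_values(x, y):
    """Jackknife pseudo-values V_i = n * T_n - (n - 1) * T_{n-1}^{(-i)}."""
    n = len(x)
    t_full = gini_corr(x, y)
    t_loo = np.array([gini_corr(np.delete(x, i), np.delete(y, i)) for i in range(n)])
    return n * t_full - (n - 1) * t_loo


def neg2_log_el(v, theta):
    """-2 log empirical likelihood ratio for the mean of v at theta."""
    z = v - theta
    if z.min() >= 0 or z.max() <= 0:
        return np.inf  # theta outside the convex hull of v
    # The multiplier lam solves sum z / (1 + lam z) = 0. The left side decreases in lam,
    # and 1 + lam * z_i > 0 for all i requires lam in (-1/max z, -1/min z): bisect there.
    lo, hi = -1.0 / z.max(), -1.0 / z.min()
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        if np.sum(z / (1 + lam * z)) > 0:
            lo = lam
        else:
            hi = lam
    return 2 * np.sum(np.log1p(lam * z))


def jel_interval(x, y, crit=3.841):
    """Standard JEL interval {theta : -2 log R(theta) <= crit}."""
    v = pseudo_values(x, y)
    center = v.mean()  # jackknife estimate, where -2 log R = 0

    def boundary(inside, outside):
        # Bisect between a point inside the interval and one outside it.
        for _ in range(100):
            mid = 0.5 * (inside + outside)
            if neg2_log_el(v, mid) <= crit:
                inside = mid
            else:
                outside = mid
        return 0.5 * (inside + outside)

    return center, boundary(center, v.min()), boundary(center, v.max())


# Hypothetical data: three classes of 40, with the class shifting the mean of X.
rng = np.random.default_rng(1)
y = np.repeat([0, 1, 2], 40)
x = rng.normal(loc=0.8 * y, scale=1.0)
est, lower, upper = jel_interval(x, y)
print(f"rho_g = {gini_corr(x, y):.3f}, 95% JEL interval = ({lower:.3f}, {upper:.3f})")
```

The asymptotic variance of ρ_g never appears. The χ²₁ calibration of the EL ratio plays that role, which is the property the abstract highlights. The O(n³) leave-one-out loop is kept for clarity rather than speed.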

Journal of Statistical Planning and Inference, vol. 231, Article 106123.
Citations: 0
Journal: Journal of Statistical Planning and Inference