
Statistica Sinica: Latest Articles

Adaptive Randomization via Mahalanobis Distance
IF 1.4 | Zone 3 (Mathematics) | Q2 Mathematics | Pub Date: 2024-01-01 | DOI: 10.5705/ss.202020.0440
Yichen Qin, Y. Li, Wei Ma, Haoyu Yang, F. Hu
In comparative studies, researchers often seek optimal covariate balance. However, chance imbalance still exists in randomized experiments and becomes more serious as the number of covariates increases. To address this issue, we introduce a new randomization procedure called adaptive randomization via the Mahalanobis distance (ARM). The proposed method allocates units sequentially and adaptively, using information on the current level of imbalance and the incoming unit's covariates. Theoretical results and numerical comparisons show that, with a large number of covariates or a large number of units, the proposed method has substantial advantages over traditional methods in terms of covariate balance, estimation accuracy, hypothesis-testing power, and computational time. The proposed method attains optimal covariate balance, in the sense that the estimated treatment effect attains its minimum variance asymptotically, and it can be applied in both causal inference and clinical trials. Lastly, numerical studies …
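The sequential allocation rule described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the biased-coin probability `q = 0.75`, assigning the first two units one to each arm, estimating the covariance from all covariates observed so far, and the helper names `mahalanobis_imbalance` and `arm_allocate` are assumptions made for the example.

```python
import numpy as np


def mahalanobis_imbalance(X, T):
    """Mahalanobis distance between the treatment (T == 1) and control (T == 0)
    covariate means, scaled by the estimated covariance of the mean difference."""
    n1 = int(T.sum())
    n0 = len(T) - n1
    if n1 == 0 or n0 == 0:
        return np.inf
    diff = X[T == 1].mean(axis=0) - X[T == 0].mean(axis=0)
    S = np.atleast_2d(np.cov(X, rowvar=False))  # pooled sample covariance (assumption)
    cov_diff = (1.0 / n1 + 1.0 / n0) * S
    # pinv guards against a singular covariance while few units have arrived
    return float(diff @ np.linalg.pinv(cov_diff) @ diff)


def arm_allocate(X, q=0.75, seed=None):
    """Assign rows of X one at a time to arm 1 or arm 0, favouring (with
    probability q) the arm that gives the smaller Mahalanobis imbalance."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    T = np.zeros(n, dtype=int)
    T[0] = rng.integers(2)  # first two units: one per arm (assumption)
    T[1] = 1 - T[0]
    for i in range(2, n):
        X_now = X[: i + 1]
        imbalance = []
        for arm in (0, 1):
            T_try = T[: i + 1].copy()
            T_try[i] = arm  # hypothetical assignment of the incoming unit
            imbalance.append(mahalanobis_imbalance(X_now, T_try))
        if imbalance[0] == imbalance[1]:
            p_treat = 0.5
        elif imbalance[1] < imbalance[0]:
            p_treat = q
        else:
            p_treat = 1.0 - q
        T[i] = int(rng.random() < p_treat)
    return T
```

Comparing `mahalanobis_imbalance(X, T)` for an ARM allocation against a completely randomized `T` on the same `X` gives a quick sense of the balance gain; the exact figures depend on the seed.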
Citations: 2
Regression Analysis of Spatially Correlated Event Durations With Missing Origins Annotated by Longitudinal Measures
IF 1.4 | Zone 3 (Mathematics) | Q2 Mathematics | Pub Date: 2024-01-01 | DOI: 10.5705/ss.202021.0118
Y. Xiong, W. J. Braun, T. Duchesne, X. J. Hu
This paper is concerned with event durations in situations where the study units may be spatially correlated and the time origins of the events are missing. We develop regression models based on the partly observed durations with the aid of available longitudinal information. The first-hitting-time model (e.g., Lee and Whitmore, 2006) is employed to link the event-duration data and the associated longitudinal measures through shared random effects. We present procedures for estimating the model parameters, together with an induced estimator of the conditional distribution of the event duration. We apply the EM algorithm and Monte Carlo methods to compute the proposed estimators. We establish the consistency and asymptotic normality of the estimators and provide variance estimators for them. The proposed approach is illustrated with a collection of wildfire records from Alberta, Canada. Its performance is examined numerically and compared with two competing methods via simulation.
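For readers unfamiliar with the first-hitting-time model, the sketch below shows its basic building block, assuming a latent Wiener process that starts at a positive level `y0`, drifts at rate `mu` with variance `sigma**2` per unit time, and produces an event when it first reaches zero; the hitting time then has an inverse Gaussian density when `mu < 0`. The covariate links in `fht_params` are a hypothetical threshold-regression specification, and the sketch omits the spatial correlation, shared random effects, and missing origins that the paper addresses.

```python
import math


def fht_density(t, y0, mu, sigma=1.0):
    """Density of the first time a Wiener process started at y0 > 0, with drift
    mu and variance sigma**2 per unit time, reaches zero."""
    if t <= 0:
        return 0.0
    return (y0 / (sigma * math.sqrt(2.0 * math.pi * t ** 3))
            * math.exp(-(y0 + mu * t) ** 2 / (2.0 * sigma ** 2 * t)))


def fht_params(x, beta, gamma):
    """Hypothetical links: log initial level and drift are linear in covariates x."""
    y0 = math.exp(sum(b * xi for b, xi in zip(beta, x)))
    mu = sum(g * xi for g, xi in zip(gamma, x))
    return y0, mu


def log_likelihood(durations, covariates, beta, gamma, sigma=1.0):
    """Log-likelihood of fully observed durations under this simplified model."""
    total = 0.0
    for t, x in zip(durations, covariates):
        y0, mu = fht_params(x, beta, gamma)
        total += math.log(fht_density(t, y0, mu, sigma))
    return total
```

With `y0 = 2` and `mu = -1`, the density integrates to one and has mean `y0 / |mu| = 2`, which is a convenient numerical sanity check.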
Citations: 0
Simultaneous Functional Quantile Regression
IF 1.4 | Zone 3 (Mathematics) | Q2 Mathematics | Pub Date: 2024-01-01 | DOI: 10.5705/ss.202021.0248
Boyi Hu, Xixi Hu, Hua Liu, Jinhong You, Jiguo Cao
The conventional method for functional quantile regression (FQR) fits the regression model separately for each quantile of interest. Consequently, the slope function of the regression, which is a bivariate function of time and quantile, is estimated as a univariate function of time for each fixed quantile. This conventional strategy has several limitations. For example, it cannot guarantee the monotonicity of the conditional quantiles, nor can it control the smoothness of the slope estimator as a bivariate function. In this paper, we propose a new framework for FQR in which we fit the FQR model simultaneously for multiple quantiles using a bivariate basis under constraints, so that the estimated quantiles satisfy the monotonicity conditions and the smoothness of the slope estimator is controlled. The proposed estimator of the slope function is shown to be asymptotically consistent, and we establish its asymptotic normality. We use simulation to evaluate the finite-sample performance of the proposed method and compare it with that of the conventional method. We demonstrate the proposed method by analyzing the effects of …
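To make the quantile-crossing issue concrete, the sketch below shows the check (pinball) loss that quantile regression minimizes and a post hoc monotone rearrangement that sorts crossing quantile curves. Rearrangement is a different and simpler fix than the constrained bivariate-basis estimator proposed in the paper; the function names and the toy curves are illustrative assumptions.

```python
import numpy as np


def check_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0}), elementwise."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0).astype(float))


def fit_constant_quantile(y, tau):
    """Minimize the average check loss over candidate values taken from y;
    the minimizer is a sample tau-quantile."""
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - c, tau).mean() for c in y]
    return float(y[int(np.argmin(losses))])


def rearrange(quantile_curves):
    """Sort fitted quantile curves along the quantile axis (axis 0) at each
    time point, so the rearranged curves cannot cross."""
    return np.sort(np.asarray(quantile_curves, dtype=float), axis=0)


# Rows: separately fitted curves for tau = 0.1, 0.5, 0.9; columns: two time points.
curves = np.array([[1.0, 2.0],
                   [0.5, 2.5],   # crosses the tau = 0.1 curve at the first time point
                   [1.5, 3.0]])
fixed = rearrange(curves)
```

Fitting each quantile separately, as in the conventional approach, is what allows rows like the second one to dip below the first; a joint fit with monotonicity constraints avoids this at estimation time instead of repairing it afterwards.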
Citations: 0
Generalized Odds Rate Frailty Models for Current Status Data with Informative Censoring
IF 1.4 | Zone 3 (Mathematics) | Q2 Mathematics | Pub Date: 2024-01-01 | DOI: 10.5705/ss.202021.0411
Yang Xu, Shishun Zhao, T. Hu, Jianguo Sun
Current-status data occur in many areas, and their analysis has attracted much attention. In this study, we consider a regression analysis of current-status data in the presence of informative censoring, for which most existing methods either apply only to limited situations or are computationally unstable. We propose a new sieve maximum likelihood estimation procedure under the class of semiparametric generalized odds rate frailty models. The proposed method uses a latent variable to describe the informative censoring, that is, the relationship between the failure time of interest and the censoring time. We develop a novel expectation-maximization algorithm for computing the proposed estimators and establish their asymptotic consistency and normality. The results of a simulation study show that the proposed method performs well in practical applications.
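The generalized odds rate class can be illustrated with the transformation S(t | Z) = (1 + r Λ(t) exp(β'Z))^(-1/r), which reduces to the proportional hazards model as r approaches 0 and to the proportional odds model at r = 1, and which equals the marginal survival under a gamma frailty with mean 1 and variance r. The sketch below computes this survival function and the current-status log-likelihood, in which each subject only reveals whether failure occurred before a single inspection time. It assumes noninformative censoring and a known baseline cumulative hazard, so it is a simplification of the paper's sieve, latent-variable approach; the function names are hypothetical.

```python
import math
import random


def gor_survival(cum_hazard, lin_pred, r):
    """S(t | Z) = (1 + r * Lambda(t) * exp(beta'Z)) ** (-1 / r); r = 0 gives the
    proportional hazards limit exp(-Lambda(t) * exp(beta'Z))."""
    x = cum_hazard * math.exp(lin_pred)
    if r == 0:
        return math.exp(-x)
    return (1.0 + r * x) ** (-1.0 / r)


def current_status_loglik(data, baseline_cum_hazard, beta, r):
    """Each record (c, delta, z) reveals only delta = 1{T <= c} at inspection time c."""
    total = 0.0
    for c, delta, z in data:
        lin = sum(b * zj for b, zj in zip(beta, z))
        s = gor_survival(baseline_cum_hazard(c), lin, r)
        total += math.log(1.0 - s) if delta else math.log(s)
    return total


def gamma_frailty_survival_mc(x, r, n_draws=200_000, seed=0):
    """Monte Carlo E[exp(-xi * x)] for a gamma frailty xi with mean 1 and
    variance r; it should match gor_survival with Lambda * exp(beta'Z) = x."""
    rng = random.Random(seed)
    total = sum(math.exp(-rng.gammavariate(1.0 / r, r) * x) for _ in range(n_draws))
    return total / n_draws
```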
Citations: 0
Statistical Inference for Mean Function of Longitudinal Imaging Data over Complicated Domains
IF 1.4 | Zone 3 (Mathematics) | Q2 Mathematics | Pub Date: 2024-01-01 | DOI: 10.5705/ss.202021.0415
Qirui Hu, Jie Li
We propose a novel procedure for estimating the mean function of longitudinal imaging data with inherent spatial and temporal correlation. We describe the dependence between temporally ordered images using a functional moving average, and use flexible bivariate splines over triangulations to handle irregular image domains, which are common in imaging studies. We establish both the global and the local asymptotic properties of the bivariate spline estimator of the mean function, with simultaneous confidence corridors (SCCs) as a theoretical byproduct. Under mild conditions, the proposed estimator and its accompanying SCCs are shown to be consistent and oracle-efficient, as though all images were entirely observed without errors. Monte Carlo simulation experiments demonstrate the finite-sample performance of the proposed method, and the results strongly corroborate the asymptotic theory. The proposed method is further illustrated by analyzing two seawater potential temperature data sets.
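Bivariate splines over triangulations are built from Bernstein basis polynomials defined on each triangle in barycentric coordinates. The sketch below evaluates those coordinates and the degree-d Bernstein basis on a single triangle; it does not include the smoothness conditions across neighboring triangles, the functional moving-average error model, or the confidence corridors from the paper, and the function names are illustrative.

```python
import math

import numpy as np


def barycentric(point, v1, v2, v3):
    """Barycentric coordinates (b1, b2, b3) of a 2-D point w.r.t. triangle v1 v2 v3,
    from point - v3 = b1 * (v1 - v3) + b2 * (v2 - v3) and b3 = 1 - b1 - b2."""
    M = np.array([[v1[0] - v3[0], v2[0] - v3[0]],
                  [v1[1] - v3[1], v2[1] - v3[1]]], dtype=float)
    rhs = np.asarray(point, dtype=float) - np.asarray(v3, dtype=float)
    b1, b2 = np.linalg.solve(M, rhs)
    return np.array([b1, b2, 1.0 - b1 - b2])


def bernstein_basis(b, degree):
    """All degree-d Bernstein polynomials on a triangle at barycentric coordinates b,
    keyed by (i, j, k) with i + j + k = degree."""
    values = {}
    for i in range(degree + 1):
        for j in range(degree + 1 - i):
            k = degree - i - j
            coef = math.factorial(degree) / (
                math.factorial(i) * math.factorial(j) * math.factorial(k))
            values[(i, j, k)] = coef * b[0] ** i * b[1] ** j * b[2] ** k
    return values
```

A spline piece on one triangle is then a linear combination of these basis values; the basis sums to one at every point, which is a quick correctness check.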
Citations: 1
Group Testing Regression Analysis with Missing Data and Imperfect Tests
IF 1.4 | Zone 3 (Mathematics) | Q2 Mathematics | Pub Date: 2024-01-01 | DOI: 10.5705/ss.202021.0382
A. Delaigle, Ruoxu Tan
Estimating the prevalence of an infectious disease in a large population typically requires testing a specimen (e.g., blood, urine, or a swab) for the disease. When the disease spreads quickly, time constraints and limited resources often restrict the number of tests that can be performed. In such cases, if the prevalence is not too high, group testing can be employed to save time, money, and resources: pooled specimens from groups of individuals are tested, rather than each individual being tested separately. The technique is also used in other contexts, for example, to detect abnormalities or contamination in animals, plants, food, or water. Although methods exist for estimating a prevalence conditional on explanatory variables from group testing data, they require specimens to be available for all individuals, which is not always possible. We therefore construct new nonparametric estimators that remain consistent when some of the specimens are missing. We demonstrate the numerical performance of our methods using simulations and a hepatitis B example.
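The basic relationship behind group testing with an imperfect assay can be sketched as follows: a pool of k specimens tests positive with probability Se(1 - (1 - p)^k) + (1 - Sp)(1 - p)^k, where Se and Sp are the assay's sensitivity and specificity. Inverting this at the observed proportion of positive pools gives a simple estimator of a homogeneous prevalence p. This classical estimator assumes independent individuals, no covariates, and no missing specimens, so it is not the nonparametric conditional estimator proposed in the paper; the function names are hypothetical.

```python
def pool_positive_prob(p, k, se, sp):
    """P(a pool of k specimens tests positive) given individual prevalence p,
    sensitivity se and specificity sp, with independent individuals."""
    q = (1.0 - p) ** k  # probability the pool contains no true positives
    return se * (1.0 - q) + (1.0 - sp) * q


def prevalence_from_pools(n_positive, n_pools, k, se, sp):
    """Moment estimator of p: solve pi = se - (se + sp - 1) * (1 - p) ** k at the
    observed positive-pool proportion, clipping to keep p in [0, 1]."""
    pi_hat = n_positive / n_pools
    ratio = (se - pi_hat) / (se + sp - 1.0)
    ratio = min(max(ratio, 0.0), 1.0)
    return 1.0 - ratio ** (1.0 / k)
```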
Citations: 1
A Linear Errors-in-Variables Model with Unknown Heteroscedastic Measurement Errors
IF 1.4 | Zone 3 (Mathematics) | Q2 Mathematics | Pub Date: 2023-10-21 | DOI: 10.5705/ss.202022.0331
L. Nghiem, Cornelis J. Potgieter
In the classical measurement error framework, covariates are contaminated by independent additive noise. This paper considers parameter estimation in such a linear errors-in-variables model when the unknown measurement error distribution is heteroscedastic across observations. We propose a new generalized method of moments (GMM) estimator that combines a moment correction approach and a phase function-based approach. The former requires the distributions to have finite fourth moments, while the latter relies on the covariates having asymmetric distributions. The new estimator is shown to be consistent and asymptotically normal under appropriate regularity conditions. The asymptotic covariance of the estimator is derived, and its standard error is estimated using a fast bootstrap procedure. In numerical studies, the GMM estimator demonstrates strong finite-sample performance, especially when the measurement errors follow non-Gaussian distributions.
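The attenuation that motivates moment correction can be sketched as follows. The example assumes the per-observation error variances are known, which the paper does not require, and it uses only the simple moment correction (subtracting the average error variance from the sample variance of the observed covariate), not the paper's combined GMM estimator with the phase-function approach. The function name is hypothetical.

```python
import numpy as np


def naive_and_corrected_slope(w, y, err_var):
    """Naive OLS slope of y on the error-prone covariate w, and a moment-corrected
    slope that removes the average measurement-error variance from var(w).
    err_var holds the per-observation error variances (assumed known here)."""
    w = np.asarray(w, dtype=float)
    y = np.asarray(y, dtype=float)
    wc = w - w.mean()
    sxy = np.mean(wc * (y - y.mean()))
    sww = np.mean(wc ** 2)
    naive = sxy / sww  # attenuated toward zero
    corrected = sxy / (sww - np.mean(err_var))
    return naive, corrected
```

With heteroscedastic errors the naive slope shrinks toward zero by roughly var(X) / (var(X) + average error variance), while the corrected slope recovers the true value in large samples.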
Citations: 0
Use of random integration to test equality of high-dimensional covariance matrices
IF 1.5 | Zone 3 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2023-10-01 | DOI: 10.5705/ss.202020.0486
Yunlu Jiang, Canhong Wen, Yukang Jiang, Xueqin Wang, Heping Zhang

Testing the equality of two covariance matrices is a fundamental problem in statistics and is especially challenging when the data are high-dimensional. Through a novel use of random integration, we test the equality of high-dimensional covariance matrices without assuming parametric distributions for the two underlying populations, even when the dimension is much larger than the sample size. The asymptotic properties of our test for an arbitrary number of covariates and sample size are studied in depth under a general multivariate model. The finite-sample performance of the test is evaluated through numerical studies. The empirical results demonstrate that our test is highly competitive with existing tests in a wide range of settings. In particular, the proposed test is distinctly powerful when there are a few large or many small diagonal disturbances between the two covariance matrices.
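The idea of integrating over random directions can be illustrated with a generic permutation test: project the difference of the two sample covariance matrices onto many random unit vectors, average the squared quadratic forms (a Monte Carlo approximation to an integral over the unit sphere), and calibrate by permuting the pooled, centered observations. This is an assumed stand-in chosen for illustration, not the test statistic or asymptotic calibration developed in the paper; `n_dirs`, `n_perm`, and the function names are hypothetical.

```python
import numpy as np


def projection_statistic(X1, X2, directions):
    """Average of (a' (S1 - S2) a) ** 2 over the rows a of `directions`."""
    D = np.cov(X1, rowvar=False) - np.cov(X2, rowvar=False)
    quad = np.einsum("ki,ij,kj->k", directions, D, directions)
    return float(np.mean(quad ** 2))


def permutation_test(X1, X2, n_dirs=200, n_perm=199, seed=0):
    """Return the observed statistic and a permutation p-value."""
    rng = np.random.default_rng(seed)
    X1 = X1 - X1.mean(axis=0)  # center each sample so only covariances differ
    X2 = X2 - X2.mean(axis=0)
    p = X1.shape[1]
    A = rng.standard_normal((n_dirs, p))
    A /= np.linalg.norm(A, axis=1, keepdims=True)  # random unit directions
    observed = projection_statistic(X1, X2, A)
    pooled = np.vstack([X1, X2])
    n1 = X1.shape[0]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        stat = projection_statistic(pooled[idx[:n1]], pooled[idx[n1:]], A)
        exceed += int(stat >= observed)
    return observed, (exceed + 1) / (n_perm + 1)
```

Reusing the same directions for the observed and permuted statistics keeps the comparison fair; the choice of how many directions to average is a trade-off between Monte Carlo noise and run time.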

Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10550010/pdf/
Citations: 0
Leverage Classifier: Another Look at Support Vector Machine
IF 1.4 | Zone 3 (Mathematics) | Q2 Mathematics | Pub Date: 2023-08-23 | DOI: 10.5705/ss.202023.0124
Yixin Han, Jun Yu, Nan Zhang, Cheng Meng, Ping Ma, Wenxuan Zhong, Changliang Zou
The support vector machine (SVM) is a popular classifier known for its accuracy, flexibility, and robustness. However, its computational cost has hindered its application to large-scale datasets. In this paper, we propose a new optimal leverage classifier based on the linear SVM in the nonseparable setting. Our classifier selects an informative subset of the training sample to reduce the data size, enabling efficient computation while maintaining high accuracy. We take a novel view of the SVM within the general subsampling framework and rigorously investigate its statistical properties. We propose a two-step subsampling procedure consisting of a pilot estimation of the optimal subsampling probabilities and a subsampling step to construct the classifier. We develop a new Bahadur representation of the SVM coefficients and derive the unconditional asymptotic distribution and the optimal subsampling probabilities without conditioning on the full sample. Numerical results demonstrate that our classifiers outperform existing methods in terms of estimation, computation, and prediction.
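The two-step subsampling procedure can be sketched as below. The sketch fits a weighted linear SVM by full-batch subgradient descent, takes a uniform pilot subsample, and then draws a second subsample with probabilities that favor points inside the pilot margin. That scoring rule is a hypothetical placeholder, not the optimal subsampling probabilities derived in the paper, and `r0`, `r`, `lam`, and the function names are illustrative assumptions.

```python
import numpy as np


def fit_weighted_linear_svm(X, y, weights, lam=1e-3, n_iter=500, lr0=0.1):
    """Subgradient descent on (lam/2)||w||^2 + sum_i weights_i * hinge_i / sum(weights),
    with hinge_i = max(0, 1 - y_i (x_i'w + b)) and labels y_i in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    W = weights.sum()
    for t in range(1, n_iter + 1):
        active = y * (X @ w + b) < 1.0  # points with nonzero hinge loss
        coef = weights[active] * y[active]
        grad_w = lam * w - coef @ X[active] / W
        grad_b = -coef.sum() / W
        step = lr0 / np.sqrt(t)
        w -= step * grad_w
        b -= step * grad_b
    return w, b


def two_step_subsample_svm(X, y, r0=500, r=1000, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Step 1: pilot fit on a uniform subsample of size r0.
    pilot = rng.choice(n, size=r0, replace=False)
    w0, b0 = fit_weighted_linear_svm(X[pilot], y[pilot], np.ones(r0))
    # Step 2 (hypothetical score): favour points inside the pilot margin, scaled by
    # covariate norm, mixed with uniform sampling so every probability is positive.
    inside = (y * (X @ w0 + b0)) < 1.0
    score = np.linalg.norm(X, axis=1) * inside
    pi = 0.9 * score / score.sum() + 0.1 / n
    idx = rng.choice(n, size=r, replace=True, p=pi)
    # Inverse-probability weights make the subsample loss approximately
    # proportional to the full-sample loss.
    w, b = fit_weighted_linear_svm(X[idx], y[idx], 1.0 / (n * pi[idx]))
    return w, b, idx
```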
Citations: 0
An Unbiased Predictor for Skewed Response Variable with Measurement Error in Covariate
IF 1.4 | Zone 3 (Mathematics) | Q2 Mathematics | Pub Date: 2023-08-21 | DOI: 10.5705/ss.202023.0098
Sepideh Mosaferi, M. Ghosh, S. Sugasawa
We introduce a new small area predictor for the case in which the Fay-Herriot normal error model is fitted to a logarithmically transformed response variable and the covariate is measured with error. This framework was previously studied by Mosaferi et al. (2023), but the empirical predictor given in their manuscript cannot perform uniformly better than the direct estimator. The predictor proposed in this manuscript is unbiased and can perform uniformly better than the one proposed in Mosaferi et al. (2023). We derive an approximation of the mean squared error (MSE) of the predictor. Because prediction intervals based on the MSE suffer from coverage problems, we propose a more accurate nonparametric bootstrap prediction interval. This problem is of great interest in small area applications, since statistical agencies and agricultural surveys are often asked to produce estimates of right-skewed variables with covariates measured with error. Using Monte Carlo simulation studies and two Census Bureau data sets, we demonstrate the superiority of our proposed methodology.
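The bias that arises when a log-scale predictor is simply exponentiated can be shown with the standard Fay-Herriot model on the log scale. Assuming known β, A, and D_i, the posterior of θ_i is normal with shrinkage factor γ_i = A / (A + D_i) and variance γ_i D_i, so E[exp(θ_i) | y_i] needs the lognormal correction exp(mean + variance / 2). The sketch ignores the covariate measurement error and parameter estimation that the paper handles, so it illustrates the back-transformation issue only; the function name is hypothetical.

```python
import numpy as np


def fh_log_predictor(y_log, x_beta, A, D):
    """Predictors of exp(theta_i) under y_i = theta_i + e_i, theta_i = x_i'beta + v_i,
    v_i ~ N(0, A), e_i ~ N(0, D_i), with beta, A and D_i treated as known."""
    gamma = A / (A + D)  # shrinkage factor toward the regression fit
    post_mean = gamma * y_log + (1.0 - gamma) * x_beta
    post_var = gamma * D  # posterior variance of theta_i
    naive = np.exp(post_mean)  # biased downward for exp(theta_i)
    corrected = np.exp(post_mean + 0.5 * post_var)  # lognormal mean correction
    return naive, corrected
```

Averaged over many simulated areas, `corrected - exp(theta)` is centered at zero while `naive - exp(theta)` is systematically negative.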
Citations: 0
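For orientation only, here is a sketch of the classical Fay-Herriot composite (shrinkage) predictor on the log scale, with a lognormal back-transformation. It is not the unbiased predictor proposed by Mosaferi, Ghosh and Sugasawa. It assumes that β, the model variance σ_v² and the sampling variance D_i are all known, and it treats the covariate as error-free, so it ignores the measurement-error setting the abstract addresses. The function name and inputs are hypothetical.

```python
import math


def fay_herriot_log_predictor(y_log, x, beta, sigma2_v, d):
    """Predict exp(theta_i) for one area under a Fay-Herriot model on the log scale.

    Model (assumed): y_log = theta_i + e_i, e_i ~ N(0, d);
                     theta_i = x' beta + v_i, v_i ~ N(0, sigma2_v).
    beta, sigma2_v and d are taken as known; x is assumed measured without error.
    Returns (naive, corrected) predictions of exp(theta_i).
    """
    synthetic = sum(b * xi for b, xi in zip(beta, x))      # regression part x' beta
    gamma = sigma2_v / (sigma2_v + d)                       # shrinkage weight toward the direct estimate
    theta_hat = gamma * y_log + (1 - gamma) * synthetic     # posterior mean of theta_i given y_log
    post_var = gamma * d                                    # posterior variance of theta_i given y_log
    naive = math.exp(theta_hat)                             # plug-in back-transform
    corrected = math.exp(theta_hat + post_var / 2)          # lognormal mean E[exp(theta_i) | y_log]
    return naive, corrected


# Usage with made-up inputs: direct estimate 120 on the original scale,
# intercept plus one covariate.
naive, corrected = fay_herriot_log_predictor(
    y_log=math.log(120.0), x=[1.0, 4.5], beta=[0.5, 0.9], sigma2_v=0.04, d=0.06
)
print(naive, corrected)
```

Under these assumptions, plugging the log-scale predictor straight into `exp()` understates E[exp(θ_i) | y_i] by the factor exp(γ_i D_i / 2). This is one reason back-transformation needs care when a model is fitted to a log-transformed response.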