Statistica Sinica: Latest Articles

Leverage Classifier: Another Look at Support Vector Machine
IF 1.4, Zone 3 (Mathematics), Q2 STATISTICS & PROBABILITY, Pub Date: 2023-08-23, DOI: 10.5705/ss.202023.0124
Yixin Han, Jun Yu, Nan Zhang, Cheng Meng, Ping Ma, Wenxuan Zhong, Changliang Zou
Support vector machine (SVM) is a popular classifier known for accuracy, flexibility, and robustness. However, its intensive computation has hindered its application to large-scale datasets. In this paper, we propose a new optimal leverage classifier based on linear SVM under a nonseparable setting. Our classifier aims to select an informative subset of the training sample to reduce data size, enabling efficient computation while maintaining high accuracy. We take a novel view of SVM under the general subsampling framework and rigorously investigate the statistical properties. We propose a two-step subsampling procedure consisting of a pilot estimation of the optimal subsampling probabilities and a subsampling step to construct the classifier. We develop a new Bahadur representation of the SVM coefficients and derive unconditional asymptotic distribution and optimal subsampling probabilities without giving the full sample. Numerical results demonstrate that our classifiers outperform the existing methods in terms of estimation, computation, and prediction.
Citations: 0
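The two-step procedure described in the leverage classifier abstract above (a pilot estimate of subsampling probabilities, then a weighted fit on the subsample) can be sketched roughly as follows. This is a minimal illustration, not the authors' method: the scoring rule (margin violators under the pilot fit, scaled by covariate norm), the mixing parameter `alpha`, and the function names are all assumptions made for the example, and the paper's optimal probabilities are not reproduced here.

```python
import math
import random

def fit_linear_svm(X, y, w=None, lam=0.01, epochs=200, lr=0.1):
    """Weighted linear SVM fitted by subgradient descent on
    lam/2 * ||beta||^2 + sum_i w_i * hinge(y_i * (b + x_i . beta)) / sum_i w_i."""
    n, d = len(X), len(X[0])
    w = w or [1.0] * n
    total = sum(w)
    beta, b = [0.0] * d, 0.0
    for t in range(1, epochs + 1):
        g_beta = [lam * bj for bj in beta]
        g_b = 0.0
        for xi, yi, wi in zip(X, y, w):
            margin = yi * (b + sum(bj * xj for bj, xj in zip(beta, xi)))
            if margin < 1.0:  # hinge loss is active for this point
                for j in range(d):
                    g_beta[j] -= wi * yi * xi[j] / total
                g_b -= wi * yi / total
        step = lr / math.sqrt(t)
        beta = [bj - step * gj for bj, gj in zip(beta, g_beta)]
        b -= step * g_b
    return beta, b

def two_step_subsample_svm(X, y, r_pilot, r, alpha=0.2, seed=0):
    rng = random.Random(seed)
    n = len(X)
    # Step 1: pilot fit on a uniform subsample of size r_pilot.
    idx0 = rng.sample(range(n), r_pilot)
    beta0, b0 = fit_linear_svm([X[i] for i in idx0], [y[i] for i in idx0])
    # Hypothetical informativeness score (an assumption, not the paper's formula):
    # points violating the pilot margin, scaled by the norm of (1, x).
    scores = []
    for xi, yi in zip(X, y):
        margin = yi * (b0 + sum(bj * xj for bj, xj in zip(beta0, xi)))
        norm = math.sqrt(1.0 + sum(xj * xj for xj in xi))
        scores.append(norm if margin < 1.0 else 0.0)
    s = sum(scores)
    # Mix with the uniform distribution so every probability stays positive.
    probs = [(1 - alpha) * (sc / s if s > 0 else 1.0 / n) + alpha / n
             for sc in scores]
    # Step 2: sample with replacement and fit with inverse-probability weights.
    idx = rng.choices(range(n), weights=probs, k=r)
    weights = [1.0 / probs[i] for i in idx]
    beta, b = fit_linear_svm([X[i] for i in idx], [y[i] for i in idx], w=weights)
    return beta, b, probs
```

The inverse-probability weights are what keep the subsample objective aimed at the full-sample objective; without them, oversampling points near the margin would distort the fitted hyperplane.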
An Unbiased Predictor for Skewed Response Variable with Measurement Error in Covariate
IF 1.4, Zone 3 (Mathematics), Q2 STATISTICS & PROBABILITY, Pub Date: 2023-08-21, DOI: 10.5705/ss.202023.0098
Sepideh Mosaferi, M. Ghosh, S. Sugasawa
We introduce a new small area predictor when the Fay-Herriot normal error model is fitted to a logarithmically transformed response variable, and the covariate is measured with error. This framework has been previously studied by Mosaferi et al. (2023). The empirical predictor given in their manuscript cannot perform uniformly better than the direct estimator. Our proposed predictor in this manuscript is unbiased and can perform uniformly better than the one proposed in Mosaferi et al. (2023). We derive an approximation of the mean squared error (MSE) for the predictor. The prediction intervals based on the MSE suffer from coverage problems. Thus, we propose a non-parametric bootstrap prediction interval which is more accurate. This problem is of great interest in small area applications since statistical agencies and agricultural surveys are often asked to produce estimates of right-skewed variables with covariates measured with error. With Monte Carlo simulation studies and two Census Bureau data sets, we demonstrate the superiority of our proposed methodology.
Citations: 0
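Why unbiasedness is a concern for a log-transformed response, as in the entry above, can be seen from a standard lognormal fact: if log Y is normal with mean mu and variance sigma2, then E[Y] = exp(mu + sigma2 / 2), so exponentiating the fitted mean alone underestimates E[Y]. The sketch below only illustrates that general fact; it is not the predictor proposed in the paper, and the function names are chosen for this example.

```python
import math
import random

def naive_backtransform(mu):
    """Exponentiate the log-scale mean; this targets the median of Y, not its mean."""
    return math.exp(mu)

def lognormal_mean(mu, sigma2):
    """Exact mean of Y when log Y ~ N(mu, sigma2)."""
    return math.exp(mu + sigma2 / 2.0)

def simulated_mean(mu, sigma2, n=100_000, seed=1):
    """Monte Carlo check of the mean of Y (illustrative only)."""
    rng = random.Random(seed)
    sd = math.sqrt(sigma2)
    return sum(math.exp(rng.gauss(mu, sd)) for _ in range(n)) / n
```

With mu = 0 and sigma2 = 1, the naive back-transform gives 1 while the exact mean is exp(0.5), so the gap grows with the log-scale variance.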
On Block Cholesky Decomposition for Sparse Inverse Covariance Estimation
IF 1.4, Zone 3 (Mathematics), Q2 STATISTICS & PROBABILITY, Pub Date: 2023-08-18, DOI: 10.5705/ss.202023.0065
Xiaoning Kang, J. Lian, Xinwei Deng
The modified Cholesky decomposition is popular for inverse covariance estimation, but often needs pre-specification on the full information of variable ordering. In this work, we propose a block Cholesky decomposition (BCD) for estimating inverse covariance matrix under the partial information of variable ordering, in the sense that the variables can be divided into several groups with available ordering among groups, but variables within each group have no orderings. The proposed BCD model provides a unified framework for several existing methods including the modified Cholesky decomposition and the Graphical lasso. By utilizing the partial information on variable ordering, the proposed BCD model guarantees the positive definiteness of the estimated matrix with statistically meaningful interpretation. Theoretical results are established under regularity conditions. Simulation and case studies are conducted to evaluate the proposed BCD model.
Citations: 0
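For context on the entry above, the classical modified Cholesky decomposition (the fully ordered special case, not the block version proposed in the paper) regresses each variable on its predecessors and assembles the inverse covariance as T' D^{-1} T, where T is unit lower triangular with negated regression coefficients and D holds the innovation variances. A minimal sketch, applied to a known covariance matrix rather than to data, with helper names chosen for this example:

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def modified_cholesky_precision(S):
    """Return (T, d, omega) with omega = T' diag(1/d) T, the inverse of S."""
    p = len(S)
    T = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    d = [0.0] * p
    d[0] = S[0][0]
    for j in range(1, p):
        A = [row[:j] for row in S[:j]]     # covariance of the predecessors
        b = [S[k][j] for k in range(j)]    # their covariance with variable j
        phi = solve(A, b)                  # regression coefficients
        for k in range(j):
            T[j][k] = -phi[k]
        d[j] = S[j][j] - sum(phi[k] * b[k] for k in range(j))  # innovation variance
    omega = [[sum(T[k][i] * T[k][j] / d[k] for k in range(p)) for j in range(p)]
             for i in range(p)]
    return T, d, omega
```

Because every innovation variance in `d` is positive for a positive definite input, the assembled matrix is positive definite by construction, which is the property the abstract highlights. The result depends on the variable ordering, which is why partial ordering information matters.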
A Data Fusion Method for Quantile Treatment Effects
IF 1.4, Zone 3 (Mathematics), Q2 STATISTICS & PROBABILITY, Pub Date: 2023-07-16, DOI: 10.5705/ss.202022.0288
Yijiao Zhang, Zhongyi Zhu
With the increasing availability of datasets, developing data fusion methods to leverage the strengths of different datasets to draw causal effects is of great practical importance to many scientific fields. In this paper, we consider estimating the quantile treatment effects using small validation data with fully-observed confounders and large auxiliary data with unmeasured confounders. We propose a Fused Quantile Treatment effects Estimator (FQTE) by integrating the information from two datasets based on doubly robust estimating functions. We allow for the misspecification of the models on the dataset with unmeasured confounders. Under mild conditions, we show that the proposed FQTE is asymptotically normal and more efficient than the initial QTE estimator using the validation data solely. By establishing the asymptotic linear forms of related estimators, convenient methods for covariance estimation are provided. Simulation studies demonstrate the empirical validity and improved efficiency of our fused estimators. We illustrate the proposed method with an application.
Citations: 0
Partially Functional Linear Quantile Regression with Measurement Errors
IF 1.5, Zone 3 (Mathematics), Q2 STATISTICS & PROBABILITY, Pub Date: 2023-07-01, DOI: 10.5705/ss.202021.0246
Mengli Zhang, Lan Xue, Carmen D Tekwe, Yang Bai, Annie Qu

Ignoring measurement errors in conventional regression analyses can lead to biased estimation and inference results. Reducing such bias is challenging when the error-prone covariate is a functional curve. In this paper, we propose a new corrected loss function for a partially functional linear quantile model with function-valued measurement errors. We establish the asymptotic properties of both the functional coefficient and the parametric coefficient estimators. We also demonstrate the finite-sample performance of the proposed method using simulation studies, and illustrate its advantages by applying it to data from a childhood obesity study.

Citations: 0
A Comparison of Estimators of Mean and Its Functions in Finite Populations
IF 1.4, Zone 3 (Mathematics), Q2 STATISTICS & PROBABILITY, Pub Date: 2023-05-24, DOI: 10.5705/ss.202022.0181
Anurag Dey, P. Chaudhuri
Several well-known estimators of finite population mean and its functions are investigated under some standard sampling designs. Such functions of mean include the variance, the correlation coefficient and the regression coefficient in the population as special cases. We compare the performance of these estimators under different sampling designs based on their asymptotic distributions. Equivalence classes of estimators under different sampling designs are constructed so that estimators in the same class have equivalent performance in terms of asymptotic mean squared errors (MSEs). Estimators in different equivalence classes are then compared under some superpopulations satisfying linear models. It is shown that the pseudo empirical likelihood (PEML) estimator of the population mean under simple random sampling without replacement (SRSWOR) has the lowest asymptotic MSE among all the estimators under different sampling designs considered in this paper. It is also shown that for the variance, the correlation coefficient and the regression coefficient of the population, the plug-in estimators based on the PEML estimator have the lowest asymptotic MSEs among all the estimators considered in this paper under SRSWOR. On the other hand, for any high entropy $\pi$PS (HE$\pi$PS) sampling design, which uses the auxiliary information, the plug-in estimators of those parameters based on the Hájek estimator have the lowest asymptotic MSEs among all the estimators considered in this paper.
Citations: 0
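The Hájek estimator named in the entry above has a simple standard form: a ratio of inverse-probability-weighted totals. The sketch below contrasts it with the Horvitz-Thompson estimator of the mean, which divides by the known population size instead. The function names and the toy numbers in the usage note are illustrative; none of the paper's comparisons are reproduced.

```python
def horvitz_thompson_mean(y, pi, N):
    """Horvitz-Thompson estimator of the population mean.

    y: sampled values; pi: their first-order inclusion probabilities;
    N: known population size.
    """
    return sum(yi / p for yi, p in zip(y, pi)) / N

def hajek_mean(y, pi):
    """Hajek estimator: weighted total divided by the estimated population size."""
    weighted_total = sum(yi / p for yi, p in zip(y, pi))
    estimated_size = sum(1.0 / p for p in pi)
    return weighted_total / estimated_size
```

For a sample `y = [5, 5]` with `pi = [0.2, 0.8]`, the Hájek estimator returns 5 whatever the inclusion probabilities are, whereas the Horvitz-Thompson value depends on how close the sum of 1/pi is to N.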
An Efficient Greedy Search Algorithm for High-dimensional Linear Discriminant Analysis
IF 1.4, Zone 3 (Mathematics), Q2 STATISTICS & PROBABILITY, Pub Date: 2023-05-01, DOI: 10.5705/ss.202021.0028
Hannan Yang, D Y Lin, Quefeng Li

High-dimensional classification is an important statistical problem that has applications in many areas. One widely used classifier is the Linear Discriminant Analysis (LDA). In recent years, many regularized LDA classifiers have been proposed to solve the problem of high-dimensional classification. However, these methods rely on inverting a large matrix or solving large-scale optimization problems to render classification rules, which is computationally prohibitive when the dimension is ultra-high. With the emergence of big data, it is increasingly important to develop more efficient algorithms to solve the high-dimensional LDA problem. In this paper, we propose an efficient greedy search algorithm that depends solely on closed-form formulae to learn a high-dimensional LDA rule. We establish theoretical guarantees of its statistical properties in terms of variable selection and error rate consistency; in addition, we provide an explicit interpretation of the extra information brought by an additional feature in an LDA problem under some mild distributional assumptions. We demonstrate that this new algorithm drastically improves computational speed compared with other high-dimensional LDA methods, while maintaining comparable or even better classification performance.

Citations: 1
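The general idea of greedy feature search for LDA, as in the entry above, can be illustrated with a naive forward selection that adds, at each step, the feature giving the largest squared Mahalanobis distance between the two class means. This is only a sketch under assumptions: it re-solves a linear system for every candidate, which is exactly the cost the paper's closed-form updates are said to avoid, and the criterion and function names are chosen for this example rather than taken from the paper.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def mahalanobis_sq(delta, S, idx):
    """delta_S' S_SS^{-1} delta_S for the feature subset idx."""
    A = [[S[i][j] for j in idx] for i in idx]
    d = [delta[i] for i in idx]
    x = solve(A, d)
    return sum(di * xi for di, xi in zip(d, x))

def greedy_lda_features(delta, S, k):
    """Forward-select k features; delta is the mean difference, S the covariance."""
    selected = []
    remaining = list(range(len(delta)))
    for _ in range(k):
        best = max(remaining,
                   key=lambda j: mahalanobis_sq(delta, S, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

With an identity covariance the criterion reduces to the sum of squared mean differences, so the search simply picks the features with the largest mean differences; correlated features are where the gain from an additional feature stops being additive.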
Marginal Bayesian Posterior Inference using Recurrent Neural Networks with Application to Sequential Models
IF 1.4, Zone 3 (Mathematics), Q2 STATISTICS & PROBABILITY, Pub Date: 2023-05-01, DOI: 10.5705/ss.202020.0348
Thayer Fisher, Alex Luedtke, Marco Carone, Noah Simon

In Bayesian data analysis, it is often important to evaluate quantiles of the posterior distribution of a parameter of interest (e.g., to form posterior intervals). In multi-dimensional problems, when non-conjugate priors are used, this is often difficult, generally requiring either an analytic or a sampling-based approximation, such as Markov chain Monte Carlo (MCMC), approximate Bayesian computation (ABC) or variational inference. We discuss a general approach that reframes this as a multi-task learning problem and uses recurrent deep neural networks (RNNs) to approximately evaluate posterior quantiles. As RNNs carry information along a sequence, this application is particularly useful for time series. An advantage of this risk-minimization approach is that we do not need to sample from the posterior or calculate the likelihood. We illustrate the proposed approach in several examples.

Citations: 0
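The risk-minimization view of quantiles mentioned in the entry above rests on a standard fact: the value minimizing the expected pinball (check) loss at level tau is the tau-quantile. The sketch below shows this for a constant predictor over a small set of draws. In the paper the predictor is an RNN that maps data to quantiles, which is not reproduced here, and the function names are chosen for this example.

```python
def pinball_loss(y, q, tau):
    """Check loss at level tau for target y and predicted quantile q."""
    u = y - q
    return tau * u if u >= 0 else (tau - 1.0) * u

def empirical_risk_minimizer(draws, tau):
    """Pick the draw that minimizes the total pinball loss over all draws.

    Restricting candidates to the draws themselves is enough, because the
    empirical pinball risk is piecewise linear with kinks at the draws.
    """
    return min(draws, key=lambda q: sum(pinball_loss(y, q, tau) for y in draws))
```

Calling `empirical_risk_minimizer([1.0, 2.0, 3.0, 4.0, 5.0], 0.5)` returns the median, 3.0. Replacing the constant with a flexible function of the observed data and averaging the loss over simulated (parameter, data) pairs gives the kind of objective the abstract describes, with no likelihood evaluation required.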
Statistical Inference for High-Dimensional Vector Autoregression with Measurement Error
IF 1.5, Zone 3 (Mathematics), Q2 STATISTICS & PROBABILITY, Pub Date: 2023-05-01, DOI: 10.5705/ss.202021.0151
Xiang Lyu, Jian Kang, Lexin Li

High-dimensional vector autoregression with measurement error is frequently encountered in a large variety of scientific and business applications. In this article, we study statistical inference of the transition matrix under this model. While there has been a large body of literature studying sparse estimation of the transition matrix, there is a paucity of inference solutions, especially in the high-dimensional scenario. We develop inferential procedures for both the global and simultaneous testing of the transition matrix. We first develop a new sparse expectation-maximization algorithm to estimate the model parameters, and carefully characterize their estimation precisions. We then construct a Gaussian matrix, after proper bias and variance corrections, from which we derive the test statistics. Finally, we develop the testing procedures and establish their asymptotic guarantees. We study the finite-sample performance of our tests through intensive simulations, and illustrate with a brain connectivity analysis example.

Citations: 0
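Why measurement error matters for the transition matrix in the entry above can be seen in the scalar analogue. Suppose the latent series follows x_t = a x_{t-1} + e_t and we observe y_t = x_t + eps_t, with eps_t white noise independent of the latent series. Regressing y_t on y_{t-1} then converges to a value shrunk toward zero. The sketch below computes that limit; it illustrates the attenuation effect only and is not the sparse EM procedure from the paper, and the function and parameter names are chosen for this example.

```python
def naive_ar1_limit(a, innov_var, noise_var):
    """Large-sample limit of the naive lag-1 regression slope on noisy data.

    a: true AR(1) coefficient (|a| < 1); innov_var: Var(e_t);
    noise_var: Var(eps_t), the measurement-error variance.
    """
    if not -1.0 < a < 1.0:
        raise ValueError("the AR(1) process must be stationary: |a| < 1")
    signal_var = innov_var / (1.0 - a * a)   # stationary variance of x_t
    # Cov(y_t, y_{t-1}) = a * signal_var and Var(y_t) = signal_var + noise_var.
    return a * signal_var / (signal_var + noise_var)
```

With `a = 0.5`, `innov_var = 0.75` and `noise_var = 1.0`, the latent variance is 1 and the naive slope converges to 0.25, half the true coefficient, so an estimator that ignores the noise is biased no matter how long the series is.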
Globally Adaptive Longitudinal Quantile Regression with High Dimensional Compositional Covariates
IF 1.4, Zone 3 (Mathematics), Q2 STATISTICS & PROBABILITY, Pub Date: 2023-05-01, DOI: 10.5705/ss.202021.0006
Huijuan Ma, Qi Zheng, Zhumin Zhang, Huichuan Lai, Limin Peng

In this work, we propose a longitudinal quantile regression framework that enables a robust characterization of heterogeneous covariate-response associations in the presence of high-dimensional compositional covariates and repeated measurements of both response and covariates. We develop a globally adaptive penalization procedure, which can consistently identify covariate sparsity patterns across a continuum set of quantile levels. The proposed estimation procedure properly aggregates longitudinal observations over time, and ensures the satisfaction of the sum-zero coefficient constraint that is needed for proper interpretation of the effects of compositional covariates. We establish the oracle rate of uniform convergence and weak convergence of the resulting estimators, and further justify the proposed uniform selector of the tuning parameter in terms of achieving global model selection consistency. We derive an efficient algorithm by incorporating existing R packages to facilitate stable and fast computation. Our extensive simulation studies confirm the theoretical findings. We apply the proposed method to a longitudinal study of cystic fibrosis children where the association between gut microbiome and other diet-related biomarkers is of interest.

Statistica Sinica, vol. 33 (Spec), pp. 1295-1318. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10361693/pdf/nihms-1757788.pdf
Citations: 0