Statistica Sinica: Latest Articles

Subsampling and Jackknifing: A Practically Convenient Solution for Large Data Analysis With Limited Computational Resources
IF 1.4 Tier 3, Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2023-04-13 DOI: 10.5705/ss.202021.0257
Shuyuan Wu, Xuening Zhu, Hansheng Wang
Modern statistical analysis often encounters datasets with large sizes. For these datasets, conventional estimation methods can hardly be used immediately because practitioners often suffer from limited computational resources. In most cases, they do not have powerful computational resources (e.g., Hadoop or Spark). How to practically analyze large datasets with limited computational resources then becomes a problem of great importance. To solve this problem, we propose here a novel subsampling-based method with jackknifing. The key idea is to treat the whole sample data as if they were the population. Then, multiple subsamples with greatly reduced sizes are obtained by the method of simple random sampling with replacement. It is remarkable that we do not recommend sampling methods without replacement because this would incur a significant cost for data processing on the hard drive. Such cost does not exist if the data are processed in memory. Because subsampled data have relatively small sizes, they can be comfortably read into computer memory as a whole and then processed easily. Based on subsampled datasets, jackknife-debiased estimators can be obtained for the target parameter. The resulting estimators are statistically consistent, with an extremely small bias. Finally, the jackknife-debiased estimators from different subsamples are averaged together to form the final estimator. We theoretically show that the final estimator is consistent and asymptotically normal. Its asymptotic statistical efficiency can be as good as that of the whole sample estimator under very mild conditions. The proposed method is simple enough to be easily implemented on most practical computer systems and thus should have very wide applicability.
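The procedure described in this abstract (draw small subsamples with replacement, jackknife-debias the estimator on each, then average) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the target parameter g(mean) with g = square, the function names, the in-memory array standing in for data on disk, and the values of n and K are all assumptions chosen for the example.

```python
import numpy as np

def jackknife_debiased(sub, g):
    """Jackknife bias-corrected estimate of g(mean) from one subsample."""
    n = sub.size
    full = g(sub.mean())
    # Leave-one-out means, obtained from the total without re-scanning.
    loo_means = (sub.sum() - sub) / (n - 1)
    return n * full - (n - 1) * g(loo_means).mean()

def subsample_jackknife(data, n, K, g, rng):
    """Average K jackknife-debiased estimates, each computed on a size-n
    subsample drawn by simple random sampling with replacement."""
    estimates = []
    for _ in range(K):
        idx = rng.integers(0, data.size, size=n)  # indices may repeat
        estimates.append(jackknife_debiased(data[idx], g))
    return float(np.mean(estimates))

rng = np.random.default_rng(0)
# Hypothetical "whole sample"; in the paper's setting it would sit on disk.
data = rng.normal(loc=2.0, scale=3.0, size=200_000)
est = subsample_jackknife(data, n=500, K=200, g=np.square, rng=rng)
```

For this particular g, the jackknife correction reduces to subtracting the sample variance divided by n from the squared subsample mean, which gives a convenient sanity check for the helper.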
Citations: 3
Slicing-free Inverse Regression in High-dimensional Sufficient Dimension Reduction
IF 1.4 Tier 3, Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2023-04-13 DOI: 10.5705/ss.202022.0112
Qing Mai, X. Shao, Runmin Wang, Xin Zhang
Sliced inverse regression (SIR, Li 1991) is a pioneering work and the most recognized method in sufficient dimension reduction. While promising progress has been made in theory and methods of high-dimensional SIR, two remaining challenges are still nagging high-dimensional multivariate applications. First, choosing the number of slices in SIR is a difficult problem, and it depends on the sample size, the distribution of variables, and other practical considerations. Second, the extension of SIR from univariate response to multivariate is not trivial. Targeting the same dimension reduction subspace as SIR, we propose a new slicing-free method that provides a unified solution to sufficient dimension reduction with high-dimensional covariates and univariate or multivariate response. We achieve this by adopting the recently developed martingale difference divergence matrix (MDDM, Lee & Shao 2018) and penalized eigen-decomposition algorithms. To establish the consistency of our method with a high-dimensional predictor and a multivariate response, we develop a new concentration inequality for sample MDDM around its population counterpart using theories for U-statistics, which may be of independent interest. Simulations and real data analysis demonstrate the favorable finite sample performance of the proposed method.
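For context, the classical sliced estimator that this abstract takes as its starting point can be sketched as below. This is the textbook low-dimensional SIR, not the slicing-free MDDM method the authors propose; the function name, the default `n_slices=10`, and the simulated data are assumptions for illustration. The `n_slices` argument is the tuning choice the abstract describes as difficult.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=1):
    """Classical SIR for a univariate response: whiten X, average the
    whitened rows within slices of sorted y, then take leading eigenvectors."""
    n, p = X.shape
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T  # cov^{-1/2}
    Z = (X - X.mean(axis=0)) @ inv_sqrt
    M = np.zeros((p, p))
    # Slices of roughly equal size, formed from the ordering of y.
    for chunk in np.array_split(np.argsort(y), n_slices):
        m = Z[chunk].mean(axis=0)
        M += (chunk.size / n) * np.outer(m, m)
    _, v = np.linalg.eigh(M)          # eigenvalues in ascending order
    top = v[:, ::-1][:, :n_dirs]      # leading eigenvectors first
    B = inv_sqrt @ top                # map back to the original X scale
    return B / np.linalg.norm(B, axis=0)

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5))
y = X[:, 0] + 0.5 * rng.normal(size=2000)  # response depends on X only via its first coordinate
B = sir_directions(X, y)
```

In this simulated setup the estimated direction should load almost entirely on the first coordinate; rerunning with different `n_slices` values shows how the estimate depends on the slicing choice.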
Citations: 1
Distributed Logistic Regression for Massive Data with Rare Events
IF 1.4 Tier 3, Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2023-04-05 DOI: 10.5705/ss.202022.0242
Xia Li, Xuening Zhu, Hansheng Wang
Large-scale rare events data are commonly encountered in practice. To tackle the massive rare events data, we propose a novel distributed estimation method for logistic regression in a distributed system. For a distributed framework, we face the following two challenges. The first challenge is how to distribute the data. In this regard, two different distribution strategies (i.e., the RANDOM strategy and the COPY strategy) are investigated. The second challenge is how to select an appropriate type of objective function so that the best asymptotic efficiency can be achieved. Then, the under-sampled (US) and inverse probability weighted (IPW) types of objective functions are considered. Our results suggest that the COPY strategy together with the IPW objective function is the best solution for distributed logistic regression with rare events. The finite sample performance of the distributed methods is demonstrated by simulation studies and a real-world Sweden Traffic Sign dataset.
Citations: 17
PENALIZED REGRESSION FOR MULTIPLE TYPES OF MANY FEATURES WITH MISSING DATA.
IF 1.4 Tier 3, Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2023-04-01 DOI: 10.5705/ss.202020.0401
Kin Yau Wong, Donglin Zeng, D Y Lin

Recent technological advances have made it possible to measure multiple types of many features in biomedical studies. However, some data types or features may not be measured for all study subjects because of cost or other constraints. We use a latent variable model to characterize the relationships across and within data types and to infer missing values from observed data. We develop a penalized-likelihood approach for variable selection and parameter estimation and devise an efficient expectation-maximization algorithm to implement our approach. We establish the asymptotic properties of the proposed estimators when the number of features increases at a polynomial rate of the sample size. Finally, we demonstrate the usefulness of the proposed methods using extensive simulation studies and provide an application to a motivating multi-platform genomics study.

Statistica Sinica 33(2), 633-662. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10187615/pdf/nihms-1764514.pdf
Citations: 0
Sieve estimation of a class of partially linear transformation models with interval-censored competing risks data.
IF 1.4 Tier 3, Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2023-04-01 DOI: 10.5705/ss.202021.0051
Xuewen Lu, Yan Wang, Dipankar Bandyopadhyay, Giorgos Bakoyannis

In this paper, we consider a class of partially linear transformation models with interval-censored competing risks data. Under a semiparametric generalized odds rate specification for the cause-specific cumulative incidence function, we obtain optimal estimators of the large number of parametric and nonparametric model components via maximizing the likelihood function over a joint B-spline and Bernstein polynomial spanned sieve space. Our specification considers a relatively simpler finite-dimensional parameter space, approximating the infinite-dimensional parameter space as n → ∞, thereby allowing us to study the almost sure consistency, and rate of convergence for all parameters, and the asymptotic distributions and efficiency of the finite-dimensional components. We study the finite sample performance of our method through simulation studies under a variety of scenarios. Furthermore, we illustrate our methodology via application to a dataset on HIV-infected individuals from sub-Saharan Africa.

Statistica Sinica 33(2), 685-704. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10208244/pdf/
Citations: 0
HETEROGENEITY ANALYSIS VIA INTEGRATING MULTI-SOURCES HIGH-DIMENSIONAL DATA WITH APPLICATIONS TO CANCER STUDIES.
IF 1.4 Tier 3, Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2023-04-01 DOI: 10.5705/ss.202021.0002
Tingyan Zhong, Qingzhao Zhang, Jian Huang, Mengyun Wu, Shuangge Ma

This study has been motivated by cancer research, in which heterogeneity analysis plays an important role and can be roughly classified as unsupervised or supervised. In supervised heterogeneity analysis, the finite mixture of regression (FMR) technique is used extensively, under which the covariates affect the response differently in subgroups. High-dimensional molecular and, very recently, histopathological imaging features have been analyzed separately and shown to be effective for heterogeneity analysis. For simpler analysis, they have been shown to contain overlapping, but also independent information. In this article, our goal is to conduct the first and more effective FMR-based cancer heterogeneity analysis by integrating high-dimensional molecular and histopathological imaging features. A penalization approach is developed to regularize estimation, select relevant variables, and, equally importantly, promote the identification of independent information. Consistency properties are rigorously established. An effective computational algorithm is developed. A simulation and an analysis of The Cancer Genome Atlas (TCGA) lung cancer data demonstrate the practical effectiveness of the proposed approach. Overall, this study provides a practical and useful new way of conducting supervised cancer heterogeneity analysis.

Statistica Sinica 33(2), 729-758. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10686523/pdf/
Citations: 0
Necessary and Sufficient Conditions for Multiple Objective Optimal Regression Designs
IF 1.4 Tier 3, Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2023-03-08 DOI: 10.5705/ss.202022.0328
Lucy L. Gao, J. Ye, Shangzhi Zeng, Julie Zhou
We typically construct optimal designs based on a single objective function. To better capture the breadth of an experiment's goals, we could instead construct a multiple objective optimal design based on multiple objective functions. While algorithms have been developed to find multi-objective optimal designs (e.g. efficiency-constrained and maximin optimal designs), it is far less clear how to verify the optimality of a solution obtained from an algorithm. In this paper, we provide theoretical results characterizing optimality for efficiency-constrained and maximin optimal designs on a discrete design space. We demonstrate how to use our results in conjunction with linear programming algorithms to verify optimality.
Citations: 0
Sparse Sliced Inverse Regression via Cholesky Matrix Penalization
Tier 3, Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2023-01-01 DOI: 10.5705/ss.202020.0406
Linh Nghiem, Francis K.C. Hui, Samuel Mueller, A.H. Welsh
We introduce a new sparse sliced inverse regression estimator called Cholesky matrix penalization and its adaptive version for achieving sparsity in estimating the dimensions of the central subspace. The new estimators use the Cholesky decomposition of the covariance matrix of the covariates and include a regularization term in the objective function to achieve sparsity in a computationally efficient manner. We establish the theoretical values of the tuning parameters that achieve estimation and variable selection consistency for the central subspace. Furthermore, we propose a new projection information criterion to select the tuning parameter for our proposed estimators and prove that the new criterion facilitates selection consistency. The Cholesky matrix penalization estimator inherits the strength of the Matrix Lasso and the Lasso sliced inverse regression estimator; it has superior performance in numerical studies and can be adapted to other sufficient dimension methods in the literature.
Citations: 1
That Prasad-Rao is Robust: Estimation of Mean Squared Prediction Error of Observed Best Predictor under Potential Model Misspecification
Tier 3, Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2023-01-01 DOI: 10.5705/ss.202020.0325
Xiaohui Liu, Haiqiang Ma, Jiming Jiang
Citations: 1
Statistical Inference for Functional Time Series
Tier 3, Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2023-01-01 DOI: 10.5705/ss.202021.0107
Jie Li, Lijian Yang
Citations: 1