{"title":"Semiparametric Estimation of Non-ignorable Missingness With Refreshment Sample","authors":"Jianfei Zheng, Jing Wang, L. Xue, A. Qu","doi":"10.5705/ss.202022.0214","DOIUrl":"https://doi.org/10.5705/ss.202022.0214","url":null,"abstract":"","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"1 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70939023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we propose some new necessary and sufficient conditions for identifying isomorphism in two-level fractional factorial designs, using a parallel flats structure. A new algorithm for checking isomorphism is provided accordingly. The proposed algorithm is simple and general, and can be used for either regular or nonregular designs. By taking advantage of the parallel flats structure when it exists, the method is much faster than current methods for assessing the isomorphism of nonregular two-level designs. Examples are given to illustrate the results. An efficient implementation of the proposed algorithm in Matlab can be found in the online Supplementary Material.
{"title":"A More Efficient Isomorphism Check for Two-Level Nonregular Designs","authors":"Chunyan Wang, Robert W. Mee","doi":"10.5705/ss.202022.0200","DOIUrl":"https://doi.org/10.5705/ss.202022.0200","url":null,"abstract":"In this paper, we propose some new necessary and sufficient conditions for identifying isomorphism in two-level fractional factorial designs, using a parallel flats structure. A new algorithm for checking isomorphism is provided accordingly. The proposed algorithm is simple and general, and can be used for either regular or nonregular designs. By taking advantage of the parallel flats structure when it exists, the method is much faster than current methods for assessing the isomorphism of nonregular two-level designs. Examples are given to illustrate the results. An efficient implementation of the proposed algorithm in Matlab can be found in the online Supplementary Material.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"1 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70939010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jianling Wang, Thuan Nguyen, Y. Luan, Jiming Jiang
Testing Hypotheses of Covariate-Adaptive Randomized Clinical Trials with Time-to-event Outcomes under the
Hypothesis Testing for Covariate-Adaptive Randomized Clinical Trials with Time-to-Event Outcomes
{"title":"Testing Hypotheses of Covariate-Adaptive Randomized Clinical Trials with Time-to-event Outcomes under the AFT Model","authors":"Jianling Wang, Thuan Nguyen, Y. Luan, Jiming Jiang","doi":"10.5705/ss.202022.0011","DOIUrl":"https://doi.org/10.5705/ss.202022.0011","url":null,"abstract":"Testing Hypotheses of Covariate-Adaptive Randomized Clinical Trials with Time-to-event Outcomes under the","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"1 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70937862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose a test for the hypothesis that the standardized functional principal components (FPCs) of functional data are equal to a given set of orthonormal bases (e.g., the Fourier basis). Using estimates of individual trajectories that satisfy certain approximation conditions, we construct a chi-square-type statistic, and show that it is oracally efficient under the null hypothesis, in the sense that its limiting distribution is the same as that of an infeasible statistic using all trajectories, known as the "oracle." The null limiting distribution is an infinite Gaussian quadratic form, and we obtain a consistent estimator of its quantile. A test statistic based on the chi-squared-type statistic and the approximate quantile of the Gaussian quadratic form is shown to be both of the nominal asymptotic significance level and asymptotically correct. It is further shown that B-spline trajectory estimates meet the required approximation conditions. Simulation studies demonstrate the superior finite-sample performance of the proposed testing procedure. Using electroencephalogram (EEG) data, the proposed procedure confirms an interesting discovery that the centered EEG data are generated from a small
{"title":"Hypotheses Testing of Functional Principal Components","authors":"Zening Song, Lijian Yang, Yuanyuan Zhang","doi":"10.5705/ss.202022.0309","DOIUrl":"https://doi.org/10.5705/ss.202022.0309","url":null,"abstract":"We propose a test for the hypothesis that the standardized functional principal components (FPCs) of functional data are equal to a given set of orthonormal bases (e.g., the Fourier basis). Using estimates of individual trajectories that satisfy certain approximation conditions, we construct a chi-square-type statistic, and show that it is oracally efficient under the null hypothesis, in the sense that its limiting distribution is the same as that of an infeasible statistic using all trajectories, known as the \"oracle.\" The null limiting distribution is an infinite Gaussian quadratic form, and we obtain a consistent estimator of its quantile. A test statistic based on the chi-squared-type statistic and the approximate quantile of the Gaussian quadratic form is shown to be both of the nominal asymptotic significance level and asymptotically correct. It is further shown that B-spline trajectory estimates meet the required approximation conditions. Simulation studies demonstrate the superior finite-sample performance of the proposed testing procedure. 
Using electroencephalogram (EEG) data, the proposed procedure confirms an interesting discovery that the centered EEG data are generated from a small","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"1 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70940204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper deals with two-sample tests for functional time series data, which have become widely available in conjunction with the advent of modern complex observation systems. Here, particular interest is in evaluating whether two sets of functional time series observations share the shape of their primary modes of variation as encoded by the eigenfunctions of the respective covariance operators. To this end, a novel testing approach is introduced that connects with, and extends, existing literature in two main ways. First, tests are set up in the relevant testing framework, where interest is not in testing an exact null hypothesis but rather in detecting deviations deemed sufficiently relevant, with relevance determined by the practitioner and perhaps guided by domain experts. Second, the proposed test statistics rely on a self-normalization principle that helps to avoid the notoriously difficult task of estimating the long-run covariance structure of the underlying functional time series. The main theoretical result of this paper is the derivation of the large-sample behavior of the proposed test statistics. Empirical evidence, indicating that the proposed procedures work well in finite samples and compare favorably with competing methods, is provided through a simulation study, and an application to annual temperature data.
{"title":"Two-Sample Tests for Relevant Differences in the Eigenfunctions of Covariance Operators","authors":"Alexander Aue, Holger Dette, Gregory Rice","doi":"10.5705/ss.202020.0365","DOIUrl":"https://doi.org/10.5705/ss.202020.0365","url":null,"abstract":"This paper deals with two-sample tests for functional time series data, which have become widely available in conjunction with the advent of modern complex observation systems. Here, particular interest is in evaluating whether two sets of functional time series observations share the shape of their primary modes of variation as encoded by the eigenfunctions of the respective covariance operators. To this end, a novel testing approach is introduced that connects with, and extends, existing literature in two main ways. First, tests are set up in the relevant testing framework, where interest is not in testing an exact null hypothesis but rather in detecting deviations deemed sufficiently relevant, with relevance determined by the practitioner and perhaps guided by domain experts. Second, the proposed test statistics rely on a self-normalization principle that helps to avoid the notoriously difficult task of estimating the long-run covariance structure of the underlying functional time series. The main theoretical result of this paper is the derivation of the large-sample behavior of the proposed test statistics. 
Empirical evidence, indicating that the proposed procedures work well in finite samples and compare favorably with competing methods, is provided through a simulation study, and an application to annual temperature data.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135181369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gene expression and phenotype association can be affected by potential unmeasured confounders from multiple sources, leading to biased estimates of the associations. Since genetic variants largely explain gene expression variations, they can be used as instruments in studying the association between gene expressions and phenotype in the framework of high dimensional instrumental variable (IV) regression. However, because the dimensions of both genetic variants and gene expressions are often larger than the sample size, statistical inferences such as hypothesis testing for such high dimensional IV models are not trivial and have not been investigated in the literature. The problem is more challenging since the instrumental variables (e.g., genetic variants) have to be selected among a large set of genetic variants. This paper considers the problem of hypothesis testing for sparse IV regression models and presents methods for testing a single regression coefficient and multiple testing of multiple coefficients, where the test statistic for each single coefficient is constructed based on an inverse regression. A multiple testing procedure is developed for selecting variables and is shown to control the false discovery rate. Simulations are conducted to evaluate the performance of our proposed methods. These methods are illustrated by an analysis of a yeast dataset in order to identify genes that are associated with growth in the presence of hydrogen peroxide.
{"title":"Hypothesis Testing in High-Dimensional Instrumental Variables Regression With an Application to Genomics Data","authors":"Jiarui Lu, Hongzhe Li","doi":"10.5705/ss.202019.0408","DOIUrl":"https://doi.org/10.5705/ss.202019.0408","url":null,"abstract":"Gene expression and phenotype association can be affected by potential unmeasured confounders from multiple sources, leading to biased estimates of the associations. Since genetic variants largely explain gene expression variations, they can be used as instruments in studying the association between gene expressions and phenotype in the framework of high dimensional instrumental variable (IV) regression. However, because the dimensions of both genetic variants and gene expressions are often larger than the sample size, statistical inferences such as hypothesis testing for such high dimensional IV models are not trivial and have not been investigated in the literature. The problem is more challenging since the instrumental variables (e.g., genetic variants) have to be selected among a large set of genetic variants. This paper considers the problem of hypothesis testing for sparse IV regression models and presents methods for testing a single regression coefficient and multiple testing of multiple coefficients, where the test statistic for each single coefficient is constructed based on an inverse regression. A multiple testing procedure is developed for selecting variables and is shown to control the false discovery rate. Simulations are conducted to evaluate the performance of our proposed methods. 
These methods are illustrated by an analysis of a yeast dataset in order to identify genes that are associated with growth in the presence of hydrogen peroxide.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"53 60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135783365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Functional principal component analysis (FPCA) is a fundamental tool and has attracted increasing attention in recent decades, while existing methods are restricted to data with a single or finite number of random functions (much smaller than the sample size $n$). In this work, we focus on high-dimensional functional processes where the number of random functions $p$ is comparable to, or even much larger than $n$. Such data are ubiquitous in various fields such as neuroimaging analysis, and cannot be properly modeled by existing methods. We propose a new algorithm, called sparse FPCA, which is able to model principal eigenfunctions effectively under sensible sparsity regimes. While sparsity assumptions are standard in multivariate statistics, they have not been investigated in the complex context where not only is $p$ large, but also each variable itself is an intrinsically infinite-dimensional process. The sparsity structure motivates a thresholding rule that is easy to compute without nonparametric smoothing by exploiting the relationship between univariate orthonormal basis expansions and multivariate Karhunen-Loève (K-L) representations. We investigate the theoretical properties of the resulting estimators, and illustrate the performance with simulated and real data examples.
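Functional principal component analysis (FPCA) is a fundamental tool that has attracted increasing attention in recent decades, but existing methods are limited to data with a single or a finite number of random functions (far fewer than the sample size). In this work, we focus on high-dimensional functional processes in which the number of random functions $p$ is comparable to, or even much larger than, $n$. Such data are ubiquitous in fields such as neuroimaging analysis and cannot be modeled properly by existing methods. We propose a new algorithm, called sparse FPCA, that can model principal eigenfunctions effectively under sensible sparsity regimes. Although sparsity assumptions are standard in multivariate statistics, they have not been studied in the complex setting where not only is $p$ large, but each variable is itself an intrinsically infinite-dimensional process. By exploiting the relationship between univariate orthonormal basis expansions and multivariate Karhunen-Loève (K-L) representations, the sparsity structure motivates a thresholding rule that is easy to compute without nonparametric smoothing. We investigate the theoretical properties of the resulting estimators and illustrate their performance with simulated and real data examples.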
{"title":"Sparse Functional Principal Component Analysis in High Dimensions","authors":"Xiaoyu Hu, Fang Yao","doi":"10.5705/ss.202020.0445","DOIUrl":"https://doi.org/10.5705/ss.202020.0445","url":null,"abstract":"Functional principal component analysis (FPCA) is a fundamental tool and has attracted increasing attention in recent decades, while existing methods are restricted to data with a single or finite number of random functions (much smaller than the sample size $n$). In this work, we focus on high-dimensional functional processes where the number of random functions $p$ is comparable to, or even much larger than $n$. Such data are ubiquitous in various fields such as neuroimaging analysis, and cannot be properly modeled by existing methods. We propose a new algorithm, called sparse FPCA, which is able to model principal eigenfunctions effectively under sensible sparsity regimes. While sparsity assumptions are standard in multivariate statistics, they have not been investigated in the complex context where not only is $p$ large, but also each variable itself is an intrinsically infinite-dimensional process. The sparsity structure motivates a thresholding rule that is easy to compute without nonparametric smoothing by exploiting the relationship between univariate orthonormal basis expansions and multivariate Karhunen-Loève (K-L) representations. 
We investigate the theoretical properties of the resulting estimators, and illustrate the performance with simulated and real data examples.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136092729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}