Jianling Wang, Thuan Nguyen, Y. Luan, Jiming Jiang
Testing Hypotheses of Covariate-Adaptive Randomized Clinical Trials with Time-to-event Outcomes under the AFT Model
{"title":"Testing Hypotheses of Covariate-Adaptive Randomized Clinical Trials with Time-to-event Outcomes under the AFT Model","authors":"Jianling Wang, Thuan Nguyen, Y. Luan, Jiming Jiang","doi":"10.5705/ss.202022.0011","DOIUrl":"https://doi.org/10.5705/ss.202022.0011","url":null,"abstract":"Testing Hypotheses of Covariate-Adaptive Randomized Clinical Trials with Time-to-event Outcomes under the","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70937862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Zening Song, Lijian Yang, Yuanyuan Zhang
Hypotheses Testing of Functional Principal Components
DOI: 10.5705/ss.202022.0309 (Statistica Sinica, 2023)
Abstract: We propose a test for the hypothesis that the standardized functional principal components (FPCs) of functional data are equal to a given set of orthonormal bases (e.g., the Fourier basis). Using estimates of individual trajectories that satisfy certain approximation conditions, we construct a chi-square-type statistic and show that it is oracally efficient under the null hypothesis, in the sense that its limiting distribution is the same as that of an infeasible statistic using all trajectories, known as the "oracle." The null limiting distribution is an infinite Gaussian quadratic form, and we obtain a consistent estimator of its quantile. A test statistic based on the chi-square-type statistic and the approximate quantile of the Gaussian quadratic form is shown to be both of the nominal asymptotic significance level and asymptotically correct. It is further shown that B-spline trajectory estimates meet the required approximation conditions. Simulation studies demonstrate the superior finite-sample performance of the proposed testing procedure. Using electroencephalogram (EEG) data, the proposed procedure confirms an interesting discovery that the centered EEG data are generated from a small

Jiarui Lu, Hongzhe Li
Hypothesis Testing in High-Dimensional Instrumental Variables Regression With an Application to Genomics Data
DOI: 10.5705/ss.202019.0408 (Statistica Sinica, 2023)
Abstract: Gene expression and phenotype association can be affected by potential unmeasured confounders from multiple sources, leading to biased estimates of the associations. Since genetic variants largely explain gene expression variations, they can be used as instruments in studying the association between gene expressions and phenotype in the framework of high-dimensional instrumental variable (IV) regression. However, because the dimensions of both genetic variants and gene expressions are often larger than the sample size, statistical inferences such as hypothesis testing for such high-dimensional IV models are not trivial and have not been investigated in the literature. The problem is more challenging because the instrumental variables (e.g., genetic variants) have to be selected from among a large set of genetic variants. This paper considers the problem of hypothesis testing for sparse IV regression models and presents methods for testing a single regression coefficient and for multiple testing of multiple coefficients, where the test statistic for each single coefficient is constructed based on an inverse regression. A multiple testing procedure is developed for selecting variables and is shown to control the false discovery rate. Simulations are conducted to evaluate the performance of the proposed methods. These methods are illustrated by an analysis of a yeast dataset in order to identify genes that are associated with growth in the presence of hydrogen peroxide.

Alexander Aue, Holger Dette, Gregory Rice
Two-Sample Tests for Relevant Differences in the Eigenfunctions of Covariance Operators
DOI: 10.5705/ss.202020.0365 (Statistica Sinica, 2023)
Abstract: This paper deals with two-sample tests for functional time series data, which have become widely available in conjunction with the advent of modern complex observation systems. Here, particular interest is in evaluating whether two sets of functional time series observations share the shape of their primary modes of variation as encoded by the eigenfunctions of the respective covariance operators. To this end, a novel testing approach is introduced that connects with, and extends, existing literature in two main ways. First, tests are set up in the relevant testing framework, where interest is not in testing an exact null hypothesis but rather in detecting deviations deemed sufficiently relevant, with relevance determined by the practitioner and perhaps guided by domain experts. Second, the proposed test statistics rely on a self-normalization principle that helps to avoid the notoriously difficult task of estimating the long-run covariance structure of the underlying functional time series. The main theoretical result of this paper is the derivation of the large-sample behavior of the proposed test statistics. Empirical evidence, indicating that the proposed procedures work well in finite samples and compare favorably with competing methods, is provided through a simulation study and an application to annual temperature data.

Xiaoyu Hu, Fang Yao
Sparse Functional Principal Component Analysis in High Dimensions
DOI: 10.5705/ss.202020.0445 (Statistica Sinica, 2023)
Abstract: Functional principal component analysis (FPCA) is a fundamental tool that has attracted increasing attention in recent decades, but existing methods are restricted to data with a single random function or a finite number of random functions (much smaller than the sample size n). In this work, we focus on high-dimensional functional processes in which the number of random functions p is comparable to, or even much larger than, n. Such data are ubiquitous in various fields, such as neuroimaging analysis, and cannot be properly modeled by existing methods. We propose a new algorithm, called sparse FPCA, which is able to model principal eigenfunctions effectively under sensible sparsity regimes. While sparsity assumptions are standard in multivariate statistics, they have not been investigated in the complex context where not only is p large, but each variable itself is an intrinsically infinite-dimensional process. The sparsity structure motivates a thresholding rule that is easy to compute without nonparametric smoothing, by exploiting the relationship between univariate orthonormal basis expansions and multivariate Karhunen-Loève (K-L) representations. We investigate the theoretical properties of the resulting estimators, and illustrate the performance with simulated and real data examples.

Yunxiao Chen, Xiaoou Li
Compound Sequential Change-point Detection in Parallel Data Streams
DOI: 10.5705/ss.202020.0508 (Statistica Sinica, 2023)
Abstract: We consider sequential change-point detection in parallel data streams, where each stream has its own change point. Once a change is detected in a data stream, this stream is deactivated permanently. The goal is to maximize the normal operation of the pre-change streams, while controlling the proportion of post-change streams among the active streams at all time points. Taking a Bayesian formulation, we develop a compound decision framework for this problem. A procedure is proposed that is uniformly optimal among all sequential procedures which control the expected proportion of post-change streams at all time points. We also investigate the asymptotic behavior of the proposed method when the number of data streams grows large. Numerical examples are provided to illustrate the use and performance of the proposed method.

Shuyi Zhang, Songxi Chen, Yumou Qiu
Mean Tests for High-Dimensional Time Series
DOI: 10.5705/ss.202022.0147 (Statistica Sinica, 2023)
Abstract: This paper considers testing for a two-sample mean difference with high-dimensional temporally dependent data, which is later extended to the one-sample situation. To eliminate the bias caused by the temporal dependence among the time series observations, a band-excluded U-statistic (BEU) is proposed to estimate the squared Euclidean distance between the two means, which excludes cross-products of data vectors at temporally close time points. The asymptotic normality of the BEU statistic is derived under the high-dimensional setting with "spatial" (column-wise) and temporal dependence. An estimator built on kernel-smoothed cross-time covariances is developed to estimate the variance of the BEU statistic, which facilitates a test procedure based on the standardized BEU statistic. The proposed test is nonparametric and adaptive to a wide range of dependence and dimensionality, and has attractive power properties relative to a self-normalized test. Numerical simulations and a real data analysis on the return and volatility of S&P 500 stocks before and after the 2008 financial crisis are conducted to demonstrate the performance and utility of the proposed test.

Ted Westling, Kevin J. Downes, Dylan S. Small
Nonparametric Maximum Likelihood Estimation Under a Likelihood Ratio Order
DOI: 10.5705/ss.202020.0207 (Statistica Sinica, 2023)
Abstract: Comparison of two univariate distributions based on independent samples from them is a fundamental problem in statistics, with applications in a wide variety of scientific disciplines. In many situations, we might hypothesize that the two distributions are stochastically ordered, meaning intuitively that samples from one distribution tend to be larger than those from the other. One type of stochastic order that arises in economics, biomedicine, and elsewhere is the likelihood ratio order, also known as the density ratio order, in which the ratio of the density functions of the two distributions is monotone non-decreasing. In this article, we derive and study the nonparametric maximum likelihood estimator of the individual distributions and the ratio of their densities under the likelihood ratio order. Our work applies to discrete distributions, continuous distributions, and mixed continuous-discrete distributions. We demonstrate convergence in distribution of the estimator in certain cases, and we illustrate our results using numerical experiments and an analysis of a biomarker for predicting bacterial infection in children with systemic inflammatory response syndrome.