Conformal prediction beyond exchangeability
R. Barber, E. Candès, Aaditya Ramdas, R. Tibshirani
The Annals of Statistics, https://doi.org/10.1214/23-aos2276 (2022-02-27)

Conformal prediction is a popular, modern technique for providing valid predictive inference for arbitrary machine learning models. Its validity relies on two assumptions: exchangeability of the data, and symmetry of the given model-fitting algorithm as a function of the data. However, exchangeability is often violated when predictive models are deployed in practice. For example, if the data distribution drifts over time, then the data points are no longer exchangeable; moreover, in such settings, we might want to use a nonsymmetric algorithm that treats recent observations as more relevant. This paper generalizes conformal prediction to handle both issues: we employ weighted quantiles to introduce robustness against distribution drift, and we design a new randomization technique to allow for algorithms that do not treat data points symmetrically. Our new methods are provably robust, losing substantially less coverage when exchangeability is violated by distribution drift or other challenging features of real data, while achieving the same coverage guarantees as existing conformal prediction methods when the data points are in fact exchangeable. We demonstrate the practical utility of these new tools with simulations and real-data experiments on electricity and election forecasting.
A general characterization of optimal tie-breaker designs
Harrison H. Li, A. Owen
The Annals of Statistics, https://doi.org/10.1214/23-aos2275 (2022-02-25)

Tie-breaker designs trade off a statistical design objective against the short-term gain from preferentially assigning a binary treatment to subjects with high values of a running variable $x$. The design objective is any continuous function of the expected information matrix in a two-line regression model, and short-term gain is expressed as the covariance between the running variable and the treatment indicator. We investigate how to specify design functions, which give treatment probabilities as a function of $x$, to optimize these competing objectives under external constraints on the number of subjects receiving treatment. Our results include sharp existence and uniqueness guarantees while accommodating the ethically appealing requirement that treatment probabilities be nondecreasing in $x$. Under such a constraint, there always exists an optimal design function that is constant below and above a single discontinuity. When the running variable distribution is not symmetric or the fraction of subjects receiving the treatment is not $1/2$, our optimal designs improve upon a $D$-optimality objective without sacrificing short-term gain, compared to the three-level tie-breaker designs of Owen and Varian (2020) that fix treatment probabilities at $0$, $1/2$, and $1$. We illustrate our optimal designs with data from Head Start, an early childhood government intervention program.
Optimal high-dimensional and nonparametric distributed testing under communication constraints
Botond Szabó, Lasse Vuursteen, H. van Zanten
The Annals of Statistics, https://doi.org/10.1214/23-aos2269 (2022-02-02)

We derive minimax testing errors in a distributed framework where the data are split over multiple machines and their communication to a central machine is limited to $b$ bits. We investigate both the $d$-dimensional and infinite-dimensional signal detection problems under Gaussian white noise, and we derive distributed testing algorithms that attain the theoretical lower bounds. Our results show that distributed testing is subject to fundamentally different phenomena than distributed estimation. Among our findings, we show that testing protocols with access to shared randomness can perform strictly better in some regimes than those without. We also observe that consistent nonparametric distributed testing is always possible, even with as little as $1$ bit of communication, and that the corresponding test outperforms the best local test using only the information available at a single machine. Furthermore, we derive adaptive nonparametric distributed testing strategies and the corresponding theoretical lower bounds.
{"title":"Minimax nonparametric estimation of pure quantum states","authors":"Samriddha Lahiry, M. Nussbaum","doi":"10.1214/21-aos2115","DOIUrl":"https://doi.org/10.1214/21-aos2115","url":null,"abstract":"","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"14 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83358734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Testing community structure for hypergraphs
Mingao Yuan, Ruiqi Liu, Yang Feng, Zuofeng Shang
The Annals of Statistics, https://doi.org/10.1214/21-aos2099 (2022-02-01)

Many complex networks in the real world can be formulated as hypergraphs, on which community detection is widely used. However, the fundamental question of whether communities exist at all in an observed hypergraph remains open. This work tackles that problem. Specifically, we systematically study when a hypergraph with community structure can be distinguished from its Erdős–Rényi counterpart, and we propose concrete test statistics when the models are distinguishable. The contribution of this paper is threefold. First, we discover a phase transition in the hyperedge probability for distinguishability. Second, in the bounded-degree regime, we derive a sharp signal-to-noise ratio (SNR) threshold for distinguishability in two-community 3-uniform hypergraphs, and nearly tight SNR thresholds in general two-community m-uniform hypergraphs. Third, in the dense regime, we propose a computationally feasible test based on sub-hypergraph counts, obtain its asymptotic distribution, and analyze its power. Our results extend to nonuniform hypergraphs, for which we propose a new test involving both edge and hyperedge information. The proofs rely on Janson's contiguity theory (Combin. Probab. Comput. 4 (1995) 369–405), a high-moments-driven asymptotic normality result of Gao and Wormald (Probab. Theory Related Fields 130 (2004) 368–376), and a truncation technique for analyzing the likelihood ratio.
Dimension reduction for functional data based on weak conditional moments
Bing Li, Jun Song
The Annals of Statistics, https://doi.org/10.1214/21-aos2091 (2022-02-01)

We develop a general theory and estimation methods for functional linear sufficient dimension reduction, where both the predictor and the response can be random functions, or even vectors of functions. Unlike existing dimension reduction methods, our approach does not rely on estimating a conditional mean or conditional variance. Instead, it is based on a new statistical construction, the weak conditional expectation, built on Carleman operators and their inducing functions. The weak conditional expectation generalizes the conditional expectation: its key advantage is to replace the projection onto an L2-space, which defines conditional expectation, with a projection onto an arbitrary Hilbert space, while still maintaining the unbiasedness of the related dimension reduction methods. This flexibility is particularly important for functional data, because attempting to estimate a full-fledged conditional mean or conditional variance by slicing or smoothing over the space of vector-valued functions may be inefficient due to the curse of dimensionality. We evaluate the performance of our new methods by simulation and in several applied settings.
Half-trek criterion for identifiability of latent variable models
R. Barber, M. Drton, Nils Sturma, Luca Weihs
The Annals of Statistics, https://doi.org/10.1214/22-aos2221 (2022-01-12)

We consider linear structural equation models with latent variables and develop a criterion to certify whether the direct causal effects between the observable variables are identifiable from the observed covariance matrix. Linear structural equation models assume that both observed and latent variables solve a linear equation system featuring stochastic noise terms. Each model corresponds to a directed graph whose edges represent the direct effects that appear as coefficients in the equation system. Prior research has developed a variety of methods to decide identifiability of direct effects in a latent projection framework, in which the confounding effects of the latent variables are represented by correlation among noise terms. This approach is effective when the confounding is sparse and affects only small subsets of the observed variables. In contrast, the new latent-factor half-trek criterion (LF-HTC) developed in this paper operates on the original, unprojected latent variable model and can certify identifiability in settings where some latent variables have dense effects on many, or even all, of the observables. LF-HTC is a sufficient criterion for rational identifiability, under which the direct effects can be uniquely recovered as rational functions of the joint covariance matrix of the observed random variables. When the search steps in LF-HTC are restricted to subsets of latent variables of bounded size, the criterion can be verified in time polynomial in the size of the graph.
On robustness and local differential privacy
Mengchu Li, Thomas B. Berrett, Yi Yu
The Annals of Statistics, https://doi.org/10.1214/23-aos2267 (2022-01-03)

There is growing demand for statistical analysis tools that are robust against contamination while also preserving individual data owners' privacy. Although both topics host a rich body of literature, to the best of our knowledge we are the first to systematically study the connections between optimality under Huber's contamination model and local differential privacy (LDP) constraints. We start with a general minimax lower bound that disentangles the cost of being robust against Huber's contamination from the cost of preserving LDP. We then study four concrete examples: a two-point testing problem, a potentially diverging mean estimation problem, a nonparametric density estimation problem, and a univariate median estimation problem. For each problem, we demonstrate procedures that are optimal in the presence of both contamination and LDP constraints, comment on the connections with state-of-the-art methods studied under only one of the two constraints, and unveil the connections between robustness and LDP by partially answering whether LDP procedures are robust and whether robust procedures can be efficiently privatised. Overall, our work showcases the promise of jointly studying robustness and local differential privacy.
Variable selection, monotone likelihood ratio and group sparsity
C. Butucea, E. Mammen, M. Ndaoud, A. Tsybakov
The Annals of Statistics, https://doi.org/10.1214/22-aos2251 (2021-12-30)

In the pivotal variable selection problem, we derive the exact nonasymptotic minimax selector over the class of all $s$-sparse vectors, which is also the Bayes selector with respect to the uniform prior. While this optimal selector is, in general, not realizable in polynomial time, we show that its tractable counterpart (the scan selector) attains the minimax expected Hamming risk to within a factor of 2, and is also exactly minimax with respect to the probability of wrong recovery. As a consequence, we establish explicit lower bounds under the monotone likelihood ratio property, and we obtain a tight characterization of the minimax risk in terms of the best separable selector risk. We apply these general results to derive necessary and sufficient conditions for exact and almost full recovery in the location model with light-tailed distributions and in the problem of group variable selection under Gaussian noise.
General and feasible tests with multiply-imputed datasets
Kin Wai Chan
The Annals of Statistics, https://doi.org/10.1214/21-aos2132 (2021-12-30)

Multiple imputation (MI) is a technique designed especially for handling missing data in public-use datasets. It allows analysts to perform incomplete-data inference straightforwardly by using several already-imputed datasets released by the dataset owners. However, existing MI tests require either a restrictive assumption on the missing-data mechanism, known as equal odds of missing information (EOMI), or an infinite number of imputations. Some also require analysts to have access to restrictive or nonstandard computer subroutines. Moreover, existing MI testing procedures cover only Wald's tests and likelihood ratio tests, not Rao's score tests, and so are not fully general. In addition, MI Wald's tests and MI likelihood ratio tests are not procedurally identical, so analysts must resort to distinct algorithms for implementation. In this paper, we propose a general MI procedure, called stacked multiple imputation (SMI), that performs Wald's tests, likelihood ratio tests and Rao's score tests by a unified algorithm. SMI requires neither EOMI nor an infinite number of imputations. It is particularly feasible for analysts, who only need a complete-data testing device to perform the corresponding incomplete-data test.