{"title":"White noise testing for functional time series","authors":"Mihyun Kim, P. Kokoszka, Gregory Rice","doi":"10.1214/23-ss143","DOIUrl":"https://doi.org/10.1214/23-ss143","url":null,"abstract":"","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":"3 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89270770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This work reviews the literature on spline local basis methods for non-parametric density estimation. Particular attention is paid to B-spline density estimators which have experienced recent advances in both theory and methodology. These estimators occupy a very interesting space in statistics, which lies aptly at the cross-section of numerous statistical frameworks. New insights, experiments, and analyses are presented to cast the various estimation concepts in a unified context, while parallels and contrasts are drawn to the more familiar contexts of kernel density estimation. Unlike kernel density estimation, the study of local basis estimation is not yet fully mature, and this work also aims to highlight the gaps in existing literature which merit further investigation.
{"title":"Spline local basis methods for nonparametric density estimation","authors":"J. Lars Kirkby, Álvaro Leitao, Duy Nguyen","doi":"10.1214/23-ss142","DOIUrl":"https://doi.org/10.1214/23-ss142","url":null,"abstract":"This work reviews the literature on spline local basis methods for non-parametric density estimation. Particular attention is paid to B-spline density estimators which have experienced recent advances in both theory and methodology. These estimators occupy a very interesting space in statistics, which lies aptly at the cross-section of numerous statistical frameworks. New insights, experiments, and analyses are presented to cast the various estimation concepts in a unified context, while parallels and contrasts are drawn to the more familiar contexts of kernel density estimation. Unlike kernel density estimation, the study of local basis estimation is not yet fully mature, and this work also aims to highlight the gaps in existing literature which merit further investigation.","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135585314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Many real-world networks are theorized to have core-periphery structure consisting of a densely-connected core and a loosely-connected periphery. While this phenomenon has been extensively studied in a range of scientific disciplines, it has not received sufficient attention in the statistics community. In this expository article, our goal is to raise awareness about this topic and encourage statisticians to address the many open inference problems in this area. To this end, we first summarize the current research landscape by reviewing the metrics and models that have been used for quantitative studies on core-periphery structure. Next, we formulate and explore various inferential problems in this context, such as estimation, hypothesis testing, and Bayesian inference, and discuss related computational techniques. We also outline the multidisciplinary scientific impact of core-periphery structure in a number of real-world networks. Throughout the article, we provide our own interpretation of the literature from a statistical perspective, with the goal of prioritizing open problems where contribution from the statistics community will be most effective and important.
{"title":"Core-periphery structure in networks: A statistical exposition","authors":"Eric Yanchenko, Srijan Sengupta","doi":"10.1214/23-ss141","DOIUrl":"https://doi.org/10.1214/23-ss141","url":null,"abstract":"Many real-world networks are theorized to have core-periphery structure consisting of a densely-connected core and a loosely-connected periphery. While this phenomenon has been extensively studied in a range of scientific disciplines, it has not received sufficient attention in the statistics community. In this expository article, our goal is to raise awareness about this topic and encourage statisticians to address the many open inference problems in this area. To this end, we first summarize the current research landscape by reviewing the metrics and models that have been used for quantitative studies on core-periphery structure. Next, we formulate and explore various inferential problems in this context, such as estimation, hypothesis testing, and Bayesian inference, and discuss related computational techniques. We also outline the multidisciplinary scientific impact of core-periphery structure in a number of real-world networks. Throughout the article, we provide our own interpretation of the literature from a statistical perspective, with the goal of prioritizing open problems where contribution from the statistics community will be most effective and important.","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":"18 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2022-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89933059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Central subspaces review: methods and applications","authors":"Sabrina A. Rodrigues, Richard Huggins, B. Liquet","doi":"10.1214/22-ss138","DOIUrl":"https://doi.org/10.1214/22-ss138","url":null,"abstract":"","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":"6 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81827337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The generation of random sequences is the basis of simulation and can be used in many different areas such as Statistics, Computer Science, Systems Management and Control, Biology, Particle Physics, Cryptography or Cyber-Security, among others. It is crucial that the numbers generated were random or at least, behave as such. The fundamental statistical properties required for such sequences are randomness and independence and, from a cryptographic perspective, unpredictability. There is a variety of methods to generate these sequences. The main ones are physical and arithmetic methods. In this work, a detailed study of the main arithmetic methods is carried out. On the other hand, the necessity of secure sequence generation will be analyzed and new lines of ongoing research focusing applications in Internet of Things and new generator designs will be described.
{"title":"A brief and understandable guide to pseudo-random number generators and specific models for security","authors":"Elena Almaraz Luengo","doi":"10.1214/22-ss136","DOIUrl":"https://doi.org/10.1214/22-ss136","url":null,"abstract":"The generation of random sequences is the basis of simulation and can be used in many different areas such as Statistics, Computer Science, Systems Management and Control, Biology, Particle Physics, Cryptography or Cyber-Security, among others. It is crucial that the numbers generated were random or at least, behave as such. The fundamental statistical properties required for such sequences are randomness and independence and, from a cryptographic perspective, unpredictability. There is a variety of methods to generate these sequences. The main ones are physical and arithmetic methods. In this work, a detailed study of the main arithmetic methods is carried out. On the other hand, the necessity of secure sequence generation will be analyzed and new lines of ongoing research focusing applications in Internet of Things and new generator designs will be described.","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":"16 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74907342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The research on statistical inference after data-driven model selection can be traced as far back as Koopmans (1949). The intensive research on modern model selection methods for high-dimensional data over the past three decades revived the interest in statistical inference after model selection. In recent years, there has been a surge of articles on statistical inference after model selection and now a rather vast literature exists on this topic. Our manuscript aims at presenting a holistic review of post-model-selection inference in linear regression models, while also incorporating perspectives from high-dimensional inference in these models. We first give a simulated example motivating the necessity for valid statistical inference after model selection. We then provide theoretical insights explaining the phenomena observed in the example. This is done through a literature survey on the post-selection sampling distribution of regression parameter estimators and properties of coverage probabilities of naïve confidence intervals. Categorized according to two types of estimation targets, namely the population- and projection-based regression coefficients, we present a review of recent uncertainty assessment methods. We also discuss possible pros and cons for the confidence intervals constructed by different methods. MSC2020 subject classifications: Primary 62F25; secondary 62J07.
{"title":"Post-model-selection inference in linear regression models: An integrated review","authors":"Dongliang Zhang, Abbas Khalili, M. Asgharian","doi":"10.1214/22-ss135","DOIUrl":"https://doi.org/10.1214/22-ss135","url":null,"abstract":"The research on statistical inference after data-driven model selection can be traced as far back as Koopmans (1949). The intensive research on modern model selection methods for high-dimensional data over the past three decades revived the interest in statistical inference after model selection. In recent years, there has been a surge of articles on statistical inference after model selection and now a rather vast literature exists on this topic. Our manuscript aims at presenting a holistic review of post-model-selection inference in linear regression models, while also incorporating perspectives from high-dimensional inference in these models. We first give a simulated example motivating the necessity for valid statistical inference after model selection. We then provide theoretical insights explaining the phenomena observed in the example. This is done through a literature survey on the post-selection sampling distribution of regression parameter estimators and properties of coverage probabilities of naïve confidence intervals. Categorized according to two types of estimation targets, namely the population- and projection-based regression coefficients, we present a review of recent uncertainty assessment methods. We also discuss possible pros and cons for the confidence intervals constructed by different methods. MSC2020 subject classifications: Primary 62F25; secondary 62J07.","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":"1 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83355813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Many applications produce multiway data of exceedingly high dimension. Modeling such multi-way data is important in multichannel signal and video processing where sensors produce multi-indexed data, e.g. over spatial, frequency, and temporal dimensions. We will address the challenges of covariance representation of multiway data and review some of the progress in statistical modeling of multiway covariance over the past two decades, focusing on tensor-valued covariance models and their inference. We will illustrate through a space weather application: predicting the evolution of solar active regions over time.
{"title":"Kronecker-structured covariance models for multiway data","authors":"Yu Wang, Zeyu Sun, Dogyoon Song, A. Hero","doi":"10.1214/22-ss139","DOIUrl":"https://doi.org/10.1214/22-ss139","url":null,"abstract":"Many applications produce multiway data of exceedingly high dimension. Modeling such multi-way data is important in multichannel signal and video processing where sensors produce multi-indexed data, e.g. over spatial, frequency, and temporal dimensions. We will address the challenges of covariance representation of multiway data and review some of the progress in statistical modeling of multiway covariance over the past two decades, focusing on tensor-valued covariance models and their inference. We will illustrate through a space weather application: predicting the evolution of solar active regions over time.","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":"8 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73403080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"General-purpose imputation of planned missing data in social surveys: Different strategies and their effect on correlations","authors":"Julian B. Axenfeld, Christiane Bruch, C. Wolf","doi":"10.1214/22-ss137","DOIUrl":"https://doi.org/10.1214/22-ss137","url":null,"abstract":"","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":"64 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87026410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nested sampling (NS) computes parameter posterior distributions and makes Bayesian model comparison computationally feasible. Its strengths are the unsupervised navigation of complex, potentially multi-modal posteriors until a well-defined termination point. A systematic literature review of nested sampling algorithms and variants is presented. We focus on complete algorithms, including solutions to likelihood-restricted prior sampling, parallelisation, termination and diagnostics. The relation between number of live points, dimensionality and computational cost is studied for two complete algorithms. A new formulation of NS is presented, which casts the parameter space exploration as a search on a tree data structure. Previously published ways of obtaining robust error estimates and dynamic variations of the number of live points are presented as special cases of this formulation. A new online diagnostic test is presented based on previous insertion rank order work. The survey of nested sampling methods concludes with outlooks for future research.
{"title":"Nested sampling methods","authors":"J. Buchner","doi":"10.1214/23-SS144","DOIUrl":"https://doi.org/10.1214/23-SS144","url":null,"abstract":"Nested sampling (NS) computes parameter posterior distributions and makes Bayesian model comparison computationally feasible. Its strengths are the unsupervised navigation of complex, potentially multi-modal posteriors until a well-defined termination point. A systematic literature review of nested sampling algorithms and variants is presented. We focus on complete algorithms, including solutions to likelihood-restricted prior sampling, parallelisation, termination and diagnostics. The relation between number of live points, dimensionality and computational cost is studied for two complete algorithms. A new formulation of NS is presented, which casts the parameter space exploration as a search on a tree data structure. Previously published ways of obtaining robust error estimates and dynamic variations of the number of live points are presented as special cases of this formulation. A new online diagnostic test is presented based on previous insertion rank order work. The survey of nested sampling methods concludes with outlooks for future research.","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":"42 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2021-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77648271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
It is often useful to conduct inference for probability densities by constructing “plausible” sets in which the unknown density of given data may lie. Examples of such sets include pointwise intervals, simultaneous bands, or balls in a function space, and they may be frequentist or Bayesian in interpretation. For almost any density estimator, there are multiple approaches to inference available in the literature. Here we review such literature, providing a thorough overview of existing methods for density uncertainty quantification. The literature considered here comprises a spectrum from theoretical to practical ideas, and for some methods there is little commonality between these two extremes. After detailing some of the key concepts of nonparametric inference – the different types of “plausible” sets, and their interpretation and behaviour – we list the most prominent density estimators and the corresponding uncertainty quantification methods for each.
{"title":"A review of uncertainty quantification for density estimation","authors":"Shaun McDonald, D. Campbell","doi":"10.1214/21-SS130","DOIUrl":"https://doi.org/10.1214/21-SS130","url":null,"abstract":"It is often useful to conduct inference for probability densities by constructing “plausible” sets in which the unknown density of given data may lie. Examples of such sets include pointwise intervals, simultaneous bands, or balls in a function space, and they may be frequentist or Bayesian in interpretation. For almost any density estimator, there are multiple approaches to inference available in the literature. Here we review such literature, providing a thorough overview of existing methods for density uncertainty quantification. The literature considered here comprises a spectrum from theoretical to practical ideas, and for some methods there is little commonality between these two extremes. After detailing some of the key concepts of nonparametric inference – the different types of “plausible” sets, and their interpretation and behaviour – we list the most prominent density estimators and the corresponding uncertainty quantification methods for each.","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":"20 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81374701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}