Patrick Graham, Lucianne Varn, Matthew Hendtlass, Rebecca Green, Andrew Richens
{"title":"Bayesian dual systems population estimation for small domains","authors":"Patrick Graham, Lucianne Varn, Matthew Hendtlass, Rebecca Green, Andrew Richens","doi":"10.1214/23-ss146","DOIUrl":"https://doi.org/10.1214/23-ss146","url":null,"abstract":"","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139638618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ross A. Maller, Sidney Resnick, Soudabeh Shemehsavar, Muzhi Zhao
{"title":"Mixture cure model methodology in survival analysis: Some recent results for the one-sample case","authors":"Ross A. Maller, Sidney Resnick, Soudabeh Shemehsavar, Muzhi Zhao","doi":"10.1214/24-ss147","DOIUrl":"https://doi.org/10.1214/24-ss147","url":null,"abstract":"","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140525664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"White noise testing for functional time series","authors":"Mihyun Kim, P. Kokoszka, Gregory Rice","doi":"10.1214/23-ss143","DOIUrl":"https://doi.org/10.1214/23-ss143","url":null,"abstract":"","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89270770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-01-01; Epub Date: 2023-01-17; DOI: 10.1214/22-SS140
Trang Quynh Nguyen, Elizabeth L Ogburn, Ian Schmid, Elizabeth B Sarker, Noah Greifer, Ina M Koning, Elizabeth A Stuart
This paper aims to provide practitioners of causal mediation analysis with a better understanding of estimation options. We take as inputs two familiar strategies (weighting and model-based prediction) and a simple way of combining them (weighted models), and show how a range of estimators can be generated, with different modeling requirements and robustness properties. The primary goal is to help build intuitive appreciation for robust estimation that is conducive to sound practice. We do this by visualizing the target estimand and the estimation strategies. A second goal is to provide a "menu" of estimators that practitioners can choose from for the estimation of marginal natural (in)direct effects. The estimators generated from this exercise include some that coincide or are similar to existing estimators and others that have not previously appeared in the literature. We note several different ways to estimate the weights for cross-world weighting based on three expressions of the weighting function, including one that is novel; and show how to check the resulting covariate and mediator balance. We use a random continuous weights bootstrap to obtain confidence intervals, and also derive general asymptotic variance formulas for the estimators. The estimators are illustrated using data from an adolescent alcohol use prevention study. R-code is provided.
{"title":"Causal mediation analysis: From simple to more robust strategies for estimation of marginal natural (in)direct effects.","authors":"Trang Quynh Nguyen, Elizabeth L Ogburn, Ian Schmid, Elizabeth B Sarker, Noah Greifer, Ina M Koning, Elizabeth A Stuart","doi":"10.1214/22-SS140","DOIUrl":"10.1214/22-SS140","url":null,"abstract":"<p><p>This paper aims to provide practitioners of causal mediation analysis with a better understanding of estimation options. We take as inputs two familiar strategies (weighting and model-based prediction) and a simple way of combining them (weighted models), and show how a range of estimators can be generated, with different modeling requirements and robustness properties. The primary goal is to help build intuitive appreciation for robust estimation that is conducive to sound practice. We do this by visualizing the target estimand and the estimation strategies. A second goal is to provide a \"menu\" of estimators that practitioners can choose from for the estimation of marginal natural (in)direct effects. The estimators generated from this exercise include some that coincide or are similar to existing estimators and others that have not previously appeared in the literature. We note several different ways to estimate the weights for cross-world weighting based on three expressions of the weighting function, including one that is novel; and show how to check the resulting covariate and mediator balance. We use a random continuous weights bootstrap to obtain confidence intervals, and also derive general asymptotic variance formulas for the estimators. The estimators are illustrated using data from an adolescent alcohol use prevention study. 
R-code is provided.</p>","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11052605/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76943435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This work reviews the literature on spline local basis methods for non-parametric density estimation. Particular attention is paid to B-spline density estimators which have experienced recent advances in both theory and methodology. These estimators occupy a very interesting space in statistics, which lies aptly at the cross-section of numerous statistical frameworks. New insights, experiments, and analyses are presented to cast the various estimation concepts in a unified context, while parallels and contrasts are drawn to the more familiar contexts of kernel density estimation. Unlike kernel density estimation, the study of local basis estimation is not yet fully mature, and this work also aims to highlight the gaps in existing literature which merit further investigation.
{"title":"Spline local basis methods for nonparametric density estimation","authors":"J. Lars Kirkby, Álvaro Leitao, Duy Nguyen","doi":"10.1214/23-ss142","DOIUrl":"https://doi.org/10.1214/23-ss142","url":null,"abstract":"This work reviews the literature on spline local basis methods for non-parametric density estimation. Particular attention is paid to B-spline density estimators which have experienced recent advances in both theory and methodology. These estimators occupy a very interesting space in statistics, which lies aptly at the cross-section of numerous statistical frameworks. New insights, experiments, and analyses are presented to cast the various estimation concepts in a unified context, while parallels and contrasts are drawn to the more familiar contexts of kernel density estimation. Unlike kernel density estimation, the study of local basis estimation is not yet fully mature, and this work also aims to highlight the gaps in existing literature which merit further investigation.","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135585314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Many real-world networks are theorized to have core-periphery structure consisting of a densely-connected core and a loosely-connected periphery. While this phenomenon has been extensively studied in a range of scientific disciplines, it has not received sufficient attention in the statistics community. In this expository article, our goal is to raise awareness about this topic and encourage statisticians to address the many open inference problems in this area. To this end, we first summarize the current research landscape by reviewing the metrics and models that have been used for quantitative studies on core-periphery structure. Next, we formulate and explore various inferential problems in this context, such as estimation, hypothesis testing, and Bayesian inference, and discuss related computational techniques. We also outline the multidisciplinary scientific impact of core-periphery structure in a number of real-world networks. Throughout the article, we provide our own interpretation of the literature from a statistical perspective, with the goal of prioritizing open problems where contribution from the statistics community will be most effective and important.
{"title":"Core-periphery structure in networks: A statistical exposition","authors":"Eric Yanchenko, Srijan Sengupta","doi":"10.1214/23-ss141","DOIUrl":"https://doi.org/10.1214/23-ss141","url":null,"abstract":"Many real-world networks are theorized to have core-periphery structure consisting of a densely-connected core and a loosely-connected periphery. While this phenomenon has been extensively studied in a range of scientific disciplines, it has not received sufficient attention in the statistics community. In this expository article, our goal is to raise awareness about this topic and encourage statisticians to address the many open inference problems in this area. To this end, we first summarize the current research landscape by reviewing the metrics and models that have been used for quantitative studies on core-periphery structure. Next, we formulate and explore various inferential problems in this context, such as estimation, hypothesis testing, and Bayesian inference, and discuss related computational techniques. We also outline the multidisciplinary scientific impact of core-periphery structure in a number of real-world networks. Throughout the article, we provide our own interpretation of the literature from a statistical perspective, with the goal of prioritizing open problems where contribution from the statistics community will be most effective and important.","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2022-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89933059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Central subspaces review: methods and applications","authors":"Sabrina A. Rodrigues, Richard Huggins, B. Liquet","doi":"10.1214/22-ss138","DOIUrl":"https://doi.org/10.1214/22-ss138","url":null,"abstract":"","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81827337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The generation of random sequences is the basis of simulation and can be used in many different areas such as Statistics, Computer Science, Systems Management and Control, Biology, Particle Physics, Cryptography or Cyber-Security, among others. It is crucial that the numbers generated be random or, at least, behave as such. The fundamental statistical properties required for such sequences are randomness and independence and, from a cryptographic perspective, unpredictability. There is a variety of methods to generate these sequences. The main ones are physical and arithmetic methods. In this work, a detailed study of the main arithmetic methods is carried out. In addition, the necessity of secure sequence generation is analyzed, and new lines of ongoing research focusing on applications in the Internet of Things and on new generator designs are described.
{"title":"A brief and understandable guide to pseudo-random number generators and specific models for security","authors":"Elena Almaraz Luengo","doi":"10.1214/22-ss136","DOIUrl":"https://doi.org/10.1214/22-ss136","url":null,"abstract":"The generation of random sequences is the basis of simulation and can be used in many different areas such as Statistics, Computer Science, Systems Management and Control, Biology, Particle Physics, Cryptography or Cyber-Security, among others. It is crucial that the numbers generated be random or, at least, behave as such. The fundamental statistical properties required for such sequences are randomness and independence and, from a cryptographic perspective, unpredictability. There is a variety of methods to generate these sequences. The main ones are physical and arithmetic methods. In this work, a detailed study of the main arithmetic methods is carried out. In addition, the necessity of secure sequence generation is analyzed, and new lines of ongoing research focusing on applications in the Internet of Things and on new generator designs are described.","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74907342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
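As a rough illustration of what an arithmetic method looks like, the sketch below implements a linear congruential generator, one classic example of this family. It is not taken from the paper: the function names `lcg` and `take` are hypothetical, and the default constants are the widely quoted Numerical Recipes parameters, chosen here only for illustration.

```python
from itertools import islice
from typing import Iterator, List


def lcg(seed: int, a: int = 1664525, c: int = 1013904223, m: int = 2**32) -> Iterator[float]:
    """Yield pseudo-random floats in [0, 1) using x_{n+1} = (a * x_n + c) mod m.

    The constants are illustrative defaults (assumption), not a recommendation.
    """
    x = seed % m
    while True:
        x = (a * x + c) % m   # arithmetic recurrence: fully determined by the previous state
        yield x / m           # scale the integer state to the unit interval


def take(gen: Iterator[float], n: int) -> List[float]:
    """Collect the first n values from a generator."""
    return list(islice(gen, n))


if __name__ == "__main__":
    # The same seed always reproduces the same sequence.
    print(take(lcg(seed=42), 5))
```

Because each state is a fixed function of the previous one, anyone who knows the parameters and a single state can reproduce every later value. This shows why good statistical behaviour alone does not give the unpredictability that the abstract lists as a cryptographic requirement.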
Many applications produce multiway data of exceedingly high dimension. Modeling such multiway data is important in multichannel signal and video processing where sensors produce multi-indexed data, e.g. over spatial, frequency, and temporal dimensions. We will address the challenges of covariance representation of multiway data and review some of the progress in statistical modeling of multiway covariance over the past two decades, focusing on tensor-valued covariance models and their inference. We will illustrate through a space weather application: predicting the evolution of solar active regions over time.
{"title":"Kronecker-structured covariance models for multiway data","authors":"Yu Wang, Zeyu Sun, Dogyoon Song, A. Hero","doi":"10.1214/22-ss139","DOIUrl":"https://doi.org/10.1214/22-ss139","url":null,"abstract":"Many applications produce multiway data of exceedingly high dimension. Modeling such multiway data is important in multichannel signal and video processing where sensors produce multi-indexed data, e.g. over spatial, frequency, and temporal dimensions. We will address the challenges of covariance representation of multiway data and review some of the progress in statistical modeling of multiway covariance over the past two decades, focusing on tensor-valued covariance models and their inference. We will illustrate through a space weather application: predicting the evolution of solar active regions over time.","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73403080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
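To make the covariance-representation challenge concrete, the sketch below builds a Kronecker-structured covariance from two small factor matrices and compares parameter counts with an unstructured covariance. It is an illustration under stated assumptions, not the authors' model: the helper names `kron` and `free_params` and all matrix values are hypothetical, and which factor plays the row or column role depends on the vectorization convention.

```python
from typing import List

Matrix = List[List[float]]


def kron(A: Matrix, B: Matrix) -> Matrix:
    """Kronecker product of two matrices given as nested lists."""
    # Row (i, k) and column (j, l) of the result hold A[i][j] * B[k][l].
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def free_params(d: int) -> int:
    """Number of free parameters in a symmetric d x d matrix."""
    return d * (d + 1) // 2


# Hypothetical factor covariances (values chosen only for illustration).
A = [[2.0, 0.5],
     [0.5, 1.0]]                 # 2 x 2 factor
B = [[1.0, 0.2, 0.0],
     [0.2, 1.0, 0.2],
     [0.0, 0.2, 1.0]]            # 3 x 3 factor
Sigma = kron(A, B)               # 6 x 6 covariance of the vectorized 2-way data

# Parameter counts for a p x q data array (ignoring the scale ambiguity between factors).
p, q = 10, 20
unstructured = free_params(p * q)             # full covariance of p*q variables
kronecker = free_params(p) + free_params(q)   # two factor covariances

if __name__ == "__main__":
    print(len(Sigma), len(Sigma[0]))
    print(unstructured, kronecker)
```

With these illustrative sizes the unstructured covariance has 20100 free parameters against 265 for the two factors, which is the kind of reduction that motivates structured models for high-dimensional multiway data.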
The research on statistical inference after data-driven model selection can be traced as far back as Koopmans (1949). The intensive research on modern model selection methods for high-dimensional data over the past three decades revived the interest in statistical inference after model selection. In recent years, there has been a surge of articles on statistical inference after model selection and now a rather vast literature exists on this topic. Our manuscript aims at presenting a holistic review of post-model-selection inference in linear regression models, while also incorporating perspectives from high-dimensional inference in these models. We first give a simulated example motivating the necessity for valid statistical inference after model selection. We then provide theoretical insights explaining the phenomena observed in the example. This is done through a literature survey on the post-selection sampling distribution of regression parameter estimators and properties of coverage probabilities of naïve confidence intervals. Categorized according to two types of estimation targets, namely the population- and projection-based regression coefficients, we present a review of recent uncertainty assessment methods. We also discuss possible pros and cons for the confidence intervals constructed by different methods. MSC2020 subject classifications: Primary 62F25; secondary 62J07.
{"title":"Post-model-selection inference in linear regression models: An integrated review","authors":"Dongliang Zhang, Abbas Khalili, M. Asgharian","doi":"10.1214/22-ss135","DOIUrl":"https://doi.org/10.1214/22-ss135","url":null,"abstract":"The research on statistical inference after data-driven model selection can be traced as far back as Koopmans (1949). The intensive research on modern model selection methods for high-dimensional data over the past three decades revived the interest in statistical inference after model selection. In recent years, there has been a surge of articles on statistical inference after model selection and now a rather vast literature exists on this topic. Our manuscript aims at presenting a holistic review of post-model-selection inference in linear regression models, while also incorporating perspectives from high-dimensional inference in these models. We first give a simulated example motivating the necessity for valid statistical inference after model selection. We then provide theoretical insights explaining the phenomena observed in the example. This is done through a literature survey on the post-selection sampling distribution of regression parameter estimators and properties of coverage probabilities of naïve confidence intervals. Categorized according to two types of estimation targets, namely the population- and projection-based regression coefficients, we present a review of recent uncertainty assessment methods. We also discuss possible pros and cons for the confidence intervals constructed by different methods. 
MSC2020 subject classifications: Primary 62F25; secondary 62J07.","PeriodicalId":46627,"journal":{"name":"Statistics Surveys","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83355813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}