Pub Date: 2024-10-30 | DOI: 10.1146/annurev-statistics-033121-120121
M. Elizabeth Halloran
Due to dependent happenings, vaccines can have different effects in populations. In addition to direct protective effects in the vaccinated, vaccination in a population can have indirect effects in the unvaccinated individuals. Vaccination can also reduce person-to-person transmission to vaccinated individuals or from vaccinated individuals compared with unvaccinated individuals. Design of vaccine studies has a history extending back over a century. Emerging infectious diseases, such as the SARS-CoV-2 pandemic and the Ebola outbreak in West Africa, have stimulated new interest in vaccine studies. We focus on some recent developments, such as target trial emulation, test-negative design, and regression discontinuity design. Methods for evaluating durability of vaccine effects were developed in the context of both blinded and unblinded placebo crossover studies. The case-ascertained design is used to assess the transmission effects of vaccines. The novel ring vaccination trial design was first used in the Ebola outbreak in West Africa.
Title: Designs for Vaccine Studies. Journal: Annual Review of Statistics and Its Application.
Pub Date: 2024-10-18 | DOI: 10.1146/annurev-statistics-112723-034158
Weijie J. Su
Differential privacy is widely considered the formal privacy for privacy-preserving data analysis due to its robust and rigorous guarantees, with increasingly broad adoption in public services, academia, and industry. Although differential privacy originated in the cryptographic context, in this review we argue that, fundamentally, it can be considered a pure statistical concept. We leverage Blackwell's informativeness theorem and focus on demonstrating that the definition of differential privacy can be formally motivated from a hypothesis testing perspective, thereby showing that hypothesis testing is not merely convenient but also the right language for reasoning about differential privacy. This insight leads to the definition of f-differential privacy, which extends other differential privacy definitions through a representation theorem. We review techniques that render f-differential privacy a unified framework for analyzing privacy bounds in data analysis and machine learning. Applications of this differential privacy definition to private deep learning, private convex optimization, shuffled mechanisms, and US Census data are discussed to highlight the benefits of analyzing privacy bounds under this framework compared with existing alternatives.
Title: A Statistical Viewpoint on Differential Privacy: Hypothesis Testing, Representation, and Blackwell's Theorem. Journal: Annual Review of Statistics and Its Application.
Pub Date: 2024-10-09 | DOI: 10.1146/annurev-statistics-112723-034436
Mine Dogucu
Difficulties in reproducing results from scientific studies have lately been referred to as a reproducibility crisis. Scientific practice depends heavily on scientific training. What gets taught in the classroom is often practiced in labs, fields, and data analysis. The importance of reproducibility in the classroom has gained momentum in statistics education in recent years. In this article, we review the existing literature on reproducibility education. We delve into the relationship between computing tools and reproducibility through visiting historical developments in this area. We share examples for teaching reproducibility and reproducible teaching while discussing the pedagogical opportunities created by these examples as well as challenges that the instructors should be aware of. We detail the use of teaching reproducibility and reproducible teaching practices in an introductory data science course. Lastly, we provide recommendations on reproducibility education for instructors, administrators, and other members of the scientific community.
Title: Reproducibility in the Classroom. Journal: Annual Review of Statistics and Its Application.
Pub Date: 2024-10-07 | DOI: 10.1146/annurev-statistics-112723-034249
Simon N. Wood
Generalized additive models are generalized linear models in which the linear predictor includes a sum of smooth functions of covariates, where the shape of the functions is to be estimated. They have also been generalized beyond the original generalized linear model setting to distributions outside the exponential family and to situations in which multiple parameters of the response distribution may depend on sums of smooth functions of covariates. The widely used computational and inferential framework in which the smooth terms are represented as latent Gaussian processes, splines, or Gaussian random effects is reviewed, paying particular attention to the case in which computational and theoretical tractability is obtained by prior rank reduction of the model terms. An empirical Bayes approach is taken, and its relatively good frequentist performance discussed, along with some more overtly frequentist approaches to model selection. Estimation of the degree of smoothness of component functions via cross validation or marginal likelihood is covered, alongside the computational strategies required in practice, including when data and models are reasonably large. It is briefly shown how the framework extends easily to location-scale modeling, and, with more effort, to techniques such as quantile regression. Also covered are the main classes of smooths of multiple covariates that may be included in models: isotropic splines and tensor product smooth interaction terms.
Title: Generalized Additive Models. Journal: Annual Review of Statistics and Its Application.
Pub Date: 2024-10-01 | DOI: 10.1146/annurev-statistics-112723-034642
Shahin Tavakoli, Beatrice Matteo, Davide Pigoli, Eleanor Chodroff, John Coleman, Michele Gubian, Margaret E.L. Renwick, Morgan Sonderegger
Phonetics is the scientific field concerned with the study of how speech is produced, heard, and perceived. It abounds with data, such as acoustic speech recordings, neuroimaging data, or articulatory data. In this article, we provide an introduction to different areas of phonetics (acoustic phonetics, sociophonetics, speech perception, articulatory phonetics, speech inversion, sound change, and speech technology), an overview of the statistical methods for analyzing their data, and an introduction to the signal processing methods commonly applied to speech recordings. A major transition in the statistical modeling of phonetic data has been the shift from fixed effects to random effects regression models, the modeling of curve data (for instance, via generalized additive mixed models or functional data analysis methods), and the use of Bayesian methods. This shift has been driven in part by the increased focus on large speech corpora in phonetics, which has arisen from machine learning methods such as forced alignment. We conclude by identifying opportunities for future research.
Title: Statistics in Phonetics. Journal: Annual Review of Statistics and Its Application.
Pub Date: 2024-10-01 | DOI: 10.1146/annurev-statistics-112723-034304
Patrick J. Laub, Young Lee, Philip K. Pollett, Thomas Taimre
The Hawkes process is a model for counting the number of arrivals to a system that exhibits the self-exciting property—that one arrival creates a heightened chance of further arrivals in the near future. The model and its generalizations have been applied in a plethora of disparate domains, though two particularly developed applications are in seismology and in finance. As the original model is elegantly simple, generalizations have been proposed that track marks for each arrival, are multivariate, have a spatial component, are driven by renewal processes, treat time as discrete, and so on. This article creates a cohesive review of the traditional Hawkes model and the modern generalizations, providing details on their construction and simulation algorithms, and giving key references to the appropriate literature for a detailed treatment.
Title: Hawkes Models and Their Applications. Journal: Annual Review of Statistics and Its Application.
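The self-exciting property described in the Hawkes abstract can be illustrated with a minimal simulation sketch. It is not taken from the review itself: it assumes an exponential excitation kernel, so that the conditional intensity is lambda(t) = mu + sum over past arrivals t_i of alpha * exp(-beta * (t - t_i)), and it uses a thinning (accept/reject) scheme. The function name `simulate_hawkes` and the parameters `mu`, `alpha`, `beta`, and `horizon` are hypothetical names chosen for illustration.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate arrival times of a univariate Hawkes process on [0, horizon].

    Assumed model (illustrative, not from the source):
        lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))
    with mu > 0, alpha >= 0, beta > 0. The process is stable when alpha / beta < 1.
    """
    rng = random.Random(seed)
    arrivals = []
    t = 0.0
    # Excited part of the intensity, evaluated at the current time t.
    excitation = 0.0
    while True:
        # Between arrivals the intensity only decays, so its current value
        # is a valid upper bound until the next accepted arrival.
        lam_bar = mu + excitation
        wait = rng.expovariate(lam_bar)
        if t + wait > horizon:
            break
        t += wait
        # Decay the excited part over the waiting time.
        excitation *= math.exp(-beta * wait)
        lam_t = mu + excitation
        # Accept the candidate with probability lambda(t) / lam_bar.
        if rng.random() * lam_bar <= lam_t:
            arrivals.append(t)
            # Each accepted arrival raises the intensity by alpha.
            excitation += alpha
    return arrivals


if __name__ == "__main__":
    times = simulate_hawkes(mu=1.0, alpha=0.5, beta=1.0, horizon=200.0, seed=1)
    print(len(times), "arrivals; first few:", [round(x, 3) for x in times[:5]])
```

Setting `alpha=0` removes the excitation and reduces the sketch to a homogeneous Poisson process with rate `mu`, which is a convenient sanity check. Under the assumed kernel, the long-run arrival rate is mu / (1 - alpha / beta) when alpha / beta < 1.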
Pub Date: 2024-09-26 | DOI: 10.1146/annurev-statistics-112723-034721
Hyunseung Kang, Zijian Guo, Zhonghua Liu, Dylan Small
Instrumental variables (IVs) are widely used to study the causal effect of an exposure on an outcome in the presence of unmeasured confounding. IVs require an instrument, a variable that (a) is associated with the exposure, (b) has no direct effect on the outcome except through the exposure, and (c) is not related to unmeasured confounders. Unfortunately, finding variables that satisfy conditions b or c can be challenging in practice. This article reviews works where instruments may not satisfy conditions b or c, which we refer to as invalid instruments. We review identification and inference under different violations of b or c, specifically under linear models, nonlinear models, and heteroskedastic models. We conclude with an empirical comparison of various methods by reanalyzing the effect of body mass index on systolic blood pressure from the UK Biobank.
Title: Identification and Inference with Invalid Instruments. Journal: Annual Review of Statistics and Its Application.
Pub Date: 2024-09-11 | DOI: 10.1146/annurev-statistics-040522-100329
Martin A. Lindquist, Bonnie B. Smith, Arunkumar Kannan, Angela Zhao, Brian Caffo
The emergence of functional magnetic resonance imaging (fMRI) marked a significant technological breakthrough in the real-time measurement of the functioning human brain in vivo. In part because of their 4D nature (three spatial dimensions and time), fMRI data have inspired a great deal of statistical development in the past couple of decades to address their unique spatiotemporal properties. This article provides an overview of the current landscape in functional brain measurement, with a particular focus on fMRI, highlighting key developments in the past decade. Furthermore, it looks ahead to the future, discussing unresolved research questions in the community and outlining potential research topics for the future.
Title: Measuring the Functioning Human Brain. Journal: Annual Review of Statistics and Its Application.
Pub Date: 2024-09-11 | DOI: 10.1146/annurev-statistics-112723-034315
Mengyun Wu, Yingmeng Li, Shuangge Ma
Beyond the main genetic and environmental effects, gene–environment (G–E) interactions have been demonstrated to significantly contribute to the development and progression of complex diseases. Published analyses of G–E interactions have primarily used a supervised framework to model both low-dimensional environmental factors and high-dimensional genetic factors in relation to disease outcomes. In this article, we aim to provide a selective review of methodological developments in G–E interaction analysis from a statistical perspective. The three main families of techniques are hypothesis testing, variable selection, and dimension reduction, which lead to three general frameworks: testing-based, estimation-based, and prediction-based. Linear- and nonlinear-effects analysis, fixed- and random-effects analysis, marginal and joint analysis, and Bayesian and frequentist analysis are reviewed to facilitate the conduct of interaction analysis in a wide range of situations with various assumptions and objectives. Statistical properties, computations, applications, and future directions are also discussed.
Title: High-Dimensional Gene–Environment Interaction Analysis. Journal: Annual Review of Statistics and Its Application.
Pub Date: 2024-08-21 | DOI: 10.1146/annurev-statistics-112723-034446
Po-Ling Loh
Robust statistics is a fairly mature field that dates back to the early 1960s, with many foundational concepts having been developed in the ensuing decades. However, the field has drawn a new surge of attention in the past decade, largely due to a desire to recast robust statistical principles in the context of high-dimensional statistics. In this article, we begin by reviewing some of the central ideas in classical robust statistics. We then discuss the need for new theory in high dimensions, using recent work in high-dimensional M-estimation as an illustrative example. Next, we highlight a variety of interesting recent topics that have drawn a flurry of research activity from both statisticians and theoretical computer scientists, demonstrating the need for further research in robust estimation that embraces new estimation and contamination settings, as well as a greater emphasis on computational tractability in high dimensions.
Title: A Theoretical Review of Modern Robust Statistics. Journal: Annual Review of Statistics and Its Application.