Interpretable Machine Learning for Discovery: Statistical Challenges and Opportunities
Genevera I. Allen, Luqin Gan, Lili Zheng
Journal: Annual Review of Statistics and Its Application, Volume 11
Pub Date: 2023-11-17
DOI: 10.1146/annurev-statistics-040120-030919 (https://doi.org/10.1146/annurev-statistics-040120-030919)
New technologies have led to vast troves of large and complex data sets across many scientific domains and industries. People routinely use machine learning techniques not only to process, visualize, and make predictions from these big data, but also to make data-driven discoveries. These discoveries are often made using interpretable machine learning, or machine learning models and techniques that yield human-understandable insights. In this article, we discuss and review the field of interpretable machine learning, focusing especially on the techniques, as they are often employed to generate new knowledge or make discoveries from large data sets. We outline the types of discoveries that can be made using interpretable machine learning in both supervised and unsupervised settings. Additionally, we focus on the grand challenge of how to validate these discoveries in a data-driven manner, which promotes trust in machine learning systems and reproducibility in science. We discuss validation both from a practical perspective, reviewing approaches based on data-splitting and stability, as well as from a theoretical perspective, reviewing statistical results on model selection consistency and uncertainty quantification via statistical inference. Finally, we conclude by highlighting open challenges in using interpretable machine learning techniques to make discoveries, including gaps between theory and practice for validating data-driven discoveries.
Expected final online publication date for the Annual Review of Statistics and Its Application, Volume 11 is March 2024. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Causal Inference in the Social Sciences
Guido W. Imbens
Journal: Annual Review of Statistics and Its Application, Volume 11
Pub Date: 2023-11-17
DOI: 10.1146/annurev-statistics-033121-114601 (https://doi.org/10.1146/annurev-statistics-033121-114601)
Knowledge of causal effects is of great importance to decision makers in a wide variety of settings. In many cases, however, these causal effects are not known to the decision makers and need to be estimated from data. This fundamental problem has been known and studied for many years in many disciplines. In the past thirty years, however, the amount of empirical as well as methodological research in this area has increased dramatically, and so has its scope. It has become more interdisciplinary, and the focus has been more specifically on methods for credibly estimating causal effects in a wide range of both experimental and observational settings. This work has greatly impacted empirical work in the social and biomedical sciences. In this article, I review some of this work and discuss open questions.
Distributed Computing and Inference for Big Data
Ling Zhou, Ziyang Gong, Pengcheng Xiang
Journal: Annual Review of Statistics and Its Application, Volume 11
Pub Date: 2023-11-17
DOI: 10.1146/annurev-statistics-040522-021241 (https://doi.org/10.1146/annurev-statistics-040522-021241)
Data are distributed across different sites due to computing facility limitations or data privacy considerations. Conventional centralized methods (those in which all datasets are stored and processed in a central computing facility) are not applicable in practice. Therefore, it has become necessary to develop distributed learning approaches that have good inference or predictive accuracy while remaining free of individual data or obeying policies and regulations to protect privacy. In this article, we introduce the basic idea of distributed learning and conduct a selected review on various distributed learning methods, which are categorized by their statistical accuracy, computational efficiency, heterogeneity, and privacy. This categorization can help evaluate newly proposed methods from different aspects. Moreover, we provide up-to-date descriptions of the existing theoretical results that cover statistical equivalency and computational efficiency under different statistical learning frameworks. Finally, we provide existing software implementations and benchmark datasets, and we discuss future research opportunities.
Geometric Methods for Cosmological Data on the Sphere
Javier Carrón Duque, Domenico Marinucci
Journal: Annual Review of Statistics and Its Application, Volume 11
Pub Date: 2023-11-06
DOI: 10.1146/annurev-statistics-040522-093748 (https://doi.org/10.1146/annurev-statistics-040522-093748)
This review is devoted to recent developments in the statistical analysis of spherical data, strongly motivated by applications in cosmology. We start from a brief discussion of cosmological questions and motivations, arguing that most cosmological observables are spherical random fields. Then, we introduce some mathematical background on spherical random fields, including spectral representations and the construction of needlet and wavelet frames. We then focus on some specific issues, including tools and algorithms for map reconstruction (i.e., separating the different physical components that contribute to the observed field), geometric tools for testing the assumptions of Gaussianity and isotropy, and multiple testing methods to detect contamination in the field due to point sources. Although these tools are introduced in the cosmological context, they can be applied to other situations dealing with spherical data. Finally, we discuss more recent and challenging issues, such as the analysis of polarization data, which can be viewed as realizations of random fields taking values in spin fiber bundles.
Stochastic Models of Rainfall
Paul J. Northrop
Journal: Annual Review of Statistics and Its Application, Volume 11
Pub Date: 2023-10-31
DOI: 10.1146/annurev-statistics-040622-023838 (https://doi.org/10.1146/annurev-statistics-040622-023838)
Rainfall is the main input to most hydrological systems. To assess flood risk for a catchment area, hydrologists use models that require long series of subdaily, perhaps even subhourly, rainfall data, ideally from locations that cover the area. If historical data are not sufficient for this purpose, an alternative is to simulate synthetic data from a suitably calibrated model. We review stochastic models that have a mechanistic structure, intended to mimic physical features of the rainfall processes, and are constructed using stationary point processes. We describe models for temporal and spatial-temporal rainfall and consider how they can be fitted to data. We provide an example application using a temporal model and an illustration of data simulated from a spatial-temporal model. We discuss how these models can contribute to the simulation of future rainfall that reflects our changing climate.
An Update on Measurement Error Modeling
Mushan Li, Yanyuan Ma
Journal: Annual Review of Statistics and Its Application, Volume 11
Pub Date: 2023-10-13
DOI: 10.1146/annurev-statistics-040722-043616 (https://doi.org/10.1146/annurev-statistics-040722-043616)
The issues caused by measurement errors have been recognized for almost 90 years, and research in this area has flourished since the 1980s. We review some of the classical methods in both density estimation and regression problems with measurement errors. In both problems, we consider when the original error-free model is parametric, nonparametric, and semiparametric, in combination with different error types. We also summarize and explain some new approaches, including recent developments and challenges in the high-dimensional setting.
Analysis of Microbiome Data
Christine B. Peterson, Satabdi Saha, Kim-Anh Do
Journal: Annual Review of Statistics and Its Application, Volume 11
Pub Date: 2023-10-13
DOI: 10.1146/annurev-statistics-040522-120734 (https://doi.org/10.1146/annurev-statistics-040522-120734)
The microbiome represents a hidden world of tiny organisms populating not only our surroundings but also our own bodies. By enabling comprehensive profiling of these invisible creatures, modern genomic sequencing tools have given us an unprecedented ability to characterize these populations and uncover their outsize impact on our environment and health. Statistical analysis of microbiome data is critical to infer patterns from the observed abundances. The application and development of analytical methods in this area require careful consideration of the unique aspects of microbiome profiles. We begin this review with a brief overview of microbiome data collection and processing and describe the resulting data structure. We then provide an overview of statistical methods for key tasks in microbiome data analysis, including data visualization, comparison of microbial abundance across groups, regression modeling, and network inference. We conclude with a discussion and highlight interesting future directions.
Distributional Regression for Data Analysis
Nadja Klein
Journal: Annual Review of Statistics and Its Application, Volume 11
Pub Date: 2023-10-13
DOI: 10.1146/annurev-statistics-040722-053607 (https://doi.org/10.1146/annurev-statistics-040722-053607)
Flexible modeling of how an entire distribution changes with covariates is an important yet challenging generalization of mean-based regression that has seen growing interest over the past decades in both the statistics and machine learning literature. This review outlines selected state-of-the-art statistical approaches to distributional regression, complemented with alternatives from machine learning. Topics covered include the similarities and differences between these approaches, extensions, properties and limitations, estimation procedures, and the availability of software. In view of the increasing complexity and availability of large-scale data, this review also discusses the scalability of traditional estimation methods, current trends, and open challenges. Illustrations are provided using data on childhood malnutrition in Nigeria and Australian electricity prices.
Role of Statistics in Detecting Misinformation: A Review of the State of the Art, Open Issues, and Future Research Directions
Zois Boukouvalas, Allison Shafer
Journal: Annual Review of Statistics and Its Application, Volume 11
Pub Date: 2023-10-13
DOI: 10.1146/annurev-statistics-040622-033806 (https://doi.org/10.1146/annurev-statistics-040622-033806)
With the evolution of social media, cyberspace has become the default medium for social media users to communicate, especially during high-impact events such as pandemics, natural disasters, terrorist attacks, and periods of political unrest. However, during such events, misinformation can spread rapidly on social media, affecting decision-making and creating social unrest. Identifying and curtailing the spread of misinformation during high-impact events are significant data challenges given the scarcity and variety of the data, the speed by which misinformation can propagate, and the fairness aspects associated with this societal problem. Recent statistical machine learning advances have shown promise for misinformation detection; however, key limitations still make this a significant challenge. These limitations relate to using representative and bias-free multimodal data and to the explainability, fairness, and reliable performance of a system that detects misinformation. In this article, we critically discuss the current state-of-the-art approaches that attempt to respond to these complex requirements and present major unsolved issues; future research directions; and the synergies among statistics, data science, and other sciences for detecting misinformation.
Shape-Constrained Statistical Inference
Lutz Dümbgen
Journal: Annual Review of Statistics and Its Application, Volume 11
Pub Date: 2023-10-13
DOI: 10.1146/annurev-statistics-033021-014937 (https://doi.org/10.1146/annurev-statistics-033021-014937)
Statistical models defined by shape constraints are a valuable alternative to parametric models or nonparametric models defined in terms of quantitative smoothness constraints. While the latter two classes of models are typically difficult to justify a priori, many applications involve natural shape constraints, for instance, monotonicity of a density or regression function. We review some of the history of this subject and recent developments, with special emphasis on algorithmic aspects, adaptivity, honest confidence bands for shape-constrained curves, and distributional regression, i.e., inference about the conditional distribution of a real-valued response given certain covariates.