
Annual Review of Statistics and Its Application: Latest Articles

Structure Assessment in Count Time Series
IF 7.9; Zone 1 (Mathematics); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-10-29; DOI: 10.1146/annurev-statistics-042424-114518
Šárka Hudecová
Assessing the performance of an estimated model for count time series is critical for subsequent statistical inference and distributional forecasting. This article reviews the most commonly used count time series models and focuses on evaluating their goodness of fit. Various formal statistical tests are presented, along with useful graphical diagnostic tools. The methods are illustrated on a real data example.
Citations: 0
Analysis of Tensor Time Series
IF 7.9; Zone 1 (Mathematics); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-10-14; DOI: 10.1146/annurev-statistics-042424-063308
Stevenson Bolivar, Shuo-Chieh Huang, Rong Chen
This article provides a comprehensive overview of statistical methods developed for the analysis of tensor time series data, which have become increasingly prevalent across various fields such as economics, finance, biology, engineering, and the social sciences. The review focuses on three primary approaches: autoregressive modeling, factor modeling, and segmentation approaches. These methods leverage the inherent tensor structure to offer advantages such as dimension reduction, enhanced interpretability, and computational efficiency. The review focuses on model settings and their potential interpretations, discussing various estimation techniques for these models and their associated theoretical properties. In addition, we outline various applications using these models and discuss potential directions for future developments.
Citations: 0
Regression Models with Interval-Censored Variables
IF 7.9; Zone 1 (Mathematics); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-10-14; DOI: 10.1146/annurev-statistics-042424-103337
Guadalupe Gómez Melis, Ramon Oller, Klaus Langohr
Survival analysis is essential for modeling time-to-event data across various fields, including medicine, engineering, and the social sciences. A major challenge in this field is handling censored data, particularly partly interval-censored data, where event times are either precisely recorded or only known to fall within a specific interval. Proper statistical modeling of such data is crucial for drawing valid conclusions and making accurate predictions. This article reviews regression models for analyzing interval-censored responses and their implementation in R. Following an introduction to the nonparametric maximum likelihood estimator, we focus on four major regression models: the accelerated failure time model, the proportional hazards model, the proportional odds model, and the generalized odds-rate model. For each, we review the state of the art, outline its methodology, discuss implementation strategies, and illustrate practical applications using real-world data. The article concludes with a discussion of current challenges, alternative modeling approaches, and potential directions for future research.
Citations: 0
Integrative Analysis of Multimodal Omics Data
IF 7.9; Zone 1 (Mathematics); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-10-01; DOI: 10.1146/annurev-statistics-042424-113016
Gen Li, Eric F. Lock
With advancements in technology and the decreasing cost of data acquisition, high-throughput omics data have become increasingly prevalent in biomedical research. These data are often collected across multiple omics modalities at different molecular levels, offering a comprehensive perspective on underlying biological mechanisms. However, the multimodal nature of multiomics data presents unique and complex challenges for statistical analysis. In this article, we provide a comprehensive review of recent advancements in statistical methods for multiomics data integration. We discuss key topics in unsupervised learning (including dimension reduction, clustering, and network analysis), supervised learning (including regression, classification, and mediation analysis), and other areas. Finally, we highlight unresolved challenges and propose promising directions for future research to further advance the field.
Citations: 0
Change-Point Detection and Its Modern Applications
IF 7.9; Zone 1 (Mathematics); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-09-10; DOI: 10.1146/annurev-statistics-041124-044143
Jialiang Li, Jingli Wang, Yuetao Yu
We review recent advances in change-point detection methods across three important fields of statistics: (a) We first present a subgroup identification method based on a multi-threshold change plane model where the subgroup boundaries are defined by a high-dimensional hyperplane in the covariate space. Subjects grouped into different regions may receive more individualized treatments in medical research studies and achieve improved health outcomes. (b) We then consider the estimation of discontinuity for functional process data. Many longitudinal or functional responses may exhibit abrupt jumps, and our methodology effectively accommodates such complicated nonsmooth features. (c) Finally, we explore change-point estimation within dynamic networks using a recently proposed network autoregressive model. This framework demonstrates that community structures in networks can shift similarly to changes observed in time series data. These reviews highlight the wide-ranging applications of change-point detection methodologies in modern data analysis.
Citations: 0
Disasters, Statistics, and the Humanitarian Sector
IF 7.9; Zone 1 (Mathematics); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-09-10; DOI: 10.1146/annurev-statistics-042424-061122
Hamish William Patten, Zineb Bhaby
This article examines the role of statistics in the humanitarian sector, with a particular focus on disasters caused by natural hazards. It begins by outlining current applications, including primary data collection, anticipatory action frameworks, Earth observation, mobile positioning data, and artificial intelligence. It then highlights key challenges such as gaps and biases in disaster impact and response data, difficulties in communicating statistical findings clearly, inequities in aid allocation, and the widespread outsourcing of statistics-related work. In exploring future applications, the article discusses the potential of impact-based early warning models, dynamic population data, and artificial intelligence to enhance communication and decision-making. Throughout, emphasis is placed on the need for interoperable systems as well as ethical and inclusive data practices. In doing so, the article presents statistics as both a diagnostic and strategic tool for strengthening the effectiveness, fairness, and responsiveness of humanitarian action in disaster contexts.
Citations: 0
The Enemies of Reliable and Useful Clinical Prediction Models: A Review of Statistical and Scientific Challenges
IF 7.9; Zone 1 (Mathematics); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-08-28; DOI: 10.1146/annurev-statistics-042324-123749
Ben Van Calster, Maarten van Smeden, Wouter van Amsterdam, Maarten Coemans, Laure Wynants, Ewout W. Steyerberg
The current status of applied clinical prediction modeling is poor. Many models are developed with suboptimal methods and are not evaluated, and hence have little impact on clinical care. We review 12 challenges—provocatively labeled enemies—that jeopardize the creation of prediction models that make it to clinical practice to improve treatment decisions and clinical outcomes for individual patients. The challenges cover four areas: context, data, design and analysis, and scientific culture. We provide negative examples and recommendations for improvement, but also highlight positive examples and developments. Greater awareness of the complexities surrounding clinical prediction modeling is needed among researchers, funding agencies, health professionals as end users, and all of us as potential patients. To improve the utility of prediction models for healthcare and society, we need fewer but better models as well as more resources for model validation, impact assessment, and implementation.
Citations: 0
Model-Based Spatial Data Fusion
IF 7.9; Zone 1 (Mathematics); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-08-27; DOI: 10.1146/annurev-statistics-042424-052920
Alan E. Gelfand, Erin M. Schliep
With increased data collection, the need to fuse data sources has emerged as an important and rapidly growing research activity in the statistical community. In considering spatial and spatio-temporal datasets to examine complex environmental and ecological processes of interest, we often have multiple sources that are jointly informative about features of interest of the processes. Model-based data fusion aims to leverage information from these sources to improve inference and prediction. In the spatial statistics setting, these data could be geostatistical; areal; or point patterns with varying spatial resolutions, supports, and domains. Given two or more sources, we explore stochastic modeling to implement a suitable fusion with full inference and uncertainty quantification. We illustrate these ideas using three environmental and ecological examples: precipitation, marine mammal abundance, and joint species distributions.
Citations: 0
Demystifying Inference After Adaptive Experiments
IF 7.9; Zone 1 (Mathematics); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-07-04; DOI: 10.1146/annurev-statistics-040522-015431
Aurélien Bibaut, Nathan Kallus
Adaptive experiments such as multi-armed bandits adapt the treatment-allocation policy and/or the decision to stop the experiment to the data observed so far. This has the potential to improve outcomes for study participants within the experiment, to improve the chance of identifying the best treatments after the experiment, and to avoid wasting data. As an experiment (and not just a continually optimizing system), it is still desirable to draw statistical inferences with frequentist guarantees. The concentration inequalities and union bounds that generally underlie adaptive experimentation algorithms can yield overly conservative inferences, but at the same time, the asymptotic normality we would usually appeal to in nonadaptive settings can be imperiled by adaptivity. In this article we aim to explain why, how, and when adaptivity is in fact an issue for inference and, when it is, to understand the various ways to fix it: reweighting to stabilize variances and recover asymptotic normality, using always-valid inference based on joint normality of an asymptotic limiting sequence, and characterizing and inverting the nonnormal distributions induced by adaptivity.
Pages: 407-423
Citations: 0
A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models
IF 7.9; Zone 1 (Mathematics); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-11-21; DOI: 10.1146/annurev-statistics-040522-013920
Namjoon Suh, Guang Cheng
In this article, we review the literature on statistical theories of neural networks from three perspectives: approximation, training dynamics, and generative models. In the first part, results on excess risks for neural networks are reviewed in the nonparametric framework of regression. These results rely on explicit constructions of neural networks, leading to fast convergence rates of excess risks. Nonetheless, their underlying analysis only applies to the global minimizer in the highly nonconvex landscape of deep neural networks. This motivates us to review the training dynamics of neural networks in the second part. Specifically, we review articles that attempt to answer the question of how a neural network trained via gradient-based methods finds a solution that can generalize well on unseen data. In particular, two well-known paradigms are reviewed: the neural tangent kernel and mean-field paradigms. Last, we review the most recent theoretical advancements in generative models, including generative adversarial networks, diffusion models, and in-context learning in large language models from two of the same perspectives, approximation and training dynamics.
Citations: 0