
Annual Review of Statistics and Its Application: Latest Articles

Variable Importance Without Impossible Data
IF 7.9 | Zone 1, Mathematics | Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2023-08-25 | DOI: 10.1146/annurev-statistics-040722-045325
Masayoshi Mase, Art B. Owen, Benjamin B. Seiler
The most popular methods for measuring importance of the variables in a black-box prediction algorithm make use of synthetic inputs that combine predictor variables from multiple observations. These inputs can be unlikely, physically impossible, or even logically impossible. As a result, the predictions for such cases can be based on data very unlike any the black box was trained on. We think that users cannot trust an explanation of the decision of a prediction algorithm when the explanation uses such values. Instead, we advocate a method called cohort Shapley, which is grounded in economic game theory and uses only actually observed data to quantify variable importance. Cohort Shapley works by narrowing the cohort of observations judged to be similar to a target observation on one or more features. We illustrate it on an algorithmic fairness problem where it is essential to attribute importance to protected variables that the model was not trained on. Expected final online publication date for the Annual Review of Statistics and Its Application, Volume 11 is March 2024. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Citations: 0
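The abstract of "Variable Importance Without Impossible Data" says that cohort Shapley narrows the cohort of observations judged similar to a target observation on one or more features. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the authors' implementation: similarity is taken to be exact equality on a feature, the value of a feature subset is the mean response over the matching cohort, and the function names `cohort_mean` and `cohort_shapley` are hypothetical.

```python
from itertools import combinations
from math import factorial


def cohort_mean(X, y, target, features):
    """Mean of y over observations that match the target on every listed feature.

    Assumption: "similar" means exactly equal on that feature.
    """
    members = [
        i for i, row in enumerate(X)
        if all(row[j] == X[target][j] for j in features)
    ]
    # The target always matches itself, so the cohort is never empty.
    return sum(y[i] for i in members) / len(members)


def cohort_shapley(X, y, target):
    """Shapley value of each feature for one target row, using observed rows only."""
    d = len(X[0])
    phi = [0.0] * d
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(d):
            # Standard Shapley weight for a subset of this size.
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for u in combinations(others, size):
                # Change in cohort mean when feature j is added to subset u.
                gain = (cohort_mean(X, y, target, u + (j,))
                        - cohort_mean(X, y, target, u))
                phi[j] += weight * gain
    return phi
```

With the toy data `X = [(0, 0), (0, 1), (1, 0), (1, 1)]` and `y = [0.0, 1.0, 2.0, 5.0]`, calling `cohort_shapley(X, y, 3)` gives `[1.75, 1.25]`. The two values sum to 3.0, the gap between the target's fully refined cohort mean (5.0) and the overall mean (2.0). No synthetic input is ever evaluated; every quantity is an average over rows that actually occur in the data.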
Inverse Problems for Physics-Based Process Models
IF 7.9 | Zone 1, Mathematics | Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2023-08-16 | DOI: 10.1146/annurev-statistics-031017-100108
D. Bingham, T. Butler, D. Estep
We describe and compare two formulations of inverse problems for a physics-based process model in the context of uncertainty and random variability: the Bayesian inverse problem and the stochastic inverse problem. We describe the foundations of the two problems in order to create a context for interpreting the applicability and solutions of inverse problems important for scientific and engineering inference. We conclude by comparing them to statistical approaches to related problems, including Bayesian calibration of computer models. Expected final online publication date for the Annual Review of Statistics and Its Application, Volume 11 is March 2024. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Citations: 0
Bayesian Inference for Misspecified Generative Models
IF 7.9 | Zone 1, Mathematics | Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2023-05-15 | DOI: 10.1146/annurev-statistics-040522-015915
D. Nott, C. Drovandi, David T. Frazier
Bayesian inference is a powerful tool for combining information in complex settings, a task of increasing importance in modern applications. However, Bayesian inference with a flawed model can produce unreliable conclusions. This review discusses approaches to performing Bayesian inference when the model is misspecified, where, by misspecified, we mean that the analyst is unwilling to act as if the model is correct. Much has been written about this topic, and in most cases we do not believe that a conventional Bayesian analysis is meaningful when there is serious model misspecification. Nevertheless, in some cases it is possible to use a well-specified model to give meaning to a Bayesian analysis of a misspecified model, and we focus on such cases. Three main classes of methods are discussed: restricted likelihood methods, which use a model based on an insufficient summary of the original data; modular inference methods, which use a model constructed from coupled submodels, with some of the submodels correctly specified; and the use of a reference model to construct a projected posterior or predictive distribution for a simplified model considered to be useful for prediction or interpretation. Expected final online publication date for the Annual Review of Statistics and Its Application, Volume 11 is March 2024. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Citations: 3
Second-Generation Functional Data
IF 7.9 | Zone 1, Mathematics | Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2023-03-10 | DOI: 10.1146/annurev-statistics-032921-033726
Salil Koner, Ana-Maria Staicu
Modern studies from a variety of fields record multiple functional observations according to either multivariate, longitudinal, spatial, or time series designs. We refer to such data as second-generation functional data because their analysis—unlike typical functional data analysis, which assumes independence of the functions—accounts for the complex dependence between the functional observations and requires more advanced methods. In this article, we provide an overview of the techniques for analyzing second-generation functional data with a focus on highlighting the key methodological intricacies that stem from the need for modeling complex dependence, compared with independent functional data. For each of the four types of second-generation functional data presented—multivariate functional data, longitudinal functional data, functional time series and spatially functional data—we discuss how the widely popular functional principal component analysis can be extended to these settings to define, identify main directions of variation, and describe dependence among the functions. In addition to modeling, we also discuss prediction, statistical inference, and application to clustering. We close by discussing future directions in this area.
Citations: 3
A Brief Tour of Deep Learning from a Statistical Perspective
IF 7.9 | Zone 1, Mathematics | Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2023-03-10 | DOI: 10.1146/annurev-statistics-032921-013738
Eric T. Nalisnick, Padhraic Smyth, Dustin Tran
We expose the statistical foundations of deep learning with the goal of facilitating conversation between the deep learning and statistics communities. We highlight core themes at the intersection; summarize key neural models, such as feedforward neural networks, sequential neural networks, and neural latent variable models; and link these ideas to their roots in probability and statistics. We also highlight research directions in deep learning where there are opportunities for statistical contributions.
Citations: 1
Surrogate Endpoints in Clinical Trials
IF 7.9 | Zone 1, Mathematics | Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2023-03-10 | DOI: 10.1146/annurev-statistics-032921-035359
M. Elliott
Surrogate markers are often used in clinical trials settings when obtaining a final outcome to evaluate the effectiveness of a treatment requires a long wait, is expensive to obtain, or both. Formal definitions of surrogate marker quality resulting from a large variety of estimation approaches have been proposed over the years. I review this work, with a particular focus on approaches that use the causal inference paradigm, as these conceptualize a good marker as one in the causal pathway between the treatment and outcome. I also focus on efforts to evaluate the risk of a surrogate paradox, a damaging situation where the surrogate is positively associated with the outcome, and the causal effect of the treatment on the surrogate is in a helpful direction, but the ultimate causal effect of the treatment on the outcome is harmful. I then review some recent work in robust surrogate marker estimation and conclude with a discussion and suggestions for future research.
Citations: 2
Statistical Data Privacy: A Song of Privacy and Utility
Zone 1, Mathematics | Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2023-03-10 | DOI: 10.1146/annurev-statistics-033121-112921
Aleksandra Slavković, Jeremy Seeman
To quantify trade-offs between increasing demand for open data sharing and concerns about sensitive information disclosure, statistical data privacy (SDP) methodology analyzes data release mechanisms that sanitize outputs based on confidential data. Two dominant frameworks exist: statistical disclosure control (SDC) and the more recent differential privacy (DP). Despite framing differences, both SDC and DP share the same statistical problems at their core. For inference problems, either we may design optimal release mechanisms and associated estimators that satisfy bounds on disclosure risk measures, or we may adjust existing sanitized output to create new statistically valid and optimal estimators. Regardless of design or adjustment, in evaluating risk and utility, valid statistical inferences from mechanism outputs require uncertainty quantification that accounts for the effect of the sanitization mechanism that introduces bias and/or variance. In this review, we discuss the statistical foundations common to both SDC and DP, highlight major developments in SDP, and present exciting open research problems in private inference.
Citations: 5
Graph-Based Change-Point Analysis
IF 7.9 | Zone 1, Mathematics | Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2023-03-10 | DOI: 10.1146/annurev-statistics-122121-033817
Hao Chen, Lynna Chu
Recent technological advances allow for the collection of massive data in the study of complex phenomena over time and/or space in various fields. Many of these data involve sequences of high-dimensional or non-Euclidean measurements, where change-point analysis is a crucial early step in understanding the data. Segmentation, or offline change-point analysis, divides data into homogeneous temporal or spatial segments, making subsequent analysis easier; its online counterpart detects changes in sequentially observed data, allowing for real-time anomaly detection. This article reviews a nonparametric change-point analysis framework that utilizes graphs representing the similarity between observations. This framework can be applied to data as long as a reasonable dissimilarity distance among the observations can be defined. Thus, this framework can be applied to a wide range of applications, from high-dimensional data to non-Euclidean data, such as imaging data or network data. In addition, analytic formulas can be derived to control the false discoveries, making them easy off-the-shelf data analysis tools.
Citations: 2
Statistical Applications to Cognitive Diagnostic Testing
Zone 1, Mathematics | Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2023-03-10 | DOI: 10.1146/annurev-statistics-033021-111803
Susu Zhang, Jingchen Liu, Zhiliang Ying
Diagnostic classification tests are designed to assess examinees’ discrete mastery status on a set of skills or attributes. Such tests have gained increasing attention in educational and psychological measurement. We review diagnostic classification models and their applications to testing and learning, discuss their statistical and machine learning connections and related challenges, and introduce some contemporary and future extensions.
Citations: 2
Statistical Deep Learning for Spatial and Spatiotemporal Data
IF 7.9 | Zone 1, Mathematics | Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2023-03-09 | DOI: 10.1146/annurev-statistics-033021-112628
Christopher K. Wikle, Andrew Zammit-Mangion
Deep neural network models have become ubiquitous in recent years and have been applied to nearly all areas of science, engineering, and industry. These models are particularly useful for data that have strong dependencies in space (e.g., images) and time (e.g., sequences). Indeed, deep models have also been extensively used by the statistical community to model spatial and spatiotemporal data through, for example, the use of multilevel Bayesian hierarchical models and deep Gaussian processes. In this review, we first present an overview of traditional statistical and machine learning perspectives for modeling spatial and spatiotemporal data, and then focus on a variety of hybrid models that have recently been developed for latent process, data, and parameter specifications. These hybrid models integrate statistical modeling ideas with deep neural network models in order to take advantage of the strengths of each modeling paradigm. We conclude by giving an overview of computational technologies that have proven useful for these hybrid models, and with a brief discussion on future research directions.
Citations: 0