
The New England Journal of Statistics in Data Science: Latest Articles

Evaluating Designs for Hyperparameter Tuning in Deep Neural Networks
Pub Date : 2023-01-01 DOI: 10.51387/23-nejsds26
Chenlu Shi, Ashley Kathleen Chiu, Hongquan Xu
The performance of a learning technique relies heavily on its hyperparameter settings. Deep learning techniques therefore call for hyperparameter tuning, which may be too computationally expensive for sophisticated models. It is thus desirable to explore quickly the relationship between the hyperparameters and the performance they control, which in turn requires design strategies that collect informative data efficiently. Various designs can be considered for this purpose, and the question of which design to use naturally arises. In this paper, we examine how different types of designs perform in efficiently collecting informative data to study the surface of test accuracy, a measure of the performance of a learning technique, over the hyperparameters. Under the settings we considered, we find that the strong orthogonal array outperforms all other comparable designs.
Citations: 1
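To make the idea of a space-filling design over hyperparameters concrete, here is a minimal sketch of a Latin hypercube design. This is not the strong orthogonal array that the abstract reports as the best performer; it is a simpler design chosen only for illustration. The function name `latin_hypercube`, the two hyperparameters (learning rate and dropout rate), and their ranges are assumptions.

```python
import random

def latin_hypercube(n_runs, ranges, seed=0):
    """Return n_runs design points; each dimension is split into n_runs
    equal-width strata and every stratum is used exactly once."""
    rng = random.Random(seed)
    columns = []
    for low, high in ranges:
        strata = list(range(n_runs))
        rng.shuffle(strata)  # random pairing of strata across dimensions
        columns.append(
            [low + (s + rng.random()) / n_runs * (high - low) for s in strata]
        )
    # Transpose the per-dimension columns into one tuple per run.
    return [tuple(col[i] for col in columns) for i in range(n_runs)]

# Hypothetical search space: (learning rate, dropout rate).
design = latin_hypercube(8, [(0.0001, 0.1), (0.0, 0.5)], seed=3)
for learning_rate, dropout in design:
    print(f"lr={learning_rate:.4f} dropout={dropout:.3f}")
```

Each of the 8 runs would then be used to train a network once, and the resulting test accuracies give the response surface that the abstract describes.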
Particle Swarm Optimization for Finding Efficient Longitudinal Exact Designs for Nonlinear Models
Pub Date : 2023-01-01 DOI: 10.51387/23-nejsds45
Ping-Yang Chen, Ray‐Bing Chen, W. Wong
Designing longitudinal studies is generally very challenging because of the complex optimization problems involved. We show that the popular nature-inspired metaheuristic algorithm, Particle Swarm Optimization (PSO), can find different types of optimal exact designs for longitudinal studies with different correlation structures and different types of models. In particular, we demonstrate that PSO-generated D-optimal longitudinal designs for the widely used Michaelis-Menten model with various correlation structures agree with the analytically derived locally D-optimal designs reported in the literature when there are only 2 observations per subject, and with the reported numerical D-optimal designs when there are 3 and 4 observations per subject. We further show the usefulness of PSO by applying it to generate new locally D-optimal designs to estimate model parameters when there are 5 or more observations per subject. Additionally, we find various optimal longitudinal designs for a growth curve model commonly used in animal studies and for a nonlinear HIV dynamic model for studying T-cells in AIDS subjects. In particular, c-optimal exact designs for estimating one or more functions of model parameters (c-optimality) were found, along with other types of multiple-objective optimal designs.
Citations: 1
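As a rough illustration of the PSO idea in the abstract above, the sketch below searches for a two-point locally D-optimal design for the Michaelis-Menten mean a*x/(b+x). It assumes independent, equal-variance errors rather than the correlated longitudinal structures studied in the paper. The nominal value b = 1, the design interval [0, 4], and the swarm settings are illustrative assumptions, as are the function names.

```python
import random

def d_criterion(x1, x2, b=1.0):
    """Determinant of the 2x2 information matrix for design points x1, x2,
    up to a constant factor that depends only on the parameter a."""
    num = x1 * x2 * (x1 - x2)
    den = (b + x1) ** 2 * (b + x2) ** 2
    return (num / den) ** 2

def pso_two_point_design(x_max=4.0, b=1.0, n_particles=30, n_iter=300, seed=0):
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive and social weights
    pos = [[rng.uniform(0, x_max), rng.uniform(0, x_max)]
           for _ in range(n_particles)]
    vel = [[0.0, 0.0] for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [d_criterion(p[0], p[1], b) for p in pos]
    g = max(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(2):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Clamp each coordinate to the design interval [0, x_max].
                pos[i][d] = min(max(pos[i][d] + vel[i][d], 0.0), x_max)
            val = d_criterion(pos[i][0], pos[i][1], b)
            if val > best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val > g_val:
                    g_val, g_pos = val, pos[i][:]
    return sorted(g_pos), g_val

design, value = pso_two_point_design()
print(design, value)
```

Under these assumptions, maximizing the criterion by calculus gives the support points b*x_max/(2b + x_max) and x_max, that is 2/3 and 4 here, which gives a reference point for checking the swarm's answer.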
A Not-so-radical Rejoinder: Habituate Systems Thinking and Data (Science) Confession for Quality Enhancement
Pub Date : 2023-01-01 DOI: 10.51387/22-nejsds6rej
Xiao Meng
Citations: 0
Optimal Design of Controlled Experiments for Personalized Decision Making in the Presence of Observational Covariates
Pub Date : 2023-01-01 DOI: 10.51387/23-nejsds22
Yezhuo Li, Qiong Zhang, A. Khademi, Boshi Yang
Controlled experiments are widely applied in many areas, such as clinical trials and user behavior studies in IT companies. Experimental design problems that facilitate personalized decision making have recently become a popular topic of study. In this paper, we investigate the problem of optimal design of multiple treatment allocation for personalized decision making in the presence of observational covariates associated with experimental units (often, patients or users). We assume that the response of a subject assigned to a treatment follows a linear model which includes the interaction between covariates and treatments to facilitate precision decision making. We define the optimal objective as the maximum variance of estimated personalized treatment effects over different treatments and different covariate values. The optimal design is obtained by minimizing this objective. Under a semi-definite program reformulation of the original optimization problem, we use a YALMIP- and MOSEK-based optimization solver to provide the optimal design. Numerical studies are provided to assess the quality of the optimal design.
Citations: 1
Seamless Clinical Trials with Doubly Adaptive Biased Coin Designs
Pub Date : 2023-01-01 DOI: 10.51387/23-nejsds25
Hongjian Zhu, Jun Yu, D. Lai, Li Wang
In addition to scientific questions, clinical trialists often explore or require other design features, such as increasing the power while controlling the type I error rate, minimizing unnecessary exposure to inferior treatments, and comparing multiple treatments in one clinical trial. We propose implementing adaptive seamless design (ASD) with response adaptive randomization (RAR) to satisfy various clinical trials’ design objectives. However, the combination of ASD and RAR poses a challenge in controlling the type I error rate. In this paper, we investigated how to utilize the advantages of the two adaptive methods and control the type I error rate. We offered the theoretical foundation for this procedure. Numerical studies demonstrated that our methods could achieve efficient and ethical objectives while controlling the type I error rate.
Citations: 2
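The abstract above does not spell out the randomization rule. For orientation, a widely used form of the doubly adaptive biased coin design for two arms is the Hu-Zhang allocation function, sketched below. Whether the paper uses exactly this function, and with which tuning parameter, is an assumption; the name `dbcd_probability` and the default `gamma=2.0` are illustrative.

```python
def dbcd_probability(current_prop, target_prop, gamma=2.0):
    """Probability of assigning the next patient to treatment 1.

    current_prop: proportion of patients assigned to treatment 1 so far.
    target_prop:  current estimate of the target allocation, strictly in (0, 1).
    gamma:        larger values pull the allocation harder toward the target;
                  gamma = 0 assigns with probability target_prop.
    """
    if current_prop <= 0.0:
        return 1.0
    if current_prop >= 1.0:
        return 0.0
    a = target_prop * (target_prop / current_prop) ** gamma
    b = (1.0 - target_prop) * ((1.0 - target_prop) / (1.0 - current_prop)) ** gamma
    return a / (a + b)

# Treatment 1 is under-represented (40% so far, target 60%),
# so the next assignment favors it with probability above 0.6.
print(dbcd_probability(0.4, 0.6))
```

In a response-adaptive trial, `target_prop` would be re-estimated from the accumulating responses before each assignment, which is what makes the coin "doubly" adaptive.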
Inaugural Editorial. Can We Achieve Our Mission: Fast, Accessible, Cutting-edge, and Top-quality?
Pub Date : 2023-01-01 DOI: 10.51387/23-nejsds11edi
Colin O. Wu, Ming-Hui Chen, Min-ge Xie, HaiYing Wang, Jing Wu
We are pleased to launch the first issue of the New England Journal of Statistics in Data Science (NEJSDS). NEJSDS is the official journal of the New England Statistical Society (NESS) under the leadership of Vice President for Journal and Publication and sponsored by the College of Liberal Arts and Sciences, University of Connecticut. The aims of the journal are to serve as an interface between statistics and other disciplines in data science, to encourage researchers to exchange innovative ideas, and to promote data science methods to the general scientific community. The journal publishes high quality original research, novel applications, and timely review articles in all aspects of data science, including all areas of statistical methodology, methods of machine learning, and artificial intelligence, novel algorithms, computational methods, data management and manipulation, applications of data science methods, among others. We encourage authors to submit collaborative work driven by real life problems posed by researchers, administrators, educators, or other stakeholders, and which require original and innovative solutions from data scientists.
Citations: 0
Clustering-Based Imputation for Dropout Buyers in Large-Scale Online Experimentation
Pub Date : 2022-09-09 DOI: 10.51387/23-nejsds33
Sumin Shen, Huiying Mao, Zezhong Zhang, Zili Chen, Keyu Nie, Xinwei Deng
In online experimentation, appropriate metrics (e.g., purchase) provide strong evidence to support hypotheses and enhance the decision-making process. However, incomplete metrics occur frequently in online experimentation, leaving much less data available than planned for the online experiments (e.g., A/B tests). In this work, we introduce the concept of dropout buyers and categorize users with incomplete metric values into two groups: visitors and dropout buyers. For the analysis of incomplete metrics, we propose a clustering-based imputation method using k-nearest neighbors. Our proposed imputation method considers both the experiment-specific features and users’ activities along their shopping paths, allowing different imputation values for different users. To facilitate efficient imputation of large-scale data sets in online experimentation, the proposed method uses a combination of stratification and clustering. The performance of the proposed method is compared to several conventional methods in both simulation studies and a real online experiment at eBay.
Citations: 1
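The combination of stratification and k-nearest-neighbor imputation described in the abstract above can be sketched roughly as follows. This is a simplified stand-in, not the authors' method: it omits the clustering step and simply averages the metric of the k closest complete users within the same stratum. The record layout (`stratum`, `features`, `metric`) and the function name `knn_impute` are hypothetical.

```python
import math
from collections import defaultdict

def knn_impute(users, k=2):
    """Fill missing 'metric' values with the mean metric of the k nearest
    complete users (Euclidean distance on 'features') in the same stratum."""
    complete = defaultdict(list)
    for u in users:
        if u["metric"] is not None:
            complete[u["stratum"]].append(u)
    imputed = []
    for u in users:
        if u["metric"] is not None:
            imputed.append(u["metric"])  # observed values are kept as-is
            continue
        donors = complete[u["stratum"]]
        if not donors:
            imputed.append(None)  # no complete user to borrow from
            continue
        nearest = sorted(
            donors, key=lambda d: math.dist(d["features"], u["features"])
        )[:k]
        imputed.append(sum(d["metric"] for d in nearest) / len(nearest))
    return imputed

users = [
    {"stratum": "A", "features": (0.0, 0.0), "metric": 10.0},
    {"stratum": "A", "features": (1.0, 0.0), "metric": 20.0},
    {"stratum": "A", "features": (5.0, 5.0), "metric": 100.0},
    {"stratum": "A", "features": (0.5, 0.0), "metric": None},
    {"stratum": "B", "features": (0.5, 0.0), "metric": None},
]
print(knn_impute(users, k=2))
```

Restricting the neighbor search to one stratum keeps each search small, which is the kind of saving that matters at the scale of an online experiment.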
Approximate Confidence Distribution Computing
Pub Date : 2022-06-03 DOI: 10.51387/23-nejsds38
S. Thornton, Wentao Li, Min‐ge Xie
Approximate confidence distribution computing (ACDC) offers a new take on the rapidly developing field of likelihood-free inference from within a frequentist framework. The appeal of this computational method for statistical inference hinges upon the concept of a confidence distribution, a special type of estimator which is defined with respect to the repeated sampling principle. An ACDC method provides frequentist validation for computational inference in problems with unknown or intractable likelihoods. The main theoretical contribution of this work is the identification of a matching condition necessary for frequentist validity of inference from this method. In addition to providing an example of how a modern understanding of confidence distribution theory can be used to connect Bayesian and frequentist inferential paradigms, we present a case to expand the current scope of so-called approximate Bayesian inference to include non-Bayesian inference by targeting a confidence distribution rather than a posterior. The main practical contribution of this work is the development of a data-driven approach to drive ACDC in both Bayesian and frequentist contexts. The ACDC algorithm is data-driven by the selection of a data-dependent proposal function, the structure of which is quite general and adaptable to many settings. We explore three numerical examples that both verify the theoretical arguments in the development of ACDC and suggest instances in which ACDC outperforms approximate Bayesian computing methods computationally.
Citations: 4
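The accept-reject mechanics behind likelihood-free methods of this kind can be sketched as below: draw a parameter from a data-dependent proposal, simulate data, and keep the draw when the simulated summary is close to the observed one. This is a toy sketch, not the authors' algorithm, and it makes no claim about frequentist coverage. The normal-mean model with known unit variance, the proposal centered at the sample mean with scale 3/sqrt(n), the tolerance `eps`, and the name `acdc_rejection` are all assumptions.

```python
import random
import statistics

def acdc_rejection(observed, n_draws=20000, eps=0.05, seed=1):
    """Return the parameter draws whose simulated sample mean falls
    within eps of the observed sample mean."""
    rng = random.Random(seed)
    n = len(observed)
    s_obs = statistics.fmean(observed)   # observed summary statistic
    spread = 3.0 / n ** 0.5              # assumed proposal scale
    accepted = []
    for _ in range(n_draws):
        theta = rng.gauss(s_obs, spread)                 # data-dependent proposal
        sim = [rng.gauss(theta, 1.0) for _ in range(n)]  # simulate from the model
        if abs(statistics.fmean(sim) - s_obs) < eps:
            accepted.append(theta)
    return accepted

data_rng = random.Random(0)
observed = [data_rng.gauss(2.0, 1.0) for _ in range(25)]
draws = acdc_rejection(observed)
print(len(draws), statistics.fmean(draws))
```

Centering the proposal on the data concentrates simulation effort where draws are likely to be accepted, which is the computational advantage the abstract attributes to a data-dependent proposal.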
Scalable Marginalization of Correlated Latent Variables with Applications to Learning Particle Interaction Kernels
Pub Date : 2022-03-16 DOI: 10.51387/22-nejsds13
Mengyang Gu, Xubo Liu, X. Fang, Sui Tang
Marginalization of latent variables or nuisance parameters is a fundamental aspect of Bayesian inference and uncertainty quantification. In this work, we focus on scalable marginalization of latent variables in modeling correlated data, such as spatio-temporal or functional observations. We first introduce Gaussian processes (GPs) for modeling correlated data and highlight the computational challenge: the computational complexity grows cubically with the number of observations. We then review the connection between the state space model and GPs with Matérn covariance for temporal inputs. The Kalman filter and Rauch-Tung-Striebel smoother are introduced as a scalable marginalization technique for computing the likelihood and making predictions of GPs without approximation. We introduce recent efforts on extending the scalable marginalization idea to the linear model of coregionalization for multivariate correlated output and spatio-temporal observations. In the final part of this work, we introduce a novel marginalization technique to estimate interaction kernels and forecast particle trajectories. The computational progress lies in a sparse representation of the inverse covariance matrix of the latent variables, followed by the conjugate gradient algorithm to improve predictive accuracy with large data sets. The computational advances achieved in this work outline a wide range of applications in molecular dynamic simulation, cellular migration, and agent-based models.
Citations: 6
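The link between state space models and GPs mentioned in the abstract above is easiest to see for the exponential kernel, which is the Matérn covariance with smoothness 1/2. In that case the latent process is a first-order Markov process, and a Kalman filter gives the exact GP log-likelihood in time linear in the number of observations. The sketch below assumes sorted time points, a zero mean, and Gaussian observation noise; the parameter names and default values are illustrative.

```python
import math

def ou_loglik_kalman(times, ys, sigma2=1.0, length=1.0, noise2=0.1):
    """Exact log-likelihood of noisy observations of a zero-mean GP with
    kernel sigma2 * exp(-|t - t'| / length), computed in O(n)."""
    m, p = 0.0, sigma2            # prior mean and variance of the latent state
    loglik = 0.0
    prev_t = None
    for t, y in zip(times, ys):
        if prev_t is not None:
            rho = math.exp(-(t - prev_t) / length)
            m = rho * m                                     # predict mean
            p = rho * rho * p + sigma2 * (1.0 - rho * rho)  # predict variance
        s = p + noise2            # variance of the one-step-ahead prediction
        resid = y - m
        loglik += -0.5 * (math.log(2.0 * math.pi * s) + resid * resid / s)
        gain = p / s              # Kalman gain
        m += gain * resid         # update mean
        p = (1.0 - gain) * p      # update variance
        prev_t = t
    return loglik

print(ou_loglik_kalman([0.0, 0.5, 1.2], [0.3, -0.2, 0.1]))
```

A direct evaluation would factorize the n-by-n covariance matrix at cubic cost; the filter visits each observation once and never forms that matrix.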
Comment on “Double Your Variance, Dirtify Your Bayes, Devour Your Pufferfish, and Draw Your Kidstogram,” by Xiao-Li Meng
Pub Date : 2022-01-01 DOI: 10.51387/22-nejsds6b
T. Junk
This contribution is a series of comments on Prof. Xiao-Li Meng’s article, “Double Your Variance, Dirtify Your Bayes, Devour Your Pufferfish, and Draw Your Kidstogram”. Prof. Meng’s article offers some radical proposals and not-so-radical proposals to improve the quality of statistical inference used in the sciences and also to extend distributional thinking to early education. Discussions and alternative proposals are presented.
Citations: 0