
The New England Journal of Statistics in Data Science: Latest Articles

Modeling Multivariate Spatial Dependencies Using Graphical Models.
Pub Date : 2023-09-01 Epub Date: 2023-09-06 DOI: 10.51387/23-nejsds47
Debangan Dey, Abhirup Datta, Sudipto Banerjee

Graphical models have witnessed significant growth and usage in spatial data science for modeling data referenced over a massive number of spatial-temporal coordinates. Much of this literature has focused on a single or relatively few spatially dependent outcomes. Recent attention has focused upon addressing modeling and inference for a substantially large number of outcomes. While spatial factor models and multivariate basis expansions occupy a prominent place in this domain, this article elucidates a recent approach, graphical Gaussian Processes, that exploits the notion of conditional independence among a very large number of spatial processes to build scalable graphical models for fully model-based Bayesian analysis of multivariate spatial data.

Pages: 283-295 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10563032/pdf/nihms-1934371.pdf
Citations: 1
Effect of model space priors on statistical inference with model uncertainty.
Pub Date : 2023-09-01 Epub Date: 2022-11-16
Anupreet Porwal, Adrian E Raftery

Bayesian model averaging (BMA) provides a coherent way to account for model uncertainty in statistical inference tasks. BMA requires specification of model space priors and parameter space priors. In this article we focus on comparing different model space priors in the presence of model uncertainty. We consider eight reference model space priors used in the literature and three adaptive parameter priors recommended by Porwal and Raftery [37]. We assess the performance of these combinations of prior specifications for variable selection in linear regression models for the statistical tasks of parameter estimation, interval estimation, inference, point and interval prediction. We carry out an extensive simulation study based on 14 real datasets representing a range of situations encountered in practice. We found that beta-binomial model space priors specified in terms of the prior probability of model size performed best on average across various statistical tasks and datasets, outperforming priors that were uniform across models. Recently proposed complexity priors performed relatively poorly.

Pages: 149-158 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11482600/pdf/
Citations: 0
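The contrast in the Porwal and Raftery abstract above, between priors that are uniform across models and beta-binomial priors stated in terms of model size, can be made concrete with a small sketch. This is only an illustration under stated assumptions: the function names and the Beta(1, 1) hyperparameters are hypothetical choices, not the eight priors studied in the article.

```python
import math

def log_uniform_model_prior(p: int) -> float:
    """Log prior of any single model when all 2**p models are equally likely."""
    return -p * math.log(2.0)

def log_beta_binomial_model_prior(k: int, p: int, a: float = 1.0, b: float = 1.0) -> float:
    """Log prior of one specific model with k of p variables included,
    assuming inclusion indicators are Bernoulli(pi) with pi ~ Beta(a, b)."""
    def log_beta(x: float, y: float) -> float:
        return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)
    return log_beta(a + k, b + p - k) - log_beta(a, b)

def model_size_prior(p: int, log_model_prior) -> list:
    """Implied prior on model size: (number of size-k models) * (prior of each)."""
    return [math.comb(p, k) * math.exp(log_model_prior(k)) for k in range(p + 1)]

p = 10  # hypothetical number of candidate predictors
uniform_sizes = model_size_prior(p, lambda k: log_uniform_model_prior(p))
bb_sizes = model_size_prior(p, lambda k: log_beta_binomial_model_prior(k, p))
```

Under the uniform prior the implied model size is Binomial(p, 1/2), which concentrates on mid-sized models, whereas Beta(1, 1) hyperparameters spread the mass evenly over sizes 0 to p.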
Bayesian Variable Selection in Double Generalized Linear Tweedie Spatial Process Models
Pub Date : 2023-06-19 DOI: 10.51387/23-NEJSDS37
Aritra Halder, Shariq Mohammed, D. Dey
Double generalized linear models provide a flexible framework for modeling data by allowing the mean and the dispersion to vary across observations. Common members of the exponential dispersion family including the Gaussian, Poisson, compound Poisson-gamma (CP-g), Gamma and inverse-Gaussian are known to admit such models. The lack of their use can be attributed to ambiguities that exist in model specification under a large number of covariates and complications that arise when data display complex spatial dependence. In this work we consider a hierarchical specification for the CP-g model with a spatial random effect. The spatial effect is targeted at performing uncertainty quantification by modeling dependence within the data arising from location-based indexing of the response. We focus on a Gaussian process specification for the spatial effect. Simultaneously, we tackle the problem of model specification for such models using Bayesian variable selection. It is effected through a continuous spike and slab prior on the model parameters, specifically the fixed effects. The novelty of our contribution lies in the Bayesian frameworks developed for such models. We perform various synthetic experiments to showcase the accuracy of our frameworks. They are then applied to analyze automobile insurance premiums in Connecticut for the year 2008.
Citations: 1
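The abstract above mentions a continuous spike and slab prior on the fixed effects. A common generic form is a two-component normal mixture, with a narrow "spike" near zero and a wide "slab". The sketch below assumes that form, with hypothetical weight and scale values; the article's exact prior is not given in the abstract and may differ.

```python
import math

def log_normal_pdf(x: float, sd: float) -> float:
    """Log density of N(0, sd**2) evaluated at x."""
    return -0.5 * (x / sd) ** 2 - math.log(sd) - 0.5 * math.log(2.0 * math.pi)

def spike_slab_pdf(beta: float, w: float = 0.5,
                   spike_sd: float = 0.1, slab_sd: float = 10.0) -> float:
    """Mixture density: w * slab + (1 - w) * spike (all values are assumptions)."""
    return (w * math.exp(log_normal_pdf(beta, slab_sd))
            + (1.0 - w) * math.exp(log_normal_pdf(beta, spike_sd)))

def slab_probability(beta: float, w: float = 0.5,
                     spike_sd: float = 0.1, slab_sd: float = 10.0) -> float:
    """Probability that a coefficient value beta came from the slab component.
    Computed in log space; assumes spike_sd < slab_sd."""
    log_slab = math.log(w) + log_normal_pdf(beta, slab_sd)
    log_spike = math.log(1.0 - w) + log_normal_pdf(beta, spike_sd)
    return 1.0 / (1.0 + math.exp(log_spike - log_slab))
```

With these illustrative settings a coefficient at zero is assigned to the slab with probability 1/101, while a coefficient of 3 is assigned to it with probability very close to one, which is the mechanism that drives variable selection.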
Bayesian D-Optimal Design of Experiments with Quantitative and Qualitative Responses
Pub Date : 2023-04-18 DOI: 10.51387/23-nejsds30
Lulu Kang, Xinwei Deng, R. Jin
Systems with both quantitative and qualitative responses are widely encountered in many applications. Design of experiment methods are needed when experiments are conducted to study such systems. Classic experimental design methods are unsuitable here because they often focus on one type of response. In this paper, we develop a Bayesian D-optimal design method for experiments with one continuous and one binary response. Both noninformative and conjugate informative prior distributions on the unknown parameters are considered. The proposed design criterion has meaningful interpretations regarding the D-optimality for the models for both types of responses. An efficient point-exchange search algorithm is developed to construct the local D-optimal designs for given parameter values. Global D-optimal designs are obtained by accumulating the frequencies of the design points in local D-optimal designs, where the parameters are sampled from the prior distributions. The performances of the proposed methods are evaluated through two examples.
Citations: 1
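The point-exchange search mentioned in the abstract above can be sketched for a much simpler setting. The example below assumes a single-response quadratic regression model and the classical criterion det(X'X); it is not the article's criterion for mixed continuous and binary responses, and all names and starting values are hypothetical.

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def features(x):
    """Model terms for a one-factor quadratic regression (illustrative choice)."""
    return [1.0, x, x * x]

def d_criterion(design):
    """det(X'X), where X stacks the feature rows of the design points."""
    rows = [features(x) for x in design]
    q = len(rows[0])
    info = [[sum(r[i] * r[j] for r in rows) for j in range(q)] for i in range(q)]
    return det(info)

def point_exchange(design, candidates, max_sweeps=50):
    """Swap one design point at a time for a candidate whenever det(X'X) improves."""
    design = list(design)
    best = d_criterion(design)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(design)):
            for c in candidates:
                trial = design[:i] + [c] + design[i + 1:]
                value = d_criterion(trial)
                if value > best + 1e-12:
                    design, best, improved = trial, value, True
        if not improved:
            break
    return sorted(design), best

candidates = [i / 10 - 1.0 for i in range(21)]   # grid on [-1, 1]
start = [-1.0, -0.5, 0.0, 0.5, 0.7, 1.0]         # arbitrary starting design
final_design, final_value = point_exchange(start, candidates)
```

Each accepted swap strictly increases the criterion, so the search stops at a design that no single exchange can improve; such a design may be only locally optimal, which is one reason exchange algorithms are often restarted from several initial designs.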
Bayesian Interim Analysis in Basket Trials
Pub Date : 2023-01-01 DOI: 10.51387/23-nejsds48
Cheng Huang, Chenghao Chu, Yimeng Lu, Bingming Yi, Ming-Hui Chen
Basket trials have captured much attention in oncology research in recent years, as advances in health technology have opened up the possibility of classification of patients at the genomic level. Bayesian methods are particularly prevalent in basket trials as the hierarchical structure is adapted to basket trials to allow for information borrowing. In this article, we extend the Bayesian methods to basket trials with treatment and control arms for continuous endpoints, which is often the case in clinical trials for rare diseases. To account for the imbalance in the covariates which are potentially strong predictors but not stratified in a randomized trial, our models make adjustments for these covariates, and allow different coefficients across baskets. In addition, comparisons are drawn between two-stage design and one-stage design for the four Bayesian methods. Extensive simulation studies are conducted to examine the empirical performance of all models under consideration. A real data analysis is carried out to further demonstrate the usefulness of the Bayesian methods.
Citations: 0
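The information borrowing described in the abstract above can be illustrated with the simplest hierarchical model: each basket effect is normal around a shared mean, and each basket estimate is normal around its effect. The sketch assumes the shared mean `mu` and between-basket variance `tau2` are known, which a full Bayesian analysis would not do, and it ignores the covariate adjustment and control arms of the article's models. All numbers are made up for illustration.

```python
def shrink_basket_effects(estimates, variances, mu, tau2):
    """Posterior means under theta_j ~ N(mu, tau2), y_j | theta_j ~ N(theta_j, s2_j).

    Each basket estimate is pulled toward mu; noisier baskets are pulled harder.
    """
    shrunk = []
    for y, s2 in zip(estimates, variances):
        weight = tau2 / (tau2 + s2)      # weight placed on the basket's own data
        shrunk.append(weight * y + (1.0 - weight) * mu)
    return shrunk

effects = [2.0, 0.5, -1.0]      # hypothetical basket-level effect estimates
variances = [0.25, 1.0, 4.0]    # hypothetical sampling variances
shrunk = shrink_basket_effects(effects, variances, mu=0.5, tau2=1.0)
```

Here the precisely estimated first basket keeps 80% of its own estimate, while the noisy third basket keeps only 20% and is drawn most of the way toward the shared mean.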
Detection of Anomalies in Traffic Flows with Large Amounts of Missing Data
Pub Date : 2023-01-01 DOI: 10.51387/23-nejsds20
Qing He, Charles W. Harrison, Hsin-Hsiung Huang
Anomaly detection plays an important role in traffic operations and control. Missingness in spatial-temporal datasets prohibits anomaly detection algorithms from learning characteristic rules and patterns due to the lack of large amounts of data. This paper proposes an anomaly detection scheme for the 2021 Algorithms for Threat Detection (ATD) challenge based on Gaussian process models that generate features used in a logistic regression model which leads to high prediction accuracy for sparse traffic flow data with a large proportion of missingness. The dataset is provided by the National Science Foundation (NSF) in conjunction with the National Geospatial-Intelligence Agency (NGA), and it consists of thousands of labeled traffic flow records for 400 sensors from 2011 to 2020. Each sensor is purposely downsampled by NSF and NGA in order to simulate missing completely at random, and the missing rates are 99%, 98%, 95%, and 90%. Hence, it is challenging to detect anomalies from the sparse traffic flow data. The proposed scheme makes use of traffic patterns at different times of day and on different days of the week to recover the complete data. The proposed anomaly detection scheme is computationally efficient by allowing parallel computation on different sensors. The proposed method is one of the two top-performing algorithms in the 2021 ATD challenge.
Citations: 1
Editorial. Modern Bayesian Methods with Applications in Data Science
Pub Date : 2023-01-01 DOI: 10.51387/23-nejsds12edi
Dipak K. Dey, Ming-Hui Chen, Min-ge Xie, HaiYing Wang, Jing Wu
Publisher: New England Statistical Society, Journal: The New England Journal of Statistics in Data Science, Title: Editorial. Modern Bayesian Methods with Applications in Data Science, Authors: Dipak K. Dey, Ming-Hui Chen, Min-ge Xie
Citations: 0
Bayesian Simultaneous Partial Envelope Model with Application to an Imaging Genetics Analysis
Pub Date : 2023-01-01 DOI: 10.51387/23-nejsds23
Yanbo Shen, Yeonhee Park, Saptarshi Chakraborty, Chunming Zhang
As a prominent dimension reduction method for multivariate linear regression, the envelope model has received increased attention over the past decade due to its modeling flexibility and success in enhancing estimation and prediction efficiencies. Several enveloping approaches have been proposed in the literature; among these, the partial response envelope model [57] that focuses on only enveloping the coefficients for predictors of interest, and the simultaneous envelope model [14] that combines the predictor and the response envelope models within a unified modeling framework, are noteworthy. In this article we incorporate these two approaches within a Bayesian framework, and propose a novel Bayesian simultaneous partial envelope model that generalizes and addresses some limitations of the two approaches. Our method offers the flexibility of incorporating prior information if available, and aids coherent quantification of all modeling uncertainty through the posterior distribution of model parameters. A block Metropolis-within-Gibbs algorithm for Markov chain Monte Carlo (MCMC) sampling from the posterior is developed. The utility of our model is corroborated by theoretical results, comprehensive simulations, and a real imaging genetics data application for the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study.
Citations: 1
Highest Posterior Model Computation and Variable Selection via Simulated Annealing
Pub Date : 2023-01-01 DOI: 10.51387/23-nejsds40
A. Maity, S. Basu
Variable selection is widely used in all application areas of data analytics, ranging from optimal selection of genes in large scale micro-array studies, to optimal selection of biomarkers for targeted therapy in cancer genomics to selection of optimal predictors in business analytics. A formal way to perform this selection under the Bayesian approach is to select the model with highest posterior probability. The problem may be thought of as an optimization problem over the model space where the objective function is the posterior probability of the model. We propose to carry out this optimization using simulated annealing and we illustrate its feasibility in high dimensional problems. By means of various simulation studies, this new approach has been shown to be efficient. Theoretical justifications are provided and applications to high dimensional datasets are discussed. The proposed method is implemented in an R package sahpm for general use and is made available on R CRAN.
Citations: 1
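The idea in the abstract above, treating the search for the highest posterior model as optimization over inclusion vectors, can be sketched with a generic simulated annealing loop. This is a minimal illustration and not the sahpm implementation: the single-flip proposal, the geometric cooling schedule, and the toy objective `toy_log_post` (which simply penalizes disagreement with a made-up reference model) are all assumptions.

```python
import math
import random

def anneal_model_search(log_post, p, n_iter=5000, t0=1.0, cooling=0.999, seed=0):
    """Search binary inclusion vectors of length p for a high value of log_post."""
    rng = random.Random(seed)
    current = [0] * p                          # start from the empty model
    current_score = log_post(current)
    best, best_score = current[:], current_score
    temp = t0
    for _ in range(n_iter):
        j = rng.randrange(p)                   # propose flipping one variable
        proposal = current[:]
        proposal[j] = 1 - proposal[j]
        score = log_post(proposal)
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature falls.
        if score >= current_score or rng.random() < math.exp((score - current_score) / temp):
            current, current_score = proposal, score
            if score > best_score:
                best, best_score = proposal[:], score
        temp *= cooling                        # geometric cooling (assumed schedule)
    return best, best_score

REFERENCE_MODEL = [1, 0, 1, 0, 0, 1, 0, 0]     # made-up target for the toy objective

def toy_log_post(gamma):
    """Stand-in for a log posterior model probability (purely illustrative)."""
    return -2.0 * sum(g != t for g, t in zip(gamma, REFERENCE_MODEL))

best_model, best_score = anneal_model_search(toy_log_post, p=8)
```

In a real analysis `log_post` would be replaced by the log marginal likelihood plus the log model prior; the loop only needs that function up to an additive constant, which is why the normalizing sum over all 2^p models is never required.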
Algorithm-Based Optimal and Efficient Exact Experimental Designs for Crossover and Interference Models
Pub Date : 2023-01-01 DOI: 10.51387/23-nejsds41
S. Hao, Min Yang, Weiwei Zheng
The crossover models and interference models are frequently used in clinical trials, agriculture studies, social studies, etc. While some theoretical optimality results are available, it is still challenging to apply these results in practice. The available theoretical results, due to the complexity of exact optimal designs, typically require some specific combinations of the number of treatments (t), periods (p), and subjects (n). A more flexible method is to build integer programming based on theories in approximate design theory, which can handle general cases of $(t,p,n)$. Nonetheless, those results are generally derived for specific models or design problems and new efforts are needed for new problems. These obstacles make the application of the theoretical results rather difficult. Here we propose a new algorithm, a revision of the optimal weight exchange algorithm by [1]. It provides efficient crossover designs quickly under various situations, for different optimality criteria, different parameters of interest, different configurations of $(t,p,n)$, as well as arbitrary dropout scenarios. To facilitate the usage of our algorithm, the corresponding R package and an R Shiny app as a more user-friendly interface have been developed.
Citations: 1