Oliver M Crook, Kathryn S Lilley, Laurent Gatto, Paul D W Kirk
Understanding sub-cellular protein localisation is an essential component in the analysis of context-specific protein function. Recent advances in quantitative mass spectrometry (MS) have led to high-resolution mapping of thousands of proteins to sub-cellular locations within the cell. Novel modelling considerations to capture the complex nature of these data are thus necessary. We approach analysis of spatial proteomics data in a non-parametric Bayesian framework, using K-component mixtures of Gaussian process regression models. The Gaussian process regression model accounts for correlation structure within a sub-cellular niche, with each mixture component capturing the distinct correlation structure observed within each niche. The availability of marker proteins (i.e. proteins with a priori known labelled locations) motivates a semi-supervised learning approach to inform the Gaussian process hyperparameters. We moreover provide an efficient Hamiltonian-within-Gibbs sampler for our model. Furthermore, we reduce the computational burden associated with inversion of covariance matrices by exploiting their structure: a tensor decomposition of the covariance matrices allows extended Trench and Durbin algorithms to be applied, reducing the computational complexity of inversion and hence accelerating computation. We provide detailed case studies on Drosophila embryos and mouse pluripotent embryonic stem cells to illustrate the benefit of semi-supervised functional Bayesian modelling of the data.
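The Toeplitz structure behind the Trench/Durbin speed-up can be sketched numerically: on a regular grid, a stationary kernel yields a Toeplitz covariance, so a Levinson-Durbin-type solver handles systems in O(n^2) rather than O(n^3). The sketch below is a hypothetical illustration using SciPy's `solve_toeplitz` with an exponential kernel; it is not the authors' extended algorithm and the grid and kernel are assumptions, not their data.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

# Hypothetical illustration: a stationary kernel on a regular grid gives a
# Toeplitz covariance, so Levinson-Durbin-type solvers (the family that the
# paper's extended Trench and Durbin algorithms belong to) apply directly.
n = 200
t = np.linspace(0.0, 1.0, n)                  # e.g. equally spaced channels
first_col = np.exp(-np.abs(t - t[0]) / 0.1)   # exponential (OU-type) kernel
first_col[0] += 1e-6                          # small nugget for conditioning

rng = np.random.default_rng(0)
b = rng.standard_normal(n)

# O(n^2) structured solve vs. generic O(n^3) dense solve of K x = b
x_fast = solve_toeplitz(first_col, b)         # symmetric: one column suffices
K = first_col[np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])]
x_dense = np.linalg.solve(K, b)
```

In the paper's setting it is the tensor decomposition of the covariance that exposes this kind of one-dimensional structure; the point of the sketch is only the cost difference between the structured and dense solves.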
Semi-Supervised Non-Parametric Bayesian Modelling of Spatial Proteomics.
Annals of Applied Statistics 16(4). DOI: 10.1214/22-AOAS1603. Published 2022-12-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7613899/pdf/EMS143956.pdf
Pub Date: 2022-12-01. Epub Date: 2022-09-26. DOI: 10.1214/21-AOAS1589
Yifei Sun, Xuming He, Jianhua Hu
Late-stage clinical trials have been conducted primarily to establish the efficacy of a new treatment in an intended population. A corollary of population heterogeneity in clinical trials is that a treatment might be effective for one or more subgroups, rather than for the whole population of interest. As an example, the phase III clinical trial of panitumumab in metastatic colorectal cancer patients failed to demonstrate its efficacy in the overall population, but a subgroup associated with tumor KRAS status was found to be promising (Peeters et al. (Am. J. Clin. Oncol. 28 (2010) 4706-4713)). As we search for such subgroups via data partitioning based on a large number of biomarkers, we need to guard against inflated type I error rates due to multiple testing. Commonly used multiplicity adjustments tend to lose power for the detection of subgroup treatment effects. We develop an effective omnibus test to detect the existence of at least one subgroup treatment effect, allowing a large number of possible subgroups to be considered and accommodating possibly censored outcomes. Applied to the panitumumab trial data, the proposed test would confirm a significant subgroup treatment effect. Empirical studies also show that the proposed test is applicable to a variety of outcome variables and maintains robust statistical power.
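The flavor of an omnibus "max over candidate subgroups" test can be sketched with a permutation null: a single test guards the whole family of subgroup comparisons. The statistic below (a plain two-sample z over biomarker-defined halves) and the simulated data are stand-ins for illustration, not the authors' test or the panitumumab trial.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sketch: candidate subgroups are defined by thresholding each
# biomarker; the omnibus statistic is the largest subgroup treatment effect,
# and its null distribution comes from permuting treatment labels.
n = 400
biomarker = rng.standard_normal((n, 3))       # 3 candidate biomarkers
treat = rng.integers(0, 2, n)
# truth: the treatment helps only when biomarker 0 is positive
y = 1.0 * treat * (biomarker[:, 0] > 0) + rng.standard_normal(n)

def max_subgroup_stat(y, treat, biomarker):
    """Largest |z| for the treatment effect over biomarker-defined subgroups."""
    stats = []
    for j in range(biomarker.shape[1]):
        for sub in (biomarker[:, j] > 0, biomarker[:, j] <= 0):
            a, b = y[sub & (treat == 1)], y[sub & (treat == 0)]
            se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
            stats.append(abs(a.mean() - b.mean()) / se)
    return max(stats)

obs = max_subgroup_stat(y, treat, biomarker)
null = [max_subgroup_stat(y, rng.permutation(treat), biomarker)
        for _ in range(500)]
p_value = (1 + sum(s >= obs for s in null)) / (1 + len(null))
```

Because the maximum is taken before the permutation p-value is computed, the multiplicity over candidate subgroups is handled by the single test rather than by a post-hoc adjustment.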
AN OMNIBUS TEST FOR DETECTION OF SUBGROUP TREATMENT EFFECTS VIA DATA PARTITIONING.
Annals of Applied Statistics 16(4), 2266-2278. DOI: 10.1214/21-AOAS1589. Published 2022-12-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10381789/pdf/nihms-1919024.pdf
Joshua L Warren, Howard H Chang, Lauren K Warren, Matthew J Strickland, Lyndsey A Darrow, James A Mulholland
Understanding the role of time-varying pollution mixtures on human health is critical as people are simultaneously exposed to multiple pollutants during their lives. For vulnerable subpopulations who have well-defined exposure periods (e.g., pregnant women), questions regarding critical windows of exposure to these mixtures are important for mitigating harm. We extend critical window variable selection (CWVS) to the multipollutant setting by introducing CWVS for mixtures (CWVSmix), a hierarchical Bayesian method that combines smoothed variable selection and temporally correlated weight parameters to: (i) identify critical windows of exposure to mixtures of time-varying pollutants, (ii) estimate the time-varying relative importance of each individual pollutant and their first order interactions within the mixture, and (iii) quantify the impact of the mixtures on health. Through simulation we show that CWVSmix offers the best balance of performance in each of these categories in comparison to competing methods. Using these approaches, we investigate the impact of exposure to multiple ambient air pollutants on the risk of stillbirth in New Jersey, 2005-2014. We find consistent elevated risk in gestational weeks 2, 16-17, and 20 for non-Hispanic Black mothers, with pollution mixtures dominated by ammonium (weeks 2, 17, 20), nitrate (weeks 2, 17), nitrogen oxides (weeks 2, 16), PM2.5 (week 2), and sulfate (week 20). The method is available in the R package CWVSmix.
CRITICAL WINDOW VARIABLE SELECTION FOR MIXTURES: ESTIMATING THE IMPACT OF MULTIPLE AIR POLLUTANTS ON STILLBIRTH.
Annals of Applied Statistics 16(3), 1633-1652. DOI: 10.1214/21-aoas1560. Published 2022-09-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9854390/pdf/nihms-1863002.pdf
Junyang Qian, Yosuke Tanigawa, Ruilin Li, Robert Tibshirani, Manuel A Rivas, Trevor Hastie
In high-dimensional regression problems, often a relatively small subset of the features is relevant for predicting the outcome, and methods that impose sparsity on the solution are popular. When multiple correlated outcomes are available (multitask), reduced rank regression is an effective way to borrow strength and capture latent structures that underlie the data. Our proposal is motivated by the UK Biobank population-based cohort study, where we are faced with large-scale, ultrahigh-dimensional features, and have access to a large number of outcomes (phenotypes)-lifestyle measures, biomarkers, and disease outcomes. We are hence led to fit sparse reduced-rank regression models, using computational strategies that allow us to scale to problems of this size. We use a scheme that alternates between solving the sparse regression problem and solving the reduced rank decomposition. For the sparse regression component we propose a scalable iterative algorithm based on adaptive screening that leverages the sparsity assumption and enables us to focus on solving much smaller subproblems. The full solution is reconstructed and tested via an optimality condition to make sure it is a valid solution for the original problem. We further extend the method to cope with practical issues, such as the inclusion of confounding variables and imputation of missing values among the phenotypes. Experiments on both synthetic data and the UK Biobank data demonstrate the effectiveness of the method and the algorithm. We present the multiSnpnet package, available at http://github.com/junyangq/multiSnpnet, which works on top of PLINK2 files and which we anticipate will be a valuable tool for generating polygenic risk scores from human genetic studies.
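The alternating scheme can be sketched in a few lines: a sparse-regression step for the coefficient factor with the rank factor fixed, then an SVD (orthogonal Procrustes) step for the rank factor. Plain ISTA soft-thresholding stands in for the paper's screening-based solver, and the data are simulated; this is a toy sketch of the iteration, not the multiSnpnet implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy sparse reduced-rank regression: Y ~ X @ W @ V.T with row-sparse W
# (p x r) and orthonormal V (q x r). Alternate a lasso-style ISTA step for W
# with an orthogonal Procrustes (SVD) step for V.
n, p, q, r = 200, 50, 10, 2
X = rng.standard_normal((n, p))
W_true = np.zeros((p, r))
W_true[:5] = rng.standard_normal((5, r))          # only 5 relevant features
V_true = np.linalg.qr(rng.standard_normal((q, r)))[0]
Y = X @ W_true @ V_true.T + 0.1 * rng.standard_normal((n, q))

lam = 10.0                                        # l1 penalty
step = 1.0 / np.linalg.norm(X, 2) ** 2            # ISTA step size
V = np.linalg.svd(Y, full_matrices=False)[2][:r].T  # warm start: Y's top PCs
W = np.zeros((p, r))
for _ in range(200):
    # (1) sparse step: one proximal-gradient pass on ||Y V - X W||^2 + lam |W|_1
    Z = W - step * (X.T @ (X @ W - Y @ V))
    W = np.sign(Z) * np.maximum(np.abs(Z) - step * lam, 0.0)
    # (2) reduced-rank step: V solves the orthogonal Procrustes problem
    P, _, Qt = np.linalg.svd(Y.T @ X @ W, full_matrices=False)
    V = P @ Qt

support = np.flatnonzero(np.abs(W).sum(axis=1) > 0)
```

Because V has orthonormal columns, minimizing ||Y V - X W||^2 over W and maximizing trace(V^T Y^T X W) over V both decrease the joint objective ||Y - X W V^T||^2, so the two steps are consistent pieces of one alternating minimization.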
LARGE-SCALE MULTIVARIATE SPARSE REGRESSION WITH APPLICATIONS TO UK BIOBANK.
Annals of Applied Statistics 16(3), 1891-1918. DOI: 10.1214/21-aoas1575. Published 2022-09-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9454085/pdf/nihms-1830548.pdf
Pub Date: 2022-09-01. Epub Date: 2022-07-19. DOI: 10.1214/21-AOAS1556
Jacob Parsons, Xiaoyue Niu, Le Bao
To combat the HIV/AIDS pandemic effectively, targeted interventions among certain key populations play a critical role. Examples of such key populations include sex workers, people who inject drugs, and men who have sex with men. While having accurate estimates for the size of these key populations is important, any attempt to directly contact or count members of these populations is difficult. As a result, indirect methods are used to produce size estimates. Multiple approaches for estimating the size of such populations have been suggested but often give conflicting results. It is, therefore, necessary to have a principled way to combine and reconcile these estimates. To this end, we present a Bayesian hierarchical model for estimating the size of key populations that combines multiple estimates from different sources of information. The proposed model makes use of multiple years of data and explicitly models the systematic error in the data sources used. We use the model to estimate the size of people who inject drugs in Ukraine. We evaluate the appropriateness of the model and compare the contribution of each data source to the final estimates.
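The combination step can be illustrated with a deliberately simplified version: each source reports an estimate, a standard error, and an assumed systematic bias, and a precision-weighted normal model reconciles them. The numbers are made up for illustration (not the Ukraine data), and the fixed weighting is a bare-bones stand-in for the paper's full Bayesian hierarchy.

```python
import numpy as np

# Illustrative numbers only. Each source gives a size estimate (in thousands),
# a standard error, and an assumed systematic offset; debiasing each source and
# then precision-weighting is the simplest version of "combine and reconcile".
estimates = np.array([310.0, 340.0, 295.0, 360.0])
ses = np.array([25.0, 40.0, 20.0, 55.0])
bias = np.array([0.0, 15.0, -10.0, 20.0])     # hypothetical systematic offsets

w = 1.0 / ses**2                              # precision weights
combined = np.sum(w * (estimates - bias)) / np.sum(w)
combined_se = np.sqrt(1.0 / np.sum(w))        # pooled uncertainty
```

The pooled standard error is smaller than any single source's, which is the "borrowing strength" the hierarchical model formalizes; the paper additionally learns the bias terms from multiple years of data rather than fixing them.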
A BAYESIAN HIERARCHICAL MODEL FOR COMBINING MULTIPLE DATA SOURCES IN POPULATION SIZE ESTIMATION.
Annals of Applied Statistics 16(3), 1550-1562. DOI: 10.1214/21-AOAS1556. Published 2022-09-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10150643/pdf/nihms-1889948.pdf
Pub Date: 2022-09-01. Epub Date: 2022-07-19. DOI: 10.1214/21-aoas1562
Guoqing Wang, Abhirup Datta, Martin A Lindquist
Functional magnetic resonance imaging (fMRI) has provided invaluable insight into our understanding of human behavior. However, large inter-individual differences in both brain anatomy and functional localization after anatomical alignment remain a major limitation in conducting group analyses and performing population-level inference. This paper addresses this problem by developing and validating a new computational technique for reducing misalignment across individuals in functional brain systems by spatially transforming each subject's functional data to a common reference map. Our proposed Bayesian functional registration approach allows us to assess differences in brain function across subjects and individual differences in activation topology. It combines intensity-based and feature-based information into an integrated framework, and allows inference to be performed on the transformation via the posterior samples. We evaluate the method in a simulation study and apply it to data from a study of thermal pain. We find that the proposed approach provides increased sensitivity for group-level inference.
BAYESIAN FUNCTIONAL REGISTRATION OF FMRI ACTIVATION MAPS.
Annals of Applied Statistics 16(3), 1676-1699. DOI: 10.1214/21-aoas1562. Published 2022-09-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10312483/pdf/nihms-1910200.pdf
Pub Date: 2022-09-01. Epub Date: 2022-07-19. DOI: 10.1214/21-aoas1566
Ying Huang, Yingying Zhuang, Peter Gilbert
This article addresses the evaluation of post-randomization immune response biomarkers as principal surrogate endpoints of a vaccine's protective effect, based on data from randomized vaccine trials. An important metric for quantifying a biomarker's principal surrogacy in vaccine research is the vaccine efficacy curve, which shows a vaccine's efficacy as a function of potential biomarker values if receiving vaccine, among an 'early-always-at-risk' principal stratum of trial participants who remain disease-free at the time of biomarker measurement whether they received vaccine or placebo. Earlier work in principal surrogate evaluation relied on an 'equal-early-clinical-risk' assumption for identifiability of the vaccine efficacy curve, based on observed disease status at the time of biomarker measurement. This assumption is violated in the common setting that the vaccine has an early effect on the clinical endpoint before the biomarker is measured. In particular, a vaccine's early protective effect observed in two phase III dengue vaccine trials (CYD14/CYD15) has motivated our current research development. We relax the 'equal-early-clinical-risk' assumption and propose a new sensitivity analysis framework for principal surrogate evaluation allowing for early vaccine efficacy. Under this framework, we develop inference procedures for vaccine efficacy curve estimators based on the estimated maximum likelihood approach. We then use the proposed methodology to assess the surrogacy of post-randomization neutralization titer in the motivating dengue application.
SENSITIVITY ANALYSIS FOR EVALUATING PRINCIPAL SURROGATE ENDPOINTS RELAXING THE EQUAL EARLY CLINICAL RISK ASSUMPTION.
Annals of Applied Statistics 16(3), 1774-1794. DOI: 10.1214/21-aoas1566. Published 2022-09-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10065750/pdf/nihms-1836703.pdf
Sebastien Haneuse, Deborah Schrag, Francesca Dominici, Sharon-Lise Normand, Kyu Ha Lee
Although not without controversy, readmission is entrenched as a hospital quality metric, with statistical analyses generally based on fitting a logistic-Normal generalized linear mixed model. Such analyses, however, ignore death as a competing risk, although doing so for clinical conditions with high mortality can have profound effects; a hospital's seemingly good performance for readmission may be an artifact of it having poor performance for mortality. In this paper we propose novel multivariate hospital-level performance measures for readmission and mortality that derive from framing the analysis as one of cluster-correlated semi-competing risks data. We also consider a number of profiling-related goals, including the identification of extreme performers and a bivariate classification of whether the hospital has higher-/lower-than-expected readmission and mortality rates via a Bayesian decision-theoretic approach that characterizes hospitals on the basis of minimizing the posterior expected loss for an appropriate loss function. In some settings, particularly if the number of hospitals is large, the computational burden may be prohibitive. To resolve this, we propose a series of analysis strategies that will be useful in practice. Throughout, the methods are illustrated with data from CMS on N = 17,685 patients diagnosed with pancreatic cancer between 2000 and 2012 at one of J = 264 hospitals in California.
MEASURING PERFORMANCE FOR END-OF-LIFE CARE.
Annals of Applied Statistics 16(3), 1586-1607. DOI: 10.1214/21-aoas1558. Published 2022-09-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9728673/pdf/nihms-1842846.pdf
Pub Date: 2022-06-01. Epub Date: 2022-06-13. DOI: 10.1214/21-aoas1530.
Liangyuan Hu, Jungang Zou, Chenyang Gu, Jiayi Ji, Michael Lopez, Minal Kale
In the absence of a randomized experiment, a key assumption for drawing causal inference about treatment effects is ignorable treatment assignment. Violations of the ignorability assumption may lead to biased treatment effect estimates. Sensitivity analysis helps gauge how causal conclusions will be altered in response to the potential magnitude of departure from the ignorability assumption. However, sensitivity analysis approaches for unmeasured confounding in the context of multiple treatments and binary outcomes are scarce. We propose a flexible Monte Carlo sensitivity analysis approach for causal inference in such settings. We first derive the general form of the bias introduced by unmeasured confounding, with emphasis on theoretical properties uniquely relevant to multiple treatments. We then propose methods to encode the impact of unmeasured confounding on potential outcomes and to adjust the estimates of causal effects so that the presumed unmeasured confounding is removed. Our proposed methods embed nested multiple imputation within the Bayesian framework, which allows for seamless integration of the uncertainty about the values of the sensitivity parameters with the sampling variability, as well as use of Bayesian Additive Regression Trees for modeling flexibility. Extensive simulations validate our methods and provide insight into sensitivity analysis with multiple treatments. We use the SEER-Medicare data to demonstrate sensitivity analysis using three treatments for early-stage non-small-cell lung cancer. The methods developed in this work are readily available in the R package SAMTx.
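The Monte Carlo idea can be sketched generically in a few lines. This is not the SAMTx implementation: the risk-difference scale, the normal prior on the bias parameter, and the simple mean/standard-deviation pooling rule are assumptions chosen for illustration.

```python
import random
import statistics

def mc_sensitivity(effect_hat, se_hat, draw_bias, n_draws=2000, seed=1):
    """Generic Monte Carlo sensitivity analysis: repeatedly (i) draw the
    sensitivity parameter encoding the unmeasured-confounding bias,
    (ii) subtract that bias from the naive effect estimate while adding
    sampling noise, then (iii) pool the adjusted estimates so the spread
    reflects both sampling variability and sensitivity-parameter
    uncertainty (in the spirit of multiple imputation)."""
    rng = random.Random(seed)
    adjusted = [effect_hat - draw_bias(rng) + rng.gauss(0.0, se_hat)
                for _ in range(n_draws)]
    return statistics.mean(adjusted), statistics.stdev(adjusted)

# Naive risk difference 0.10 (SE 0.02); bias believed to be around +0.03.
est, sd = mc_sensitivity(0.10, 0.02, lambda r: r.gauss(0.03, 0.01))
```

Here the pooled standard deviation exceeds the original standard error because uncertainty about the sensitivity parameter propagates into the adjusted estimate.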
{"title":"A FLEXIBLE SENSITIVITY ANALYSIS APPROACH FOR UNMEASURED CONFOUNDING WITH MULTIPLE TREATMENTS AND A BINARY OUTCOME WITH APPLICATION TO SEER-MEDICARE LUNG CANCER DATA.","authors":"Liangyuan Hu, Jungang Zou, Chenyang Gu, Jiayi Ji, Michael Lopez, Minal Kale","doi":"10.1214/21-aoas1530","DOIUrl":"10.1214/21-aoas1530","url":null,"abstract":"<p><p>In the absence of a randomized experiment, a key assumption for drawing causal inference about treatment effects is the ignorable treatment assignment. Violations of the ignorability assumption may lead to biased treatment effect estimates. Sensitivity analysis helps gauge how causal conclusions will be altered in response to the potential magnitude of departure from the ignorability assumption. However, sensitivity analysis approaches for unmeasured confounding in the context of multiple treatments and binary outcomes are scarce. We propose a flexible Monte Carlo sensitivity analysis approach for causal inference in such settings. We first derive the general form of the bias introduced by unmeasured confounding, with emphasis on theoretical properties uniquely relevant to multiple treatments. We then propose methods to encode the impact of unmeasured confounding on potential outcomes and adjust the estimates of causal effects in which the presumed unmeasured confounding is removed. Our proposed methods embed nested multiple imputation within the Bayesian framework, which allow for seamless integration of the uncertainty about the values of the sensitivity parameters and the sampling variability, as well as use of the Bayesian Additive Regression Trees for modeling flexibility. Expansive simulations validate our methods and gain insight into sensitivity analysis with multiple treatments. We use the SEER-Medicare data to demonstrate sensitivity analysis using three treatments for early stage non-small cell lung cancer. 
The methods developed in this work are readily available in the R package SAMTx.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"16 2","pages":"1014-1037"},"PeriodicalIF":1.8,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9835106/pdf/nihms-1859782.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10538891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Austin E Schumacher, Tyler H McCormick, Jon Wakefield, Yue Chu, Jamie Perin, Francisco Villavicencio, Noah Simon, Li Liu
In order to implement disease-specific interventions in young age groups, policy makers in low- and middle-income countries require timely and accurate estimates of age- and cause-specific child mortality. High-quality data are not available in the settings where these interventions are most needed, but there is a push to create sample registration systems that collect detailed mortality information. Current methods that estimate mortality from such data employ multistage frameworks, without rigorous statistical justification, that separately estimate all-cause and cause-specific mortality and are not sufficiently adaptable to capture important features of the data. We propose a flexible Bayesian modeling framework to estimate age- and cause-specific child mortality from sample registration data. We provide a theoretical justification for the framework, explore its properties via simulation, and use it to estimate mortality trends using data from the Maternal and Child Health Surveillance System in China.
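For intuition, the multistage approach the abstract critiques can be caricatured in a few lines: all-cause mortality and the cause split are estimated in two disconnected stages. The counts below are made up for illustration, and this toy is the baseline being criticized, not the authors' proposed joint Bayesian model.

```python
def multistage_mortality(deaths_by_cause, person_years):
    """Toy two-stage estimator: first estimate the all-cause mortality
    rate, then split it by the empirical cause fractions.  Fitting the
    two stages separately is exactly the disconnect a joint framework
    avoids, since neither stage borrows strength from the other."""
    total = sum(deaths_by_cause.values())
    all_cause_rate = total / person_years
    return {cause: all_cause_rate * d / total
            for cause, d in deaths_by_cause.items()}

# Hypothetical counts: 100 deaths over 100,000 person-years.
rates = multistage_mortality({"pneumonia": 30, "diarrhoea": 20, "other": 50},
                             person_years=100_000)
# Cause-specific rates sum to the all-cause rate of 0.001.
```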
{"title":"A FLEXIBLE BAYESIAN FRAMEWORK TO ESTIMATE AGE- AND CAUSE-SPECIFIC CHILD MORTALITY OVER TIME FROM SAMPLE REGISTRATION DATA.","authors":"Austin E Schumacher, Tyler H McCormick, Jon Wakefield, Yue Chu, Jamie Perin, Francisco Villavicencio, Noah Simon, Li Liu","doi":"10.1214/21-aoas1489","DOIUrl":"https://doi.org/10.1214/21-aoas1489","url":null,"abstract":"<p><p>In order to implement disease-specific interventions in young age groups, policy makers in low- and middle-income countries require timely and accurate estimates of age- and cause-specific child mortality. High-quality data is not available in settings where these interventions are most needed, but there is a push to create sample registration systems that collect detailed mortality information. current methods that estimate mortality from this data employ multistage frameworks without rigorous statistical justification that separately estimate all-cause and cause-specific mortality and are not sufficiently adaptable to capture important features of the data. We propose a flexible Bayesian modeling framework to estimate age- and cause-specific child mortality from sample registration data. 
We provide a theoretical justification for the framework, explore its properties via simulation, and use it to estimate mortality trends using data from the Maternal and Child Health Surveillance System in China.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"16 1","pages":"124-143"},"PeriodicalIF":1.8,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10448806/pdf/nihms-1862449.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10103673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}