Oliver M Crook, Kathryn S Lilley, Laurent Gatto, Paul D W Kirk
Understanding sub-cellular protein localisation is an essential component in the analysis of context-specific protein function. Recent advances in quantitative mass spectrometry (MS) have led to high-resolution mapping of thousands of proteins to sub-cellular locations within the cell. Novel modelling considerations to capture the complex nature of these data are thus necessary. We approach analysis of spatial proteomics data in a non-parametric Bayesian framework, using K-component mixtures of Gaussian process regression models. The Gaussian process regression model accounts for correlation structure within a sub-cellular niche, with each mixture component capturing the distinct correlation structure observed within each niche. The availability of marker proteins (i.e. proteins with a priori known labelled locations) motivates a semi-supervised learning approach to inform the Gaussian process hyperparameters. We moreover provide an efficient Hamiltonian-within-Gibbs sampler for our model. Furthermore, we reduce the computational burden associated with inversion of covariance matrices by exploiting their structure: a tensor decomposition of the covariance matrices allows extended Trench and Durbin algorithms to be applied, reducing the computational complexity of inversion and hence accelerating computation. We provide detailed case studies on Drosophila embryos and mouse pluripotent embryonic stem cells to illustrate the benefit of semi-supervised functional Bayesian modelling of the data.
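The Toeplitz structure behind the Trench/Durbin speed-up can be sketched numerically: on a regular grid, a stationary kernel yields a Toeplitz covariance, so a Levinson-Durbin-type solver handles systems in O(n^2) rather than O(n^3). The sketch below is a hypothetical illustration using SciPy's `solve_toeplitz` with an exponential kernel; it is not the authors' extended algorithm and the grid and kernel are assumptions, not their data.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

# Hypothetical illustration: a stationary kernel on a regular grid gives a
# Toeplitz covariance, so Levinson-Durbin-type solvers (the family that the
# paper's extended Trench and Durbin algorithms belong to) apply directly.
n = 200
t = np.linspace(0.0, 1.0, n)                  # e.g. equally spaced channels
first_col = np.exp(-np.abs(t - t[0]) / 0.1)   # exponential (OU-type) kernel
first_col[0] += 1e-6                          # small nugget for conditioning

rng = np.random.default_rng(0)
b = rng.standard_normal(n)

# O(n^2) structured solve vs. generic O(n^3) dense solve of K x = b
x_fast = solve_toeplitz(first_col, b)         # symmetric: one column suffices
K = first_col[np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])]
x_dense = np.linalg.solve(K, b)
```

In the paper's setting it is the tensor decomposition of the covariance that exposes this kind of one-dimensional structure; the point of the sketch is only the cost difference between the structured and dense solves.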
Semi-Supervised Non-Parametric Bayesian Modelling of Spatial Proteomics.
Annals of Applied Statistics 16(4). DOI: 10.1214/22-AOAS1603. Published 2022-12-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7613899/pdf/EMS143956.pdf
Pub Date: 2022-12-01. Epub Date: 2022-09-26. DOI: 10.1214/21-AOAS1589
Yifei Sun, Xuming He, Jianhua Hu
Late-stage clinical trials have been conducted primarily to establish the efficacy of a new treatment in an intended population. A corollary of population heterogeneity in clinical trials is that a treatment might be effective for one or more subgroups, rather than for the whole population of interest. As an example, the phase III clinical trial of panitumumab in metastatic colorectal cancer patients failed to demonstrate its efficacy in the overall population, but a subgroup associated with tumor KRAS status was found to be promising (Peeters et al. (Am. J. Clin. Oncol. 28 (2010) 4706-4713)). As we search for such subgroups via data partitioning based on a large number of biomarkers, we need to guard against inflated type I error rates due to multiple testing. Commonly used multiplicity adjustments tend to lose power for the detection of subgroup treatment effects. We develop an effective omnibus test to detect the existence of at least one subgroup treatment effect, allowing a large number of possible subgroups to be considered and accommodating possibly censored outcomes. Applied to the panitumumab trial data, the proposed test would confirm a significant subgroup treatment effect. Empirical studies also show that the proposed test is applicable to a variety of outcome variables and maintains robust statistical power.
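The flavor of an omnibus "max over candidate subgroups" test can be sketched with a permutation null: a single test guards the whole family of subgroup comparisons. The statistic below (a plain two-sample z over biomarker-defined halves) and the simulated data are stand-ins for illustration, not the authors' test or the panitumumab trial.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sketch: candidate subgroups are defined by thresholding each
# biomarker; the omnibus statistic is the largest subgroup treatment effect,
# and its null distribution comes from permuting treatment labels.
n = 400
biomarker = rng.standard_normal((n, 3))       # 3 candidate biomarkers
treat = rng.integers(0, 2, n)
# truth: the treatment helps only when biomarker 0 is positive
y = 1.0 * treat * (biomarker[:, 0] > 0) + rng.standard_normal(n)

def max_subgroup_stat(y, treat, biomarker):
    """Largest |z| for the treatment effect over biomarker-defined subgroups."""
    stats = []
    for j in range(biomarker.shape[1]):
        for sub in (biomarker[:, j] > 0, biomarker[:, j] <= 0):
            a, b = y[sub & (treat == 1)], y[sub & (treat == 0)]
            se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
            stats.append(abs(a.mean() - b.mean()) / se)
    return max(stats)

obs = max_subgroup_stat(y, treat, biomarker)
null = [max_subgroup_stat(y, rng.permutation(treat), biomarker)
        for _ in range(500)]
p_value = (1 + sum(s >= obs for s in null)) / (1 + len(null))
```

Because the maximum is taken before the permutation p-value is computed, the multiplicity over candidate subgroups is handled by the single test rather than by a post-hoc adjustment.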
AN OMNIBUS TEST FOR DETECTION OF SUBGROUP TREATMENT EFFECTS VIA DATA PARTITIONING.
Annals of Applied Statistics 16(4), 2266-2278. DOI: 10.1214/21-AOAS1589. Published 2022-12-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10381789/pdf/nihms-1919024.pdf
Joshua L Warren, Howard H Chang, Lauren K Warren, Matthew J Strickland, Lyndsey A Darrow, James A Mulholland
Understanding the role of time-varying pollution mixtures on human health is critical as people are simultaneously exposed to multiple pollutants during their lives. For vulnerable subpopulations who have well-defined exposure periods (e.g., pregnant women), questions regarding critical windows of exposure to these mixtures are important for mitigating harm. We extend critical window variable selection (CWVS) to the multipollutant setting by introducing CWVS for mixtures (CWVSmix), a hierarchical Bayesian method that combines smoothed variable selection and temporally correlated weight parameters to: (i) identify critical windows of exposure to mixtures of time-varying pollutants, (ii) estimate the time-varying relative importance of each individual pollutant and their first order interactions within the mixture, and (iii) quantify the impact of the mixtures on health. Through simulation we show that CWVSmix offers the best balance of performance in each of these categories in comparison to competing methods. Using these approaches, we investigate the impact of exposure to multiple ambient air pollutants on the risk of stillbirth in New Jersey, 2005-2014. We find consistent elevated risk in gestational weeks 2, 16-17, and 20 for non-Hispanic Black mothers, with pollution mixtures dominated by ammonium (weeks 2, 17, 20), nitrate (weeks 2, 17), nitrogen oxides (weeks 2, 16), PM2.5 (week 2), and sulfate (week 20). The method is available in the R package CWVSmix.
CRITICAL WINDOW VARIABLE SELECTION FOR MIXTURES: ESTIMATING THE IMPACT OF MULTIPLE AIR POLLUTANTS ON STILLBIRTH.
Annals of Applied Statistics 16(3), 1633-1652. DOI: 10.1214/21-aoas1560. Published 2022-09-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9854390/pdf/nihms-1863002.pdf
Junyang Qian, Yosuke Tanigawa, Ruilin Li, Robert Tibshirani, Manuel A Rivas, Trevor Hastie
In high-dimensional regression problems, often a relatively small subset of the features is relevant for predicting the outcome, and methods that impose sparsity on the solution are popular. When multiple correlated outcomes are available (multitask), reduced rank regression is an effective way to borrow strength and capture latent structures that underlie the data. Our proposal is motivated by the UK Biobank population-based cohort study, where we are faced with large-scale, ultrahigh-dimensional features, and have access to a large number of outcomes (phenotypes)-lifestyle measures, biomarkers, and disease outcomes. We are hence led to fit sparse reduced-rank regression models, using computational strategies that allow us to scale to problems of this size. We use a scheme that alternates between solving the sparse regression problem and solving the reduced rank decomposition. For the sparse regression component we propose a scalable iterative algorithm based on adaptive screening that leverages the sparsity assumption and enables us to focus on solving much smaller subproblems. The full solution is reconstructed and tested via an optimality condition to make sure it is a valid solution for the original problem. We further extend the method to cope with practical issues, such as the inclusion of confounding variables and imputation of missing values among the phenotypes. Experiments on both synthetic data and the UK Biobank data demonstrate the effectiveness of the method and the algorithm. We present the multiSnpnet package, available at http://github.com/junyangq/multiSnpnet, which works on top of PLINK2 files and which we anticipate will be a valuable tool for generating polygenic risk scores from human genetic studies.
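The alternating scheme can be sketched in a few lines: a sparse-regression step for the coefficient factor with the rank factor fixed, then an SVD (orthogonal Procrustes) step for the rank factor. Plain ISTA soft-thresholding stands in for the paper's screening-based solver, and the data are simulated; this is a toy sketch of the iteration, not the multiSnpnet implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy sparse reduced-rank regression: Y ~ X @ W @ V.T with row-sparse W
# (p x r) and orthonormal V (q x r). Alternate a lasso-style ISTA step for W
# with an orthogonal Procrustes (SVD) step for V.
n, p, q, r = 200, 50, 10, 2
X = rng.standard_normal((n, p))
W_true = np.zeros((p, r))
W_true[:5] = rng.standard_normal((5, r))          # only 5 relevant features
V_true = np.linalg.qr(rng.standard_normal((q, r)))[0]
Y = X @ W_true @ V_true.T + 0.1 * rng.standard_normal((n, q))

lam = 10.0                                        # l1 penalty
step = 1.0 / np.linalg.norm(X, 2) ** 2            # ISTA step size
V = np.linalg.svd(Y, full_matrices=False)[2][:r].T  # warm start: Y's top PCs
W = np.zeros((p, r))
for _ in range(200):
    # (1) sparse step: one proximal-gradient pass on ||Y V - X W||^2 + lam |W|_1
    Z = W - step * (X.T @ (X @ W - Y @ V))
    W = np.sign(Z) * np.maximum(np.abs(Z) - step * lam, 0.0)
    # (2) reduced-rank step: V solves the orthogonal Procrustes problem
    P, _, Qt = np.linalg.svd(Y.T @ X @ W, full_matrices=False)
    V = P @ Qt

support = np.flatnonzero(np.abs(W).sum(axis=1) > 0)
```

Because V has orthonormal columns, minimizing ||Y V - X W||^2 over W and maximizing trace(V^T Y^T X W) over V both decrease the joint objective ||Y - X W V^T||^2, so the two steps are consistent pieces of one alternating minimization.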
LARGE-SCALE MULTIVARIATE SPARSE REGRESSION WITH APPLICATIONS TO UK BIOBANK.
Annals of Applied Statistics 16(3), 1891-1918. DOI: 10.1214/21-aoas1575. Published 2022-09-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9454085/pdf/nihms-1830548.pdf
Pub Date: 2022-09-01. Epub Date: 2022-07-19. DOI: 10.1214/21-AOAS1556
Jacob Parsons, Xiaoyue Niu, Le Bao
To combat the HIV/AIDS pandemic effectively, targeted interventions among certain key populations play a critical role. Examples of such key populations include sex workers, people who inject drugs, and men who have sex with men. While having accurate estimates for the size of these key populations is important, any attempt to directly contact or count members of these populations is difficult. As a result, indirect methods are used to produce size estimates. Multiple approaches for estimating the size of such populations have been suggested but often give conflicting results. It is, therefore, necessary to have a principled way to combine and reconcile these estimates. To this end, we present a Bayesian hierarchical model for estimating the size of key populations that combines multiple estimates from different sources of information. The proposed model makes use of multiple years of data and explicitly models the systematic error in the data sources used. We use the model to estimate the size of people who inject drugs in Ukraine. We evaluate the appropriateness of the model and compare the contribution of each data source to the final estimates.
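The combination step can be illustrated with a deliberately simplified version: each source reports an estimate, a standard error, and an assumed systematic bias, and a precision-weighted normal model reconciles them. The numbers are made up for illustration (not the Ukraine data), and the fixed weighting is a bare-bones stand-in for the paper's full Bayesian hierarchy.

```python
import numpy as np

# Illustrative numbers only. Each source gives a size estimate (in thousands),
# a standard error, and an assumed systematic offset; debiasing each source and
# then precision-weighting is the simplest version of "combine and reconcile".
estimates = np.array([310.0, 340.0, 295.0, 360.0])
ses = np.array([25.0, 40.0, 20.0, 55.0])
bias = np.array([0.0, 15.0, -10.0, 20.0])     # hypothetical systematic offsets

w = 1.0 / ses**2                              # precision weights
combined = np.sum(w * (estimates - bias)) / np.sum(w)
combined_se = np.sqrt(1.0 / np.sum(w))        # pooled uncertainty
```

The pooled standard error is smaller than any single source's, which is the "borrowing strength" the hierarchical model formalizes; the paper additionally learns the bias terms from multiple years of data rather than fixing them.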
A BAYESIAN HIERARCHICAL MODEL FOR COMBINING MULTIPLE DATA SOURCES IN POPULATION SIZE ESTIMATION.
Annals of Applied Statistics 16(3), 1550-1562. DOI: 10.1214/21-AOAS1556. Published 2022-09-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10150643/pdf/nihms-1889948.pdf
Pub Date: 2022-09-01. Epub Date: 2022-07-19. DOI: 10.1214/21-aoas1562
Guoqing Wang, Abhirup Datta, Martin A Lindquist
Functional magnetic resonance imaging (fMRI) has provided invaluable insight into our understanding of human behavior. However, large inter-individual differences in both brain anatomy and functional localization after anatomical alignment remain a major limitation in conducting group analyses and performing population-level inference. This paper addresses this problem by developing and validating a new computational technique for reducing misalignment across individuals in functional brain systems by spatially transforming each subject's functional data to a common reference map. Our proposed Bayesian functional registration approach allows us to assess differences in brain function across subjects and individual differences in activation topology. It combines intensity-based and feature-based information into an integrated framework, and allows inference to be performed on the transformation via the posterior samples. We evaluate the method in a simulation study and apply it to data from a study of thermal pain. We find that the proposed approach provides increased sensitivity for group-level inference.
BAYESIAN FUNCTIONAL REGISTRATION OF FMRI ACTIVATION MAPS.
Annals of Applied Statistics 16(3), 1676-1699. DOI: 10.1214/21-aoas1562. Published 2022-09-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10312483/pdf/nihms-1910200.pdf
Pub Date: 2022-09-01. Epub Date: 2022-07-19. DOI: 10.1214/21-aoas1566
Ying Huang, Yingying Zhuang, Peter Gilbert
This article addresses the evaluation of post-randomization immune response biomarkers as principal surrogate endpoints of a vaccine's protective effect, based on data from randomized vaccine trials. An important metric for quantifying a biomarker's principal surrogacy in vaccine research is the vaccine efficacy curve, which shows a vaccine's efficacy as a function of potential biomarker values if receiving vaccine, among an 'early-always-at-risk' principal stratum of trial participants who remain disease-free at the time of biomarker measurement whether they received vaccine or placebo. Earlier work in principal surrogate evaluation relied on an 'equal-early-clinical-risk' assumption for identifiability of the vaccine efficacy curve, based on observed disease status at the time of biomarker measurement. This assumption is violated in the common setting that the vaccine has an early effect on the clinical endpoint before the biomarker is measured. In particular, a vaccine's early protective effect observed in two phase III dengue vaccine trials (CYD14/CYD15) has motivated our current research development. We relax the 'equal-early-clinical-risk' assumption and propose a new sensitivity analysis framework for principal surrogate evaluation allowing for early vaccine efficacy. Under this framework, we develop inference procedures for vaccine efficacy curve estimators based on the estimated maximum likelihood approach. We then use the proposed methodology to assess the surrogacy of post-randomization neutralization titer in the motivating dengue application.
SENSITIVITY ANALYSIS FOR EVALUATING PRINCIPAL SURROGATE ENDPOINTS RELAXING THE EQUAL EARLY CLINICAL RISK ASSUMPTION.
Annals of Applied Statistics 16(3), 1774-1794. DOI: 10.1214/21-aoas1566. Published 2022-09-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10065750/pdf/nihms-1836703.pdf
Sebastien Haneuse, Deborah Schrag, Francesca Dominici, Sharon-Lise Normand, Kyu Ha Lee
Although not without controversy, readmission is entrenched as a hospital quality metric, with statistical analyses generally based on fitting a logistic-Normal generalized linear mixed model. Such analyses, however, ignore death as a competing risk, although doing so for clinical conditions with high mortality can have profound effects; a hospital's seemingly good performance for readmission may be an artifact of it having poor performance for mortality. In this paper we propose novel multivariate hospital-level performance measures for readmission and mortality that derive from framing the analysis as one of cluster-correlated semi-competing risks data. We also consider a number of profiling-related goals, including the identification of extreme performers and a bivariate classification of whether the hospital has higher-/lower-than-expected readmission and mortality rates via a Bayesian decision-theoretic approach that characterizes hospitals on the basis of minimizing the posterior expected loss for an appropriate loss function. In some settings, particularly if the number of hospitals is large, the computational burden may be prohibitive. To resolve this, we propose a series of analysis strategies that will be useful in practice. Throughout, the methods are illustrated with data from CMS on N = 17,685 patients diagnosed with pancreatic cancer between 2000 and 2012 at one of J = 264 hospitals in California.
MEASURING PERFORMANCE FOR END-OF-LIFE CARE.
Annals of Applied Statistics 16(3), 1586-1607. DOI: 10.1214/21-aoas1558. Published 2022-09-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9728673/pdf/nihms-1842846.pdf
Pub Date: 2022-06-01. Epub Date: 2022-06-13. DOI: 10.1214/21-aoas1530.
Liangyuan Hu, Jungang Zou, Chenyang Gu, Jiayi Ji, Michael Lopez, Minal Kale
In the absence of a randomized experiment, a key assumption for drawing causal inference about treatment effects is ignorable treatment assignment. Violations of the ignorability assumption may lead to biased treatment effect estimates. Sensitivity analysis helps gauge how causal conclusions will be altered in response to the potential magnitude of departure from the ignorability assumption. However, sensitivity analysis approaches for unmeasured confounding in the context of multiple treatments and binary outcomes are scarce. We propose a flexible Monte Carlo sensitivity analysis approach for causal inference in such settings. We first derive the general form of the bias introduced by unmeasured confounding, with emphasis on theoretical properties uniquely relevant to multiple treatments. We then propose methods to encode the impact of unmeasured confounding on potential outcomes and to adjust the estimates of causal effects so that the presumed unmeasured confounding is removed. Our proposed methods embed nested multiple imputation within the Bayesian framework, which allows for seamless integration of the uncertainty about the values of the sensitivity parameters with the sampling variability, as well as use of Bayesian Additive Regression Trees for modeling flexibility. Extensive simulations validate our methods and provide insight into sensitivity analysis with multiple treatments. We use the SEER-Medicare data to demonstrate sensitivity analysis using three treatments for early-stage non-small-cell lung cancer. The methods developed in this work are readily available in the R package SAMTx.
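The Monte Carlo idea can be sketched generically in a few lines. This is not the SAMTx implementation: the risk-difference scale, the normal prior on the bias parameter, and the simple mean/standard-deviation pooling rule are assumptions chosen for illustration.

```python
import random
import statistics

def mc_sensitivity(effect_hat, se_hat, draw_bias, n_draws=2000, seed=1):
    """Generic Monte Carlo sensitivity analysis: repeatedly (i) draw the
    sensitivity parameter encoding the unmeasured-confounding bias,
    (ii) subtract that bias from the naive effect estimate while adding
    sampling noise, then (iii) pool the adjusted estimates so the spread
    reflects both sampling variability and sensitivity-parameter
    uncertainty (in the spirit of multiple imputation)."""
    rng = random.Random(seed)
    adjusted = [effect_hat - draw_bias(rng) + rng.gauss(0.0, se_hat)
                for _ in range(n_draws)]
    return statistics.mean(adjusted), statistics.stdev(adjusted)

# Naive risk difference 0.10 (SE 0.02); bias believed to be around +0.03.
est, sd = mc_sensitivity(0.10, 0.02, lambda r: r.gauss(0.03, 0.01))
```

Here the pooled standard deviation exceeds the original standard error because uncertainty about the sensitivity parameter propagates into the adjusted estimate.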
{"title":"A FLEXIBLE SENSITIVITY ANALYSIS APPROACH FOR UNMEASURED CONFOUNDING WITH MULTIPLE TREATMENTS AND A BINARY OUTCOME WITH APPLICATION TO SEER-MEDICARE LUNG CANCER DATA.","authors":"Liangyuan Hu, Jungang Zou, Chenyang Gu, Jiayi Ji, Michael Lopez, Minal Kale","doi":"10.1214/21-aoas1530","DOIUrl":"10.1214/21-aoas1530","url":null,"abstract":"<p><p>In the absence of a randomized experiment, a key assumption for drawing causal inference about treatment effects is the ignorable treatment assignment. Violations of the ignorability assumption may lead to biased treatment effect estimates. Sensitivity analysis helps gauge how causal conclusions will be altered in response to the potential magnitude of departure from the ignorability assumption. However, sensitivity analysis approaches for unmeasured confounding in the context of multiple treatments and binary outcomes are scarce. We propose a flexible Monte Carlo sensitivity analysis approach for causal inference in such settings. We first derive the general form of the bias introduced by unmeasured confounding, with emphasis on theoretical properties uniquely relevant to multiple treatments. We then propose methods to encode the impact of unmeasured confounding on potential outcomes and adjust the estimates of causal effects in which the presumed unmeasured confounding is removed. Our proposed methods embed nested multiple imputation within the Bayesian framework, which allow for seamless integration of the uncertainty about the values of the sensitivity parameters and the sampling variability, as well as use of the Bayesian Additive Regression Trees for modeling flexibility. Expansive simulations validate our methods and gain insight into sensitivity analysis with multiple treatments. We use the SEER-Medicare data to demonstrate sensitivity analysis using three treatments for early stage non-small cell lung cancer. 
The methods developed in this work are readily available in the R package SAMTx.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"16 2","pages":"1014-1037"},"PeriodicalIF":1.8,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9835106/pdf/nihms-1859782.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10538891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Austin E Schumacher, Tyler H McCormick, Jon Wakefield, Yue Chu, Jamie Perin, Francisco Villavicencio, Noah Simon, Li Liu
In order to implement disease-specific interventions in young age groups, policy makers in low- and middle-income countries require timely and accurate estimates of age- and cause-specific child mortality. High-quality data are not available in the settings where these interventions are most needed, but there is a push to create sample registration systems that collect detailed mortality information. Current methods that estimate mortality from such data employ multistage frameworks, without rigorous statistical justification, that separately estimate all-cause and cause-specific mortality and are not sufficiently adaptable to capture important features of the data. We propose a flexible Bayesian modeling framework to estimate age- and cause-specific child mortality from sample registration data. We provide a theoretical justification for the framework, explore its properties via simulation, and use it to estimate mortality trends using data from the Maternal and Child Health Surveillance System in China.
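For intuition, the multistage approach the abstract critiques can be caricatured in a few lines: all-cause mortality and the cause split are estimated in two disconnected stages. The counts below are made up for illustration, and this toy is the baseline being criticized, not the authors' proposed joint Bayesian model.

```python
def multistage_mortality(deaths_by_cause, person_years):
    """Toy two-stage estimator: first estimate the all-cause mortality
    rate, then split it by the empirical cause fractions.  Fitting the
    two stages separately is exactly the disconnect a joint framework
    avoids, since neither stage borrows strength from the other."""
    total = sum(deaths_by_cause.values())
    all_cause_rate = total / person_years
    return {cause: all_cause_rate * d / total
            for cause, d in deaths_by_cause.items()}

# Hypothetical counts: 100 deaths over 100,000 person-years.
rates = multistage_mortality({"pneumonia": 30, "diarrhoea": 20, "other": 50},
                             person_years=100_000)
# Cause-specific rates sum to the all-cause rate of 0.001.
```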
{"title":"A FLEXIBLE BAYESIAN FRAMEWORK TO ESTIMATE AGE- AND CAUSE-SPECIFIC CHILD MORTALITY OVER TIME FROM SAMPLE REGISTRATION DATA.","authors":"Austin E Schumacher, Tyler H McCormick, Jon Wakefield, Yue Chu, Jamie Perin, Francisco Villavicencio, Noah Simon, Li Liu","doi":"10.1214/21-aoas1489","DOIUrl":"https://doi.org/10.1214/21-aoas1489","url":null,"abstract":"<p><p>In order to implement disease-specific interventions in young age groups, policy makers in low- and middle-income countries require timely and accurate estimates of age- and cause-specific child mortality. High-quality data is not available in settings where these interventions are most needed, but there is a push to create sample registration systems that collect detailed mortality information. current methods that estimate mortality from this data employ multistage frameworks without rigorous statistical justification that separately estimate all-cause and cause-specific mortality and are not sufficiently adaptable to capture important features of the data. We propose a flexible Bayesian modeling framework to estimate age- and cause-specific child mortality from sample registration data. 
We provide a theoretical justification for the framework, explore its properties via simulation, and use it to estimate mortality trends using data from the Maternal and Child Health Surveillance System in China.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"16 1","pages":"124-143"},"PeriodicalIF":1.8,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10448806/pdf/nihms-1862449.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10103673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}