Annals of Applied Statistics: Latest Articles

CAUSAL HEALTH IMPACTS OF POWER PLANT EMISSION CONTROLS UNDER MODELED AND UNCERTAIN PHYSICAL PROCESS INTERFERENCE.

IF 1.3 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY

Annals of Applied Statistics

Pub Date: 2024-12-01 | Epub Date: 2024-10-31 | DOI: 10.1214/24-aoas1904

Nathan B Wikle, Corwin M Zigler

Causal inference with spatial environmental data is often challenging due to the presence of interference: outcomes for observational units depend on some combination of local and nonlocal treatment. This is especially relevant when estimating the effect of power plant emissions controls on population health, as pollution exposure is dictated by (i) the location of point-source emissions as well as (ii) the transport of pollutants across space via dynamic physical-chemical processes. In this work we estimate the effectiveness of air quality interventions at coal-fired power plants in reducing two adverse health outcomes in Texas in 2016: pediatric asthma ED visits and Medicare all-cause mortality. We develop methods for causal inference with interference when the underlying network structure is not known with certainty and instead must be estimated from ancillary data. Notably, uncertainty in the interference structure is propagated to the resulting causal effect estimates. We offer a Bayesian, spatial mechanistic model for the interference mapping, which we combine with a flexible nonparametric outcome model to marginalize estimates of causal effects over uncertainty in the structure of interference. Our analysis finds some evidence that emissions controls at upwind power plants reduce asthma ED visits and all-cause mortality; however, accounting for uncertainty in the interference renders the results largely inconclusive.
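The idea of marginalizing a causal estimate over uncertainty in the interference structure can be illustrated with a toy sketch. It assumes we already hold posterior draws of a source-receptor weight matrix (hypothetical `weight_draws`: one row per receptor unit, one column per power plant) and binary plant-level treatments. The authors' nonparametric outcome model is swapped out here for a plain least-squares slope, so this is not their estimator, only the "re-estimate per draw, then summarize across draws" pattern.

```python
import statistics


def upwind_treatment(weights, treated):
    """Share of each unit's modeled pollution influence that comes from treated plants.

    weights[i][j] is a hypothetical source-receptor weight of plant j on unit i;
    treated[j] is 1 if plant j has an emissions control installed, else 0.
    """
    exposure = []
    for row in weights:
        total = sum(row)
        share = sum(w * a for w, a in zip(row, treated)) / total if total > 0 else 0.0
        exposure.append(share)
    return exposure


def ols_slope(x, y):
    """Least-squares slope of y on x (a stand-in for a richer outcome model)."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / sxx


def marginalize_over_interference(weight_draws, treated, outcomes):
    """Re-estimate the effect under each posterior draw of the interference
    structure, then summarize across draws (mean and standard deviation)."""
    estimates = [ols_slope(upwind_treatment(w, treated), outcomes) for w in weight_draws]
    return statistics.fmean(estimates), statistics.stdev(estimates)


draws = [[[1, 0], [0, 1], [1, 1]], [[1, 1], [0, 1], [1, 0]]]
mean_effect, sd_effect = marginalize_over_interference(draws, treated=[1, 0], outcomes=[6, 10, 8])
```

The between-draw spread is the extra uncertainty that a single, fixed interference map would ignore; with many posterior draws it can be large enough to widen intervals substantially.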

Volume 18, Issue 4, pp. 2753-2774 | PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11619076/pdf/

Citations: 0

INDIVIDUAL DYNAMIC PREDICTION FOR CURE AND SURVIVAL BASED ON LONGITUDINAL BIOMARKERS.

IF 1.3 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY

Annals of Applied Statistics

Pub Date: 2024-12-01 | Epub Date: 2024-10-31 | DOI: 10.1214/24-aoas1906

Can Xie, Xuelin Huang, Ruosha Li, Alexander Tsodikov, Kapil Bhalla

To optimize personalized treatment strategies and extend patients' survival times, it is critical to accurately predict patients' prognoses at all stages, from disease diagnosis to follow-up visits. The longitudinal biomarker measurements taken during visits are essential for this prediction purpose. Patients' ultimate concerns are cure and survival. However, in many situations, there is no clear biomarker indicator for cure. We propose a comprehensive joint model of longitudinal and survival data and a landmark cure model, incorporating proportions of potentially cured patients. The survival distributions in the joint and landmark models are specified through flexible hazard functions with proportional hazards as a special case, allowing other patterns such as crossing hazard and survival functions. Formulas are provided for predicting each individual's probabilities of future cure and survival at any time point based on his or her current biomarker history. Simulations show that, with these comprehensive and flexible properties, the proposed cure models outperform standard cure models in predictive performance, as measured by the time-dependent area under the receiver operating characteristic curve, the Brier score, and the integrated Brier score. The use and advantages of the proposed models are illustrated by their application to a study of patients with chronic myeloid leukemia.
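One standard way to formalize "proportions of potentially cured patients" is the mixture cure model, S(t) = π + (1 - π) S_u(t), where π is the cure fraction and S_u is the survival function of uncured patients. The sketch below assumes a Weibull S_u with made-up parameters and ignores biomarkers entirely; it only shows how the probability of being cured, given survival to time t, is updated by Bayes' rule. It is not the authors' joint or landmark model.

```python
import math


def weibull_survival(t, shape, scale):
    """Survival function of uncured patients, assumed Weibull for illustration."""
    return math.exp(-((t / scale) ** shape))


def population_survival(t, cure_fraction, shape, scale):
    """Mixture cure model: S(t) = pi + (1 - pi) * S_u(t)."""
    return cure_fraction + (1.0 - cure_fraction) * weibull_survival(t, shape, scale)


def prob_cured_given_survival(t, cure_fraction, shape, scale):
    """P(cured | T > t) = pi / S(t) by Bayes' rule, since cured patients always survive."""
    return cure_fraction / population_survival(t, cure_fraction, shape, scale)


if __name__ == "__main__":
    for t in (0.0, 1.0, 3.0, 6.0):
        print(t, round(prob_cured_given_survival(t, 0.3, 1.5, 2.0), 3))
```

At t = 0 the conditional probability equals the prior cure fraction π, and it rises toward 1 the longer a patient survives, which is the kind of updating a dynamic cure prediction performs.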

Volume 18, Issue 4, pp. 2796-2817 | PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11864788/pdf/

Citations: 0

A SPATIALLY VARYING HIERARCHICAL RANDOM EFFECTS MODEL FOR LONGITUDINAL MACULAR STRUCTURAL DATA IN GLAUCOMA PATIENTS.

IF 1.3 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY

Annals of Applied Statistics

Pub Date: 2024-12-01 | Epub Date: 2024-10-31 | DOI: 10.1214/24-aoas1944

Erica Su, Robert E Weiss, Kouros Nouri-Mahdavi, Andrew J Holbrook

We model longitudinal macular thickness measurements to monitor the course of glaucoma and prevent vision loss due to disease progression. The macular thickness varies over a 6 × 6 grid of locations on the retina, with additional variability arising from the imaging process at each visit. Currently, ophthalmologists estimate slopes using repeated simple linear regression for each subject and location. To estimate slopes more precisely, we develop a novel Bayesian hierarchical model for multiple subjects with spatially varying population-level and subject-level coefficients, borrowing information over subjects and measurement locations. We augment the model with visit effects to account for observed spatially correlated visit-specific errors. We model spatially varying (a) intercepts, (b) slopes, and (c) log-residual standard deviations (SD) with multivariate Gaussian process priors with Matérn cross-covariance functions. Each marginal process assumes an exponential kernel with its own SD and spatial correlation matrix. We develop our models for, and apply them to, data from the Advanced Glaucoma Progression Study. We show that including visit effects in the model reduces error in predicting future thickness measurements and greatly improves model fit.
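The exponential kernel named above is the Matérn covariance with smoothness 1/2, C(d) = σ² exp(-d/ρ) for distance d. As a sketch, the covariance matrix of one marginal process over the 6 × 6 grid of retinal locations can be built as follows; the unit grid spacing and the values of σ and ρ are placeholder assumptions, and the paper's multivariate cross-covariance structure is not reproduced.

```python
import math


def grid_locations(n=6, spacing=1.0):
    """(row, column) coordinates of an n x n measurement grid (spacing assumed)."""
    return [(r * spacing, c * spacing) for r in range(n) for c in range(n)]


def exponential_covariance(points, sigma, rho):
    """Covariance matrix with entries sigma^2 * exp(-distance / rho)."""
    cov = []
    for r1, c1 in points:
        row = []
        for r2, c2 in points:
            distance = math.hypot(r1 - r2, c1 - c2)
            row.append(sigma ** 2 * math.exp(-distance / rho))
        cov.append(row)
    return cov


locations = grid_locations()  # 36 retinal locations
cov = exponential_covariance(locations, sigma=2.0, rho=1.5)
```

Neighboring grid cells receive strongly correlated values, which is what allows a spatially varying model to borrow strength across measurement locations instead of fitting each location separately.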

Volume 18, Issue 4, pp. 3444-3466 | PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11864210/pdf/

Citations: 0

EXPOSURE EFFECTS ON COUNT OUTCOMES WITH OBSERVATIONAL DATA, WITH APPLICATION TO INCARCERATED WOMEN.

IF 1.3 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY

Annals of Applied Statistics

Pub Date: 2024-09-01 | Epub Date: 2024-08-05 | DOI: 10.1214/24-aoas1874

Bonnie E Shook-Sa, Michael G Hudgens, Andrea K Knittel, Andrew Edmonds, Catalina Ramirez, Stephen R Cole, Mardge Cohen, Adebola Adedimeji, Tonya Taylor, Katherine G Michel, Andrea Kovacs, Jennifer Cohen, Jessica Donohue, Antonina Foster, Margaret A Fischl, Dustin Long, Adaora A Adimora

Causal inference methods can be applied to estimate the effect of a point exposure or treatment on an outcome of interest using data from observational studies. For example, in the Women's Interagency HIV Study, it is of interest to understand the effects of incarceration on the number of sexual partners and the number of cigarettes smoked after incarceration. In settings like this where the outcome is a count, the estimand is often the causal mean ratio, i.e., the ratio of the counterfactual mean count under exposure to the counterfactual mean count under no exposure. This paper considers estimators of the causal mean ratio based on inverse probability of treatment weights, the parametric g-formula, and doubly robust estimation, each of which can account for overdispersion, zero-inflation, and heaping in the measured outcome. Methods are compared in simulations and are applied to data from the Women's Interagency HIV Study.
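As a concrete instance of the first estimator family, a normalized (Hajek-type) inverse-probability-of-treatment-weighted estimate of the causal mean ratio E[Y(1)] / E[Y(0)] can be sketched as below. It assumes the propensity scores have already been estimated (for example by logistic regression) and lie strictly between 0 and 1; it does not address the overdispersion, zero-inflation, or heaping that the paper's estimators are designed to handle.

```python
def iptw_mean_ratio(exposed, counts, propensity):
    """Normalized IPTW estimate of E[Y(1)] / E[Y(0)] for a count outcome.

    exposed[i] is 1 if unit i was exposed, else 0; counts[i] is the observed count;
    propensity[i] is the pre-estimated P(exposed = 1 | covariates), assumed in (0, 1).
    """
    w1 = [a / p for a, p in zip(exposed, propensity)]
    w0 = [(1 - a) / (1 - p) for a, p in zip(exposed, propensity)]
    mean1 = sum(w * y for w, y in zip(w1, counts)) / sum(w1)
    mean0 = sum(w * y for w, y in zip(w0, counts)) / sum(w0)
    return mean1 / mean0
```

When every propensity is equal, the weights cancel and the estimate reduces to the ratio of observed group means; unequal propensities reweight each group toward the covariate distribution of the full sample.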

Volume 18, Issue 3, pp. 2147-2165 | PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11526847/pdf/

Citations: 0

BIVARIATE FUNCTIONAL PATTERNS OF LIFETIME MEDICARE COSTS AMONG ESRD PATIENTS.

IF 1.3 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY

Annals of Applied Statistics

Pub Date: 2024-09-01 | Epub Date: 2024-08-05 | DOI: 10.1214/24-aoas1897

Yue Wang, Bin Nan, John D Kalbfleisch

In this work we study the lifetime Medicare spending patterns of patients with end-stage renal disease (ESRD). We extract data on patients who started their ESRD services in 2007-2011 from the United States Renal Data System (USRDS). Patients are partitioned into three groups based on their kidney transplant status: (1) unwaitlisted and never transplanted, (2) waitlisted but never transplanted, and (3) waitlisted and then transplanted. To study their Medicare cost trajectories, we use a semiparametric regression model with both fixed and bivariate time-varying coefficients to compare groups 1 and 2, and a bivariate time-varying coefficient model with different starting times (time since the first ESRD service and time since the kidney transplant) to compare groups 2 and 3. In addition to demographics and other medical conditions, these regression models condition on survival time, so that they ideally depict lifetime Medicare spending patterns. For estimation, we extend the profile weighted least squares (PWLS) estimator to longitudinal data for the first comparison and propose a two-stage estimating method for the second comparison. We use sandwich variance estimators to construct confidence intervals and validate the inference procedures through simulations. Our analysis of the Medicare claims data reveals that, among waitlisted patients, waitlisting is associated with a lower daily medical cost at the beginning of ESRD service, and this cost gradually increases over time. Averaged over the lifespan, however, there is no difference between the waitlisted and unwaitlisted groups. A kidney transplant, on the other hand, significantly reduces medical cost after an initial spike.
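The sandwich variance estimators mentioned above can be illustrated in the simplest setting: an ordinary least-squares slope with repeated measurements grouped by patient. The sketch computes the cluster-robust (CR0) sandwich standard error for that slope. It is a deliberately simple stand-in and does not implement the paper's profile weighted least squares or two-stage estimators.

```python
import math
from collections import defaultdict


def slope_with_cluster_robust_se(x, y, cluster):
    """OLS slope of y on x with a cluster-robust (CR0) sandwich standard error.

    cluster[i] identifies the patient that observation i belongs to; observations
    from the same patient may be correlated, observations across patients are not.
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    # "Meat" of the sandwich: per-cluster sums of score contributions for the slope.
    cluster_scores = defaultdict(float)
    for xi, yi, g in zip(x, y, cluster):
        residual = yi - intercept - slope * xi
        cluster_scores[g] += (xi - xbar) * residual
    # For the slope in a model with an intercept, the "bread" reduces to 1 / Sxx.
    variance = sum(s ** 2 for s in cluster_scores.values()) / sxx ** 2
    return slope, math.sqrt(variance)
```

With every observation in its own cluster this reduces to the heteroskedasticity-robust (HC0) estimator; grouping by patient lets within-patient residuals offset or reinforce one another in the variance.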

Volume 18, Issue 3, pp. 2596-2614 | PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11488692/pdf/

Citations: 0

A bootstrap model comparison test for identifying genes with context-specific patterns of genetic regulation.

IF 1.3 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY

Annals of Applied Statistics

Pub Date: 2024-09-01 | Epub Date: 2024-08-05 | DOI: 10.1214/23-aoas1859

Mykhaylo M Malakhov, Ben Dai, Xiaotong T Shen, Wei Pan

Understanding how genetic variation affects gene expression is essential for a complete picture of the functional pathways that give rise to complex traits. Although numerous studies have established that many genes are differentially expressed in distinct human tissues and cell types, no tools exist for identifying the genes whose expression is differentially regulated. Here we introduce DRAB (differential regulation analysis by bootstrapping), a gene-based method for testing whether patterns of genetic regulation are significantly different between tissues or other biological contexts. DRAB first leverages the elastic net to learn context-specific models of local genetic regulation and then applies a novel bootstrap-based model comparison test to check their equivalency. Unlike previous model comparison tests, our proposed approach can determine whether population-level models have equal predictive performance by accounting for the variability of feature selection and model training. We validated DRAB on mRNA expression data from a variety of human tissues in the Genotype-Tissue Expression (GTEx) Project. DRAB yielded biologically reasonable results and had sufficient power to detect genes with tissue-specific regulatory profiles while effectively controlling false positives. By providing a framework that facilitates the prioritization of differentially regulated genes, our study enables future discoveries on the genetic architecture of molecular phenotypes.
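To make "accounting for the variability of feature selection and model training" concrete, here is a generic sketch of a bootstrap model comparison in which both candidate models are refit on every bootstrap resample and scored on the out-of-bag observations. The two toy model families (`fit_mean`, `fit_linear`), the mean-squared-error criterion, and the percentile interval are illustrative assumptions; DRAB itself compares elastic-net models learned in different contexts, and its exact test procedure is described in the paper.

```python
import random
import statistics


def fit_mean(xs, ys):
    """Toy model family 1: always predict the training mean."""
    m = statistics.fmean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Toy model family 2: simple least-squares line."""
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0
    a = ybar - b * xbar
    return lambda x: a + b * x


def bootstrap_mse_difference(xs, ys, fit_a, fit_b, n_boot=500, alpha=0.05, seed=0):
    """Percentile interval for MSE(model a) - MSE(model b), refitting both per resample."""
    rng = random.Random(seed)
    n = len(xs)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]
        if not oob:  # rare for moderate n; nothing left to evaluate on
            continue
        boot_x, boot_y = [xs[i] for i in idx], [ys[i] for i in idx]
        model_a, model_b = fit_a(boot_x, boot_y), fit_b(boot_x, boot_y)
        mse_a = statistics.fmean((ys[i] - model_a(xs[i])) ** 2 for i in oob)
        mse_b = statistics.fmean((ys[i] - model_b(xs[i])) ** 2 for i in oob)
        diffs.append(mse_a - mse_b)
    diffs.sort()
    lower = diffs[int(alpha / 2 * len(diffs))]
    upper = diffs[int((1 - alpha / 2) * len(diffs)) - 1]
    return lower, upper
```

Because the models are refit inside the loop, the interval reflects training variability rather than only test-set noise; an interval that excludes zero suggests the two fitted model families differ in predictive performance.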

Volume 18, Issue 3, pp. 1840-1857 | PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11484521/pdf/

Citations: 0

INTEGRATING MENDELIAN RANDOMIZATION WITH CAUSAL MEDIATION ANALYSES FOR CHARACTERIZING DIRECT AND INDIRECT EXPOSURE-TO-OUTCOME EFFECTS.

IF 1.3 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY

Annals of Applied Statistics

Pub Date: 2024-09-01 | Epub Date: 2024-08-05 | DOI: 10.1214/24-aoas1901

Fan Yang, Lin S Chen, Shahram Oveisgharan, Dawood Darbar, David A Bennett

Mendelian randomization (MR) assesses the total effect of exposure on outcome. With the rapidly increasing availability of summary statistics from genome-wide association studies (GWASs), MR leverages existing summary statistics and is widely used to study the causal effects among complex traits and diseases. The total effect in the population is a sum of indirect and direct effects. For complex disease outcomes with complicated etiologies, and/or for modifiable exposure traits, there may exist more than one pathway between exposure and outcome. The direct effect and the indirect effect via a mediator of interest could have opposite directions, and the total effect estimates may not be informative for treatment and prevention decision-making, or may even be misleading, for different subgroups of patients. Causal mediation analysis delineates the indirect effect of exposure on outcome operating through the mediator and the direct effect transmitted through other mechanisms. However, causal mediation analysis often requires individual-level data measured on exposure, outcome, mediator and confounding variables, and the power of the mediation analysis is restricted by sample size. In this work, motivated by a study of the effects of atrial fibrillation (AF) on Alzheimer's dementia, we propose a framework for Integrative Mendelian randomization and Mediation Analysis (IMMA). The proposed method integrates the total effect estimates from MR analyses based on large-scale GWASs with the direct and indirect effect estimates from mediation analysis based on individual-level data of a limited sample size. We introduce a series of IMMA models for scenarios with or without exposure-mediator interaction and/or study heterogeneity. The proposed IMMA models improve the estimation and the power of inference on the direct and indirect effects in the population, as well as the characterization of the variation of effects. Our analyses showed a significant positive direct effect of AF on Alzheimer's dementia risk that does not operate through oral anticoagulant treatment, and a significant indirect effect, via AF-induced anticoagulant treatment, that reduces Alzheimer's dementia risk. The results suggested potential Alzheimer's dementia risk prediction and prevention strategies for AF patients and paved the way for future re-evaluation of anticoagulant treatment guidelines for AF patients. A sensitivity analysis was conducted to assess how sensitive the conclusions are to a key assumption of the IMMA approach.
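Two textbook ingredients that this abstract combines can be written in a few lines: the fixed-effect inverse-variance-weighted (IVW) MR estimate of the total effect from per-SNP summary statistics, and the linear, no-interaction decomposition of a total effect into a direct effect plus an indirect effect α·β (exposure-to-mediator coefficient times mediator-to-outcome coefficient). The summary statistics below are invented, the IVW formula assumes independent valid instruments, and this is not the IMMA estimator, which jointly models both data sources and allows exposure-mediator interaction and study heterogeneity.

```python
def ivw_total_effect(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW MR estimate: sum(bx * by / se^2) / sum(bx^2 / se^2) over SNPs."""
    num = sum(bx * by / se ** 2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den


def decompose_total_effect(total, alpha, beta):
    """Linear, no-interaction mediation: total = direct + alpha * beta."""
    indirect = alpha * beta
    direct = total - indirect
    return direct, indirect


# Invented summary statistics for three SNPs (hypothetical values).
total = ivw_total_effect([0.10, 0.20, 0.30], [0.20, 0.40, 0.60], [0.01, 0.02, 0.03])
direct, indirect = decompose_total_effect(total, alpha=0.5, beta=-1.0)
```

With a positive total effect and a negative indirect effect, the implied direct effect is larger than the total effect, which is the opposite-sign situation in which a total-effect estimate alone can obscure what is happening along each pathway.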

Volume 18, Issue 3, pp. 2656-2677 | PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11845245/pdf/

Citations: 0
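The IMMA abstract rests on the fact that a total effect is the sum of a direct and an indirect effect, and that the two can point in opposite directions. The sketch below is not IMMA: it uses no Mendelian randomization or GWAS summary statistics. It is a minimal single-sample linear mediation example that assumes a continuous exposure `x`, mediator `m`, and outcome `y`, no confounding, and no exposure-mediator interaction, and it uses the product-of-coefficients estimator. The helper name `mediation_decomposition` and the simulated effect sizes are hypothetical.

```python
import random


def _mean(v):
    return sum(v) / len(v)


def _cov(u, v):
    # Population covariance of two equal-length sequences.
    mu, mv = _mean(u), _mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def mediation_decomposition(x, m, y):
    """Return (total, direct, indirect) effects of x on y via m from OLS fits.

    Hypothetical helper: linear models with intercepts, a single mediator,
    no exposure-mediator interaction and no confounders.
    """
    sxx, smm, sxm = _cov(x, x), _cov(m, m), _cov(x, m)
    sxy, smy = _cov(x, y), _cov(m, y)
    total = sxy / sxx                        # slope of y ~ x
    a = sxm / sxx                            # slope of m ~ x
    det = sxx * smm - sxm ** 2               # y ~ x + m via 2x2 normal equations
    direct = (smm * sxy - sxm * smy) / det   # coefficient on x
    b = (sxx * smy - sxm * sxy) / det        # coefficient on m
    indirect = a * b                         # product-of-coefficients indirect effect
    return total, direct, indirect


# Simulated data (hypothetical effect sizes): a positive direct effect and a
# negative indirect effect, so the total effect is close to zero.
rng = random.Random(0)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
m = [1.0 * xi + rng.gauss(0, 1) for xi in x]
y = [0.6 * xi - 0.5 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

total, direct, indirect = mediation_decomposition(x, m, y)
print(f"total={total:.3f} direct={direct:.3f} indirect={indirect:.3f}")
```

For OLS with intercepts fitted on the same sample, total = direct + indirect holds exactly, which is why the decomposition is additive here; with nonlinear models or interactions it generally is not.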

AN INTEGRATIVE NETWORK-BASED MEDIATION MODEL (NMM) TO ESTIMATE MULTIPLE GENETIC EFFECTS ON OUTCOMES MEDIATED BY FUNCTIONAL CONNECTIVITY.

IF 1.3 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY

Annals of Applied Statistics

Pub Date : 2024-09-01 Epub Date: 2024-08-05 DOI: 10.1214/24-aoas1880

Wei Dai, Heping Zhang

Functional connectivity of the brain, characterized by interconnected neural circuits across functional networks, is a cutting-edge feature in neuroimaging. It has the potential to mediate the effect of genetic variants on behavioral outcomes or diseases. Existing mediation analysis methods can evaluate the impact of genetics and brain structure and function on cognitive behavior or disorders, but they tend to be limited to single genetic variants or univariate mediators, ignoring cumulative genetic effects and the complex matrix, group, and network structures of functional connectivity. To address this gap, the paper presents an integrative network-based mediation model (NMM) that estimates the effects of multiple genetic variants on behavioral outcomes or diseases mediated by functional connectivity. The model incorporates group information on inter-regional connectivity at the broad network level and imposes low-rank and sparsity assumptions, both to reflect the complex structure of functional connectivity and to select network mediators simultaneously. We adopt a block coordinate descent algorithm to obtain a fast and efficient solution. Simulation results indicate that the model is effective at selecting active mediators and reducing bias in effect estimation. Applied to 493 young adults in the Human Connectome Project Young Adult (HCP-YA) study, the model identifies two genetic variants (rs769448 and rs769449) on the APOE4 gene that lead to deficits in functional connectivity within visual networks and in fluid intelligence.
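The NMM abstract pairs sparsity assumptions with a block coordinate descent solver. As a much simpler illustration of that idea, and not the NMM algorithm (which also imposes low-rank and group structure on connectivity matrices), the sketch below runs cyclic coordinate descent for an ordinary lasso regression, treating each coefficient as its own block and updating it by soft-thresholding. The function names, the toy data, and the penalty value are hypothetical.

```python
def soft_threshold(z, lam):
    # Proximal operator of lam * |b|: shrink z toward zero by lam.
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100, tol=1e-10):
    """Minimize (1/(2n))||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of rows; every column is assumed to be nonzero.
    Each coefficient is treated as a single-element block.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            # Correlation of column j with the partial residual that excludes b[j].
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            max_change = max(max_change, abs(new_bj - b[j]))
            b[j] = new_bj
        if max_change < tol:
            break
    return b


# Orthogonal toy design: the lasso solution is the soft-thresholded
# least-squares fit, so the weak second predictor is set exactly to zero.
X = [[1, 0], [0, 1], [1, 0], [0, 1]]
y = [3, 1, 3, 1]
b = lasso_coordinate_descent(X, y, lam=0.6)
print(b)
```

Setting small coefficients exactly to zero is what makes this kind of penalty usable for selection; the NMM applies the same principle to network mediators rather than to individual regression coefficients.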

Volume 18, Issue 3, pages 2277-2294 (2024-09-01). Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11616023/pdf/

Citations: 0

PATIENT RECRUITMENT USING ELECTRONIC HEALTH RECORDS UNDER SELECTION BIAS: A TWO-PHASE SAMPLING FRAMEWORK.

IF 1.3 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY

Annals of Applied Statistics

Pub Date : 2024-09-01 Epub Date: 2024-08-05 DOI: 10.1214/23-aoas1860

Guanghao Zhang, Lauren J Beesley, Bhramar Mukherjee, Xu Shi

Electronic health records (EHRs) are increasingly recognized as a cost-effective resource for patient recruitment in clinical research. However, how to optimally select a cohort from millions of individuals to answer a scientific question of interest remains unclear. Consider a study that aims to estimate the mean, or a mean difference, of an outcome that is expensive to measure. Inexpensive auxiliary covariates that predict the outcome are often available in patients' health records, presenting an opportunity to recruit patients selectively, which may improve efficiency in downstream analyses. In this paper we propose a two-phase sampling design that leverages the auxiliary covariate information available in EHR data. A key challenge in using EHR data for multiphase sampling is potential selection bias, because EHR data are not necessarily representative of the target population. Extending the existing literature on two-phase sampling design, we derive an optimal two-phase sampling method that improves efficiency over random sampling while accounting for the potential selection bias in EHR data. We demonstrate the efficiency gain from our sampling design via simulation studies and an application estimating the prevalence of hypertension among U.S. adults, using data from the Michigan Genomics Initiative, a longitudinal biorepository at Michigan Medicine.
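A common building block of two-phase designs is to let inexpensive phase-one covariates define strata and then spend the expensive phase-two measurements where the outcome is most variable. The sketch below shows classical Neyman allocation, with the phase-two size n_h proportional to N_h * S_h, assuming the stratum sizes `sizes` and outcome standard deviations `sds` are known or predicted from auxiliary covariates. It is not the authors' optimal design and makes no correction for EHR selection bias; the function name and the example strata are hypothetical.

```python
import math


def neyman_allocation(sizes, sds, n):
    """Split a phase-two sample of size n so that n_h is proportional to N_h * S_h.

    sizes: phase-one stratum counts N_h; sds: (predicted) outcome SDs S_h.
    Fractional allocations are rounded with the largest-remainder rule so the
    result sums to n. The cap n_h <= N_h is not enforced in this sketch.
    """
    weights = [N * S for N, S in zip(sizes, sds)]
    total = sum(weights)
    raw = [n * w / total for w in weights]
    alloc = [math.floor(r) for r in raw]
    leftover = n - sum(alloc)
    # Give the remaining units to the strata with the largest fractional parts.
    order = sorted(range(len(raw)), key=lambda h: raw[h] - alloc[h], reverse=True)
    for h in order[:leftover]:
        alloc[h] += 1
    return alloc


# Hypothetical strata of equal size, where the second stratum's outcome is
# three times as variable, so it receives three times as many measurements.
print(neyman_allocation([100, 100], [1.0, 3.0], 40))  # -> [10, 30]
```

Allocating by N_h * S_h rather than by N_h alone is what yields the efficiency gain over proportional or simple random sampling when strata differ in outcome variability.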

Volume 18, Issue 3, pages 1858-1878 (2024-09-01). Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11323140/pdf/

Citations: 0

A NONPARAMETRIC MIXED-EFFECTS MIXTURE MODEL FOR PATTERNS OF CLINICAL MEASUREMENTS ASSOCIATED WITH COVID-19.

IF 1.3 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY

Annals of Applied Statistics

Pub Date : 2024-09-01 Epub Date: 2024-08-05 DOI: 10.1214/23-aoas1871

Xiaoran Ma, Wensheng Guo, Mengyang Gu, Len Usvyat, Peter Kotanko, Yuedong Wang

Some patients with COVID-19 show changes in signs and symptoms, such as temperature and oxygen saturation, days before testing positive for SARS-CoV-2, while others remain asymptomatic. It is important to identify these subgroups and to understand which biological and clinical predictors are related to them. This information will provide insights into how the immune system may respond differently to infection and can further be used to identify infected individuals. We propose a flexible nonparametric mixed-effects mixture model that identifies risk factors and classifies patients according to whether they show biological changes. We model the latent probability of biological changes using a logistic regression model, and the trajectories within the latent groups using smoothing splines. We develop an EM algorithm that maximizes the penalized likelihood to estimate all parameters and mean functions. We evaluate our methods by simulation and apply the proposed model to investigate changes in temperature in a cohort of COVID-19-infected hemodialysis patients.
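The abstract fits its mixture model with an EM algorithm. The heavily simplified sketch below keeps only the EM mechanics: a two-component one-dimensional Gaussian mixture in which a constant mixing proportion stands in for the logistic regression and constant component means stand in for the smoothing-spline trajectories. It is a hypothetical illustration, not the authors' penalized-likelihood algorithm; the function name and the example data are assumptions.

```python
import math


def _log_normal_pdf(x, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def em_two_component(x, n_iter=200, var_floor=1e-6):
    """EM for a two-component 1-D Gaussian mixture.

    Returns (pi, means, variances), where pi is the weight of component 1.
    Assumes both components keep nonzero total responsibility.
    """
    grand_mean = sum(x) / len(x)
    overall_var = sum((xi - grand_mean) ** 2 for xi in x) / len(x)
    means = [min(x), max(x)]  # crude initialization at the extremes
    variances = [overall_var, overall_var]
    pi = 0.5
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component 1,
        # computed in log space to avoid underflow.
        resp = []
        for xi in x:
            l0 = math.log(1 - pi) + _log_normal_pdf(xi, means[0], variances[0])
            l1 = math.log(pi) + _log_normal_pdf(xi, means[1], variances[1])
            top = max(l0, l1)
            w0, w1 = math.exp(l0 - top), math.exp(l1 - top)
            resp.append(w1 / (w0 + w1))
        # M-step: responsibility-weighted updates of weight, means, variances.
        n1 = sum(resp)
        n0 = len(x) - n1
        pi = min(max(n1 / len(x), 1e-9), 1 - 1e-9)
        means[1] = sum(r * xi for r, xi in zip(resp, x)) / n1
        means[0] = sum((1 - r) * xi for r, xi in zip(resp, x)) / n0
        variances[1] = max(
            sum(r * (xi - means[1]) ** 2 for r, xi in zip(resp, x)) / n1, var_floor)
        variances[0] = max(
            sum((1 - r) * (xi - means[0]) ** 2 for r, xi in zip(resp, x)) / n0, var_floor)
    return pi, means, variances


# Hypothetical temperature deviations: one stable group and one shifted group.
data = [0.0, 0.1, -0.1, 0.05, 5.0, 5.1, 4.9, 5.05]
pi, means, variances = em_two_component(data)
print(pi, means)
```

In the full model, the M-step would instead refit a logistic regression for the mixing probabilities and penalized splines for each group's trajectory, but the alternation between soft group assignments and weighted refits is the same.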

Volume 18, Issue 3, pages 2080-2095 (2024-09-01). Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11460989/pdf/

Citations: 0