Annals of Applied Statistics: Latest Articles

BAYESIAN NESTED LATENT CLASS MODELS FOR CAUSE-OF-DEATH ASSIGNMENT USING VERBAL AUTOPSIES ACROSS MULTIPLE DOMAINS.
IF 1.3; Zone 4 (Mathematics); Q2 Statistics & Probability; Pub Date: 2024-06-01; Epub Date: 2024-04-05; DOI: 10.1214/23-aoas1826
Zehang Richard Li, Zhenke Wu, Irena Chen, Samuel J Clark

Understanding cause-specific mortality rates is crucial for monitoring population health and designing public health interventions. Worldwide, two-thirds of deaths do not have a cause assigned. Verbal autopsy (VA) is a well-established tool for collecting information about deaths outside of hospitals by surveying caregivers of the deceased person. It is routinely implemented in many low- and middle-income countries. Statistical algorithms that assign cause of death using VAs are typically vulnerable to distribution shift between the data used to train the model and the target population. This presents a major challenge for analyzing VAs, as labeled data are usually unavailable in the target population. This article proposes a latent class model framework for VA data (LCVA) that jointly models VAs collected over multiple heterogeneous domains, assigns causes of death for out-of-domain observations, and estimates cause-specific mortality fractions for a new domain. We introduce a parsimonious representation of the joint distribution of the collected symptoms using nested latent class models and develop a computationally efficient algorithm for posterior inference. We demonstrate that LCVA outperforms existing methods in predictive performance and scalability. Supplementary Material and reproducible analysis code are available online. The R package LCVA implementing the method is available on GitHub (https://github.com/richardli/LCVA).
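
To make the idea of a latent class model for symptoms concrete, the sketch below shows a plain (non-nested, single-domain) latent class mixture with conditionally independent binary symptoms, and cause assignment by Bayes' rule with the cause-specific mortality fractions as the prior. It is a simplification for illustration only: the function names, the toy numbers, and the conditional-independence layout are assumptions, and the paper's nested multi-domain structure, priors, and inference algorithm are not reproduced here.

```python
import numpy as np

def symptom_likelihood(x, class_weights, response_probs):
    """P(x | cause) under a latent class mixture of independent binary symptoms.

    x: (J,) binary symptom vector
    class_weights: (K,) mixing weights for the cause, summing to 1
    response_probs: (K, J) probability that each symptom is present in each class
    """
    per_class = np.prod(response_probs ** x * (1.0 - response_probs) ** (1 - x), axis=1)
    return float(class_weights @ per_class)

def cause_posterior(x, csmf, class_weights_by_cause, response_probs_by_cause):
    """P(cause | x) by Bayes' rule, using the CSMF vector as the prior over causes."""
    lik = np.array([
        symptom_likelihood(x, w, p)
        for w, p in zip(class_weights_by_cause, response_probs_by_cause)
    ])
    unnorm = csmf * lik
    return unnorm / unnorm.sum()

# Hypothetical toy setup: 2 causes, 2 latent classes per cause, 3 symptoms.
csmf = np.array([0.6, 0.4])
weights = [np.array([0.5, 0.5]), np.array([0.7, 0.3])]
probs = [
    np.array([[0.9, 0.8, 0.1], [0.7, 0.6, 0.2]]),
    np.array([[0.1, 0.2, 0.9], [0.3, 0.2, 0.7]]),
]
posterior = cause_posterior(np.array([1, 1, 0]), csmf, weights, probs)
```

In this toy case the observed symptom pattern matches the first cause's response probabilities much better, so nearly all posterior mass goes to that cause.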

Annals of Applied Statistics, vol. 18, no. 2, pp. 1137-1159. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11484295/pdf/
Citations: 0
TENSOR QUANTILE REGRESSION WITH LOW-RANK TENSOR TRAIN ESTIMATION.
IF 1.3; Zone 4 (Mathematics); Q2 Statistics & Probability; Pub Date: 2024-06-01; Epub Date: 2024-04-05; DOI: 10.1214/23-aoas1835
Zihuan Liu, Cheuk Yin Lee, Heping Zhang

Neuroimaging studies often involve predicting a scalar outcome from an array of images collectively called a tensor. The use of magnetic resonance imaging (MRI) provides a unique opportunity to investigate the structures of the brain. To learn the association between MRI images and human intelligence, we formulate a scalar-on-image quantile regression framework. However, the high dimensionality of the tensor makes estimating the coefficients for all elements computationally challenging. To address this, we propose a low-rank coefficient array estimation algorithm based on tensor train (TT) decomposition, which we demonstrate can effectively reduce the dimensionality of the coefficient tensor to a feasible level while ensuring adequacy to the data. Our method is more stable and efficient than the commonly used canonical polyadic (CP) rank approximation-based method. We also propose a generalized Lasso penalty on the coefficient tensor to take advantage of the spatial structure of the tensor, further reduce the dimensionality of the coefficient tensor, and improve the interpretability of the model. The consistency and asymptotic normality of the TT estimator are established under some mild conditions on the covariates and random errors in quantile regression models. The rate of convergence is obtained with regularization under the total variation penalty. Extensive numerical studies, including both synthetic and real MRI data, are conducted to examine the empirical performance of the proposed method and its competitors.
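
As a rough illustration of why a tensor train representation reduces dimensionality, the sketch below builds a coefficient tensor from three TT cores, compares parameter counts, and evaluates the quantile check loss. The dimensions, rank, and function names are hypothetical choices for this example; the paper's estimation algorithm and penalties are not shown.

```python
import numpy as np

def tt_to_full(cores):
    """Contract TT cores G_k of shape (r_{k-1}, n_k, r_k) into the full tensor."""
    full = cores[0]  # shape (1, n_1, r_1)
    for core in cores[1:]:
        # Contract the last (rank) axis of `full` with the first (rank) axis of `core`.
        full = np.tensordot(full, core, axes=1)
    # Drop the boundary ranks r_0 = r_d = 1.
    return full.reshape(full.shape[1:-1])

def check_loss(residual, tau):
    """Mean quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    residual = np.asarray(residual, dtype=float)
    return float(np.mean(residual * (tau - (residual < 0))))

rng = np.random.default_rng(0)
dims, rank = (8, 9, 10), 2          # hypothetical image dimensions and TT rank
ranks = (1, rank, rank, 1)
cores = [rng.normal(size=(ranks[k], dims[k], ranks[k + 1])) for k in range(3)]

coef = tt_to_full(cores)                   # full coefficient tensor, shape (8, 9, 10)
n_full = coef.size                         # 720 entries without low-rank structure
n_tt = sum(core.size for core in cores)    # 16 + 36 + 20 = 72 entries in TT form

# Scalar-on-tensor linear predictor <X_i, B> for a batch of hypothetical images.
images = rng.normal(size=(5, *dims))
pred = np.tensordot(images, coef, axes=3)  # shape (5,)
```

Here the TT form stores 72 numbers instead of 720, and the gap widens quickly as the image dimensions grow while the rank stays small.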

Annals of Applied Statistics, vol. 18, no. 2, pp. 1294-1318. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11046526/pdf/
Citations: 0
SPATIAL PREDICTIONS ON PHYSICALLY CONSTRAINED DOMAINS: APPLICATIONS TO ARCTIC SEA SALINITY DATA.
IF 1.4; Zone 4 (Mathematics); Q2 Statistics & Probability; Pub Date: 2024-06-01; Epub Date: 2024-04-05; DOI: 10.1214/23-aoas1850
Bora Jin, Amy H Herring, David Dunson

In this paper we predict sea surface salinity (SSS) in the Arctic Ocean based on satellite measurements. SSS is a crucial indicator of ongoing changes in the Arctic Ocean and can offer important insights into climate change. We particularly focus on areas of water mistakenly flagged as ice by satellite algorithms. To remove bias in the retrieval of salinity near sea ice, the algorithms use conservative ice masks, which result in considerable loss of data. We aim to produce realistic SSS values for such regions to obtain a more complete understanding of the SSS surface over the Arctic Ocean and to benefit future applications that may require SSS measurements near edges of sea ice or coasts. We propose a class of scalable nonstationary processes that can handle large data from satellite products and complex geometries of the Arctic Ocean. Barrier overlap-removal acyclic directed graph GP (BORA-GP) constructs sparse directed acyclic graphs (DAGs) with neighbors conforming to barriers and boundaries, enabling characterization of dependence in constrained domains. The BORA-GP models produce more sensible SSS values in regions without satellite measurements and show improved performance in various constrained domains in simulation studies compared to state-of-the-art alternatives. An R package is available at https://github.com/jinbora0720/boraGP.

Annals of Applied Statistics, vol. 18, no. 2, pp. 1596-1617. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12391905/pdf/
Citations: 0
MASH: MEDIATION ANALYSIS OF SURVIVAL OUTCOME AND HIGH-DIMENSIONAL OMICS MEDIATORS WITH APPLICATION TO COMPLEX DISEASES.
IF 1.3; Zone 4 (Mathematics); Q2 Statistics & Probability; Pub Date: 2024-06-01; Epub Date: 2024-04-05; DOI: 10.1214/23-aoas1838
Sunyi Chi, Christopher R Flowers, Ziyi Li, Xuelin Huang, Peng Wei

Environmental exposures such as cigarette smoking influence health outcomes through intermediate molecular phenotypes, such as the methylome, transcriptome, and metabolome. Mediation analysis is a useful tool for investigating the role of potentially high-dimensional intermediate phenotypes in the relationship between environmental exposures and health outcomes. However, little work has been done on mediation analysis when the mediators are high-dimensional and the outcome is a survival endpoint, and none of it has provided a robust measure of total mediation effect. To this end, we propose an estimation procedure for Mediation Analysis of Survival outcome and High-dimensional omics mediators (MASH) based on sure independence screening for putative mediator variable selection and a second-moment-based measure of total mediation effect for survival data analogous to the $R^2$ measure in a linear model. Extensive simulations showed good performance of MASH in estimating the total mediation effect and identifying true mediators. By applying MASH to the metabolomics data of 1919 subjects in the Framingham Heart Study, we identified five metabolites as mediators of the effect of cigarette smoking on coronary heart disease risk (total mediation effect, 51.1%) and two metabolites as mediators between smoking and risk of cancer (total mediation effect, 50.7%). Application of MASH to a diffuse large B-cell lymphoma genomics data set identified copy-number variations for eight genes as mediators between the baseline International Prognostic Index score and overall survival.
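
The screening step mentioned above can be illustrated with the generic form of sure independence screening: rank candidate variables by the strength of their marginal association with the response and keep the top few. The sketch below uses absolute Pearson correlation with a continuous response, which is an assumption made for simplicity; the survival-specific screening statistic used by MASH is not reproduced, and the function name, the `keep` cutoff, and the simulated data are hypothetical.

```python
import numpy as np

def sure_independence_screening(mediators, response, keep):
    """Indices of the `keep` columns with the largest absolute marginal correlation."""
    m = mediators - mediators.mean(axis=0)
    r = response - response.mean()
    corr = (m.T @ r) / (np.linalg.norm(m, axis=0) * np.linalg.norm(r))
    return np.argsort(-np.abs(corr))[:keep]

# Hypothetical data: 200 subjects, 1000 candidate mediators, 2 of which matter.
rng = np.random.default_rng(1)
n, p = 200, 1000
mediators = rng.normal(size=(n, p))
response = 2.0 * mediators[:, 3] - 1.5 * mediators[:, 42] + rng.normal(size=n)

selected = sure_independence_screening(mediators, response, keep=20)
```

Screening shrinks the 1000 candidates to 20 while retaining the two truly associated columns in this simulation, so a more expensive model can then be fitted on the reduced set.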

Annals of Applied Statistics, vol. 18, no. 2, pp. 1360-1377. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11426188/pdf/
Citations: 0
VARIANCE AS A PREDICTOR OF HEALTH OUTCOMES: SUBJECT-LEVEL TRAJECTORIES AND VARIABILITY OF SEX HORMONES TO PREDICT BODY FAT CHANGES IN PERI- AND POSTMENOPAUSAL WOMEN.
IF 1.4; Zone 4 (Mathematics); Q2 Statistics & Probability; Pub Date: 2024-06-01; Epub Date: 2024-04-05; DOI: 10.1214/23-aoas1852
Irena Chen, Zhenke Wu, Siobán D Harlow, Carrie A Karvonen-Gutierrez, Michelle M Hood, Michael R Elliott

Longitudinal biomarker data and cross-sectional outcomes are routinely collected in modern epidemiology studies, often with the goal of informing tailored early intervention decisions. For example, hormones such as estradiol (E2) and follicle-stimulating hormone (FSH) may predict changes in women's health during midlife. Most existing methods focus on constructing predictors from mean marker trajectories. However, subject-level biomarker variability may also provide critical information about disease risks and health outcomes. Current literature does not provide statistical models to investigate such relationships with valid uncertainty quantification. In this paper we develop a fully Bayesian joint model that estimates subject-level means, variances, and covariances of multiple longitudinal biomarkers and uses these as predictors to evaluate their respective associations with a cross-sectional health outcome. Simulations demonstrate excellent recovery of true model parameters. The proposed method provides less biased and more efficient estimates, relative to alternative approaches that either ignore subject-level differences in variances or perform two-stage estimation where estimated marker variances are treated as observed. Empowered by the model, analyses of women's health data reveal, for the first time, that larger variability of E2 was associated with slower increases in waist circumference across the menopausal transition.
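
For context, the two-stage alternative that the abstract contrasts with the joint model can be sketched as follows: first summarize each subject's marker by its sample mean and sample variance, then regress the outcome on those summaries as if they were observed without error. This is the naive baseline, not the proposed Bayesian joint model; the function names, the log-variance predictor, and the simulation settings are assumptions for illustration.

```python
import numpy as np

def subject_summaries(values_by_subject):
    """Stage 1: per-subject sample mean and sample variance of one marker."""
    means = np.array([np.mean(v) for v in values_by_subject])
    variances = np.array([np.var(v, ddof=1) for v in values_by_subject])
    return means, variances

def two_stage_fit(values_by_subject, outcome):
    """Stage 2: least squares of the outcome on [1, mean, log variance]."""
    means, variances = subject_summaries(values_by_subject)
    design = np.column_stack([np.ones_like(means), means, np.log(variances)])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef

# Hypothetical simulation: 300 subjects with 8 visits each.
rng = np.random.default_rng(2)
n_subjects, n_visits = 300, 8
true_mean = rng.normal(size=n_subjects)
true_log_var = rng.normal(size=n_subjects)
markers = [
    rng.normal(loc=mu, scale=np.exp(0.5 * lv), size=n_visits)
    for mu, lv in zip(true_mean, true_log_var)
]
outcome = 1.0 + 0.5 * true_mean - 0.8 * true_log_var + rng.normal(scale=0.5, size=n_subjects)

coef = two_stage_fit(markers, outcome)  # [intercept, mean effect, log-variance effect]
```

Because the stage-one summaries are noisy estimates of the true subject-level quantities, the stage-two coefficients tend to be pulled toward zero, which is the kind of bias a joint model is meant to avoid.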

Annals of Applied Statistics, vol. 18, no. 2, pp. 1642-1667. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12309625/pdf/
Citations: 0
A BAYESIAN HIERARCHICAL SMALL AREA POPULATION MODEL ACCOUNTING FOR DATA SOURCE SPECIFIC METHODOLOGIES FROM AMERICAN COMMUNITY SURVEY, POPULATION ESTIMATES PROGRAM, AND DECENNIAL CENSUS DATA.
IF 1.3; Zone 4 (Mathematics); Q2 Statistics & Probability; Pub Date: 2024-06-01; Epub Date: 2024-04-05; DOI: 10.1214/23-aoas1849
Emily N Peterson, Rachel C Nethery, Tullia Padellini, Jarvis T Chen, Brent A Coull, Frédéric B Piel, Jon Wakefield, Marta Blangiardo, Lance A Waller

Small area population counts are necessary for many epidemiological studies, yet their quality and accuracy are often not assessed. In the United States, small area population counts are published by the United States Census Bureau (USCB) in the form of the decennial census counts, intercensal population projections (PEP), and American Community Survey (ACS) estimates. Although there are significant relationships between these three data sources, there are important contrasts in data collection, data availability, and processing methodologies such that each set of reported population counts may be subject to different sources and magnitudes of error. Additionally, these data sources do not report identical small area population counts due to post-survey adjustments specific to each data source. Consequently, in public health studies, small area disease/mortality rates may differ depending on which data source is used for denominator data. To accurately estimate annual small area population counts and their associated uncertainties, we present a Bayesian population (BPop) model, which fuses information from all three USCB sources, accounting for data source specific methodologies and associated errors. We produce comprehensive small area race-stratified estimates of the true population, and associated uncertainties, given the observed trends in all three USCB population estimates. The main features of our framework are: (1) a single model integrating multiple data sources, (2) accounting for data source specific data generating mechanisms and specifically accounting for data source specific errors, and (3) prediction of population counts for years without USCB reported data. We focus our study on the Black and White only populations for 159 counties of Georgia and produce estimates for years 2006-2023. We compare BPop population estimates to decennial census counts, PEP annual counts, and ACS multi-year estimates. Additionally, we illustrate and explain the different types of data source specific errors. Lastly, we compare model performance using simulations and validation exercises. Our Bayesian population model can be extended to other applications at smaller spatial granularity and for demographic subpopulations defined further by race, age, and sex, and/or for other geographical regions.

Annals of Applied Statistics, vol. 18, no. 2, pp. 1565-1595. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11423836/pdf/
Citations: 0
Selecting Invalid Instruments to Improve Mendelian Randomization with Two-Sample Summary Data.
IF 1.3; Zone 4 (Mathematics); Q2 Statistics & Probability; Pub Date: 2024-04-05; eCollection Date: 2024-06-01; DOI: 10.1214/23-AOAS1856
Ashish Patel, Francis J DiTraglia, Verena Zuber, Stephen Burgess

Mendelian randomization (MR) is a widely used method to estimate the causal relationship between a risk factor and disease. A fundamental part of any MR analysis is to choose appropriate genetic variants as instrumental variables. Genome-wide association studies often reveal that hundreds of genetic variants may be robustly associated with a risk factor, but in some situations investigators may have greater confidence in the instrument validity of only a smaller subset of variants. Nevertheless, the use of additional instruments may be optimal from the perspective of mean squared error even if they are slightly invalid; a small bias in estimation may be a price worth paying for a larger reduction in variance. For this purpose, we consider a method for "focused" instrument selection whereby genetic variants are selected to minimise the estimated asymptotic mean squared error of causal effect estimates. In a setting of many weak and locally invalid instruments, we propose a novel strategy to construct confidence intervals for post-selection focused estimators that guards against the worst case loss in asymptotic coverage. In empirical applications to (i) validate lipid drug targets and (ii) investigate vitamin D effects on a wide range of outcomes, our findings suggest that the optimal selection of instruments involves not only a small number of biologically justified instruments, but also many potentially invalid instruments.
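
The bias-variance trade-off described above can be made concrete with the standard inverse-variance weighted (IVW) estimator. Under the working model by_j = beta * bx_j + alpha_j + noise_j, where alpha_j is a direct (invalid) effect of variant j on the outcome, the IVW estimator has a closed-form bias and variance, and mean squared error is bias squared plus variance. The sketch below treats alpha_j as known, which it never is in practice, and uses made-up summary statistics; it illustrates the trade-off only and is not the paper's focused selection procedure or its confidence interval construction.

```python
import numpy as np

def ivw_bias_variance(bx, se_y, alpha):
    """Bias and variance of the IVW estimator under
    by_j = beta * bx_j + alpha_j + noise_j with sd(noise_j) = se_y_j."""
    precision = np.sum(bx ** 2 / se_y ** 2)
    bias = np.sum(bx * alpha / se_y ** 2) / precision
    variance = 1.0 / precision
    return bias, variance

def mse(bias, variance):
    """Mean squared error as squared bias plus variance."""
    return bias ** 2 + variance

# Hypothetical summary data: 3 valid instruments plus 20 slightly invalid ones.
bx = np.full(23, 0.1)        # variant-exposure associations
se_y = np.full(23, 0.05)     # standard errors of variant-outcome associations
alpha = np.concatenate([np.zeros(3), np.full(20, 0.005)])  # direct effects

mse_valid_only = mse(*ivw_bias_variance(bx[:3], se_y[:3], alpha[:3]))
mse_all = mse(*ivw_bias_variance(bx, se_y, alpha))
```

With these numbers the three valid instruments alone give an unbiased but noisy estimate, while adding the 20 slightly invalid instruments introduces a small bias and cuts the variance enough that the overall mean squared error is lower.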

Citations: 0
A BAYESIAN MACHINE LEARNING APPROACH FOR ESTIMATING HETEROGENEOUS SURVIVOR CAUSAL EFFECTS: APPLICATIONS TO A CRITICAL CARE TRIAL.
IF 1.8 Zone 4, Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2024-03-01 Epub Date: 2024-01-31 DOI: 10.1214/23-aoas1792
Xinyuan Chen, Michael O Harhay, Guangyu Tong, Fan Li

Assessing heterogeneity in the effects of treatments has become increasingly popular in the field of causal inference and carries important implications for clinical decision-making. While extensive literature exists for studying treatment effect heterogeneity when outcomes are fully observed, there has been limited development of tools for estimating heterogeneous causal effects when patient-centered outcomes are truncated by a terminal event, such as death. Due to mortality occurring during study follow-up, the outcomes of interest are unobservable, undefined, or not fully observed for many participants, in which case principal stratification is an appealing framework to draw valid causal conclusions. Motivated by the Acute Respiratory Distress Syndrome Network (ARDSNetwork) ARDS respiratory management (ARMA) trial, we developed a flexible Bayesian machine learning approach to estimate the average causal effect and heterogeneous causal effects among the always-survivors stratum when clinical outcomes are subject to truncation. We adopted Bayesian additive regression trees (BART) to flexibly specify separate mean models for the potential outcomes and latent stratum membership. In the analysis of the ARMA trial, we found that the low tidal volume treatment had an overall benefit for participants sustaining acute lung injuries on the outcome of time to returning home, but substantial heterogeneity in treatment effects among the always-survivors, driven most strongly by biologic sex and the alveolar-arterial oxygen gradient at baseline (a physiologic measure of lung function and degree of hypoxemia). These findings illustrate how the proposed methodology could guide the prognostic enrichment of future trials in the field.

Citations: 0
A SIMPLE AND FLEXIBLE TEST OF SAMPLE EXCHANGEABILITY WITH APPLICATIONS TO STATISTICAL GENOMICS.
IF 1.8 Zone 4, Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2024-03-01 Epub Date: 2024-01-31 DOI: 10.1214/23-aoas1817
Alan J Aw, Jeffrey P Spence, Yun S Song

In scientific studies involving analyses of multivariate data, basic but important questions often arise for the researcher: Is the sample exchangeable, meaning that the joint distribution of the sample is invariant to the ordering of the units? Are the features independent of one another, or perhaps the features can be grouped so that the groups are mutually independent? In statistical genomics, these considerations are fundamental to downstream tasks such as demographic inference and the construction of polygenic risk scores. We propose a non-parametric approach, which we call the V test, to address these two questions, namely, a test of sample exchangeability given dependency structure of features, and a test of feature independence given sample exchangeability. Our test is conceptually simple, yet fast and flexible. It controls the Type I error across realistic scenarios, and handles data of arbitrary dimensions by leveraging large-sample asymptotics. Through extensive simulations and a comparison against unsupervised tests of stratification based on random matrix theory, we find that our test compares favorably in various scenarios of interest. We apply the test to data from the 1000 Genomes Project, demonstrating how it can be employed to assess exchangeability of the genetic sample, or find optimal linkage disequilibrium (LD) splits for downstream analysis. For exchangeability assessment, we find that removing rare variants can substantially increase the p-value of the test statistic. For optimal LD splitting, the V test reports different optimal splits than previous approaches not relying on hypothesis testing. Software for our methods is available in R (CRAN: flintyR) and Python (PyPI: flintyPy).
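
To make the idea of testing sample exchangeability more concrete, here is a generic permutation sketch. It is not the flintyR or flintyPy implementation and makes no claim to reproduce the V test as published: it assumes binary features that are mutually independent, uses the variance of pairwise Hamming distances between samples as an illustrative statistic, and builds a null distribution by shuffling each feature column independently across samples. The function names, the toy data, and the number of permutations are all hypothetical.

```python
import random
from itertools import combinations
from statistics import pvariance


def pairwise_distance_variance(rows):
    """Variance of Hamming distances over all pairs of samples (rows)."""
    distances = [
        sum(a != b for a, b in zip(r1, r2)) for r1, r2 in combinations(rows, 2)
    ]
    return pvariance(distances)


def permutation_p_value(rows, n_perm=200, seed=0):
    """Permutation p-value for the pairwise-distance variance.

    Each feature column is shuffled independently across samples, which
    preserves the data distribution only if the samples are exchangeable
    and the features are independent (an assumption of this sketch).
    """
    rng = random.Random(seed)
    observed = pairwise_distance_variance(rows)
    columns = [list(col) for col in zip(*rows)]
    at_least_as_extreme = 0
    for _ in range(n_perm):
        shuffled_columns = []
        for col in columns:
            shuffled = col[:]
            rng.shuffle(shuffled)
            shuffled_columns.append(shuffled)
        permuted_rows = list(zip(*shuffled_columns))
        if pairwise_distance_variance(permuted_rows) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Toy, strongly stratified sample: two groups that differ at every feature.
stratified_rows = [tuple([0] * 20)] * 10 + [tuple([1] * 20)] * 10
p_value = permutation_p_value(stratified_rows)
print(f"permutation p-value: {p_value:.4f}")
```

In this toy sample the two groups inflate the spread of pairwise distances far beyond what column-wise shuffling produces, so the permutation p-value is small. The abstract notes that the published test relies on large-sample asymptotics to handle high-dimensional data, which this brute-force sketch does not attempt.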

Citations: 0
USING SIMULTANEOUS REGRESSION CALIBRATION TO STUDY THE EFFECT OF MULTIPLE ERROR-PRONE EXPOSURES ON DISEASE RISK UTILIZING BIOMARKERS DEVELOPED FROM A CONTROLLED FEEDING STUDY.
IF 1.4 Zone 4, Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2024-03-01 Epub Date: 2024-01-31 DOI: 10.1214/23-aoas1782
Yiwen Zhang, Ran Dai, Ying Huang, Ross Prentice, Cheng Zheng

Systematic measurement error in self-reported data creates important challenges in association studies between dietary intakes and chronic disease risks, especially when multiple dietary components are studied jointly. The joint regression calibration method has been developed for measurement error correction when objectively measured biomarkers are available for all dietary components of interest. Unfortunately, objectively measured biomarkers are only available for very few dietary components, which limits the application of the joint regression calibration method. Recently, for single dietary components, controlled feeding studies have been performed to develop new biomarkers for many more dietary components. However, it is unclear whether the biomarkers separately developed for single dietary components are valid for joint calibration. In this paper, we show that biomarkers developed for single dietary components cannot be used for joint regression calibration. We propose new methods to utilize controlled feeding studies to develop valid biomarkers for joint regression calibration, so that the associations between multiple dietary components and the disease of interest can be estimated simultaneously. Asymptotic distribution theory for the proposed estimators is derived. Extensive simulations are performed to study the finite sample performance of the proposed estimators. We apply our methods to examine the joint effects of sodium and potassium intakes on cardiovascular disease incidence using the Women's Health Initiative cohort data. We identify positive associations between sodium intake and cardiovascular disease as well as negative associations between potassium intake and cardiovascular disease.

Citations: 0