Annals of Applied Statistics: Latest Articles (Page 2)

MASH: MEDIATION ANALYSIS OF SURVIVAL OUTCOME AND HIGH-DIMENSIONAL OMICS MEDIATORS WITH APPLICATION TO COMPLEX DISEASES.

IF 1.3 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY

Annals of Applied Statistics

Pub Date : 2024-06-01 Epub Date: 2024-04-05 DOI: 10.1214/23-aoas1838

Sunyi Chi, Christopher R Flowers, Ziyi Li, Xuelin Huang, Peng Wei

Environmental exposures such as cigarette smoking influence health outcomes through intermediate molecular phenotypes, such as the methylome, transcriptome, and metabolome. Mediation analysis is a useful tool for investigating the role of potentially high-dimensional intermediate phenotypes in the relationship between environmental exposures and health outcomes. However, little work has been done on mediation analysis when the mediators are high-dimensional and the outcome is a survival endpoint, and none of it has provided a robust measure of total mediation effect. To this end, we propose an estimation procedure for Mediation Analysis of Survival outcome and High-dimensional omics mediators (MASH) based on sure independence screening for putative mediator variable selection and a second-moment-based measure of total mediation effect for survival data analogous to the $R^{2}$ measure in a linear model. Extensive simulations showed good performance of MASH in estimating the total mediation effect and identifying true mediators. By applying MASH to the metabolomics data of 1919 subjects in the Framingham Heart Study, we identified five metabolites as mediators of the effect of cigarette smoking on coronary heart disease risk (total mediation effect, 51.1%) and two metabolites as mediators between smoking and risk of cancer (total mediation effect, 50.7%). Application of MASH to a diffuse large B-cell lymphoma genomics data set identified copy-number variations for eight genes as mediators between the baseline International Prognostic Index score and overall survival.
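The screening step named in the abstract can be illustrated with a generic marginal-correlation screen. This is a minimal sketch of the general idea behind sure independence screening, not the MASH procedure itself: it assumes a continuous outcome and Pearson correlation, whereas MASH targets a survival endpoint. The function name `marginal_screen`, the simulated data, and the cutoff `d` are all hypothetical.

```python
import numpy as np

def marginal_screen(X, y, d):
    """Return the indices of the d columns of X with the largest
    absolute Pearson correlation with y (assumes no constant columns)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Pearson correlation of every column with y, computed in one pass
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = (Xc.T @ yc) / denom
    # Sort by decreasing absolute correlation and keep the top d
    return np.argsort(-np.abs(corr))[:d]

# Simulated example: 1000 candidate features, only the first two matter
rng = np.random.default_rng(0)
n, p = 200, 1000
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 3.0 * X[:, 1] + rng.standard_normal(n)

selected = marginal_screen(X, y, d=10)
print(sorted(selected.tolist()))
```

Ranking each candidate on its own keeps the cost linear in the number of features, which is why this kind of screen is commonly used as a first pass before a more careful model is fitted on the reduced set.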

Annals of Applied Statistics, Volume 18, Issue 2, pp. 1360-1377. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11426188/pdf/

Citations: 0

A BAYESIAN HIERARCHICAL SMALL AREA POPULATION MODEL ACCOUNTING FOR DATA SOURCE SPECIFIC METHODOLOGIES FROM AMERICAN COMMUNITY SURVEY, POPULATION ESTIMATES PROGRAM, AND DECENNIAL CENSUS DATA.

IF 1.3 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY

Annals of Applied Statistics

Pub Date : 2024-06-01 Epub Date: 2024-04-05 DOI: 10.1214/23-aoas1849

Emily N Peterson, Rachel C Nethery, Tullia Padellini, Jarvis T Chen, Brent A Coull, Frédéric B Piel, Jon Wakefield, Marta Blangiardo, Lance A Waller

Small area population counts are necessary for many epidemiological studies, yet their quality and accuracy are often not assessed. In the United States, small area population counts are published by the United States Census Bureau (USCB) in the form of the decennial census counts, intercensal population projections (PEP), and American Community Survey (ACS) estimates. Although there are significant relationships between these three data sources, there are important contrasts in data collection, data availability, and processing methodologies such that each set of reported population counts may be subject to different sources and magnitudes of error. Additionally, these data sources do not report identical small area population counts due to post-survey adjustments specific to each data source. Consequently, in public health studies, small area disease/mortality rates may differ depending on which data source is used for denominator data. To accurately estimate annual small area population counts and their associated uncertainties, we present a Bayesian population (BPop) model, which fuses information from all three USCB sources, accounting for data source specific methodologies and associated errors. We produce comprehensive small area race-stratified estimates of the true population, and associated uncertainties, given the observed trends in all three USCB population estimates. The main features of our framework are: (1) a single model integrating multiple data sources, (2) accounting for data source specific data generating mechanisms and specifically accounting for data source specific errors, and (3) prediction of population counts for years without USCB reported data. We focus our study on the Black and White only populations for 159 counties of Georgia and produce estimates for years 2006-2023. We compare BPop population estimates to decennial census counts, PEP annual counts, and ACS multi-year estimates. 
Additionally, we illustrate and explain the different types of data source specific errors. Lastly, we compare model performance using simulations and validation exercises. Our Bayesian population model can be extended to other applications at smaller spatial granularity and for demographic subpopulations defined further by race, age, and sex, and/or for other geographical regions.

Annals of Applied Statistics, Volume 18, Issue 2, pp. 1565-1595. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11423836/pdf/

Citations: 0

Selecting Invalid Instruments to Improve Mendelian Randomization with Two-Sample Summary Data.

IF 1.3 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY

Annals of Applied Statistics

Pub Date : 2024-04-05 eCollection Date: 2024-06-01 DOI: 10.1214/23-AOAS1856

Ashish Patel, Francis J DiTraglia, Verena Zuber, Stephen Burgess

Mendelian randomization (MR) is a widely-used method to estimate the causal relationship between a risk factor and disease. A fundamental part of any MR analysis is to choose appropriate genetic variants as instrumental variables. Genome-wide association studies often reveal that hundreds of genetic variants may be robustly associated with a risk factor, but in some situations investigators may have greater confidence in the instrument validity of only a smaller subset of variants. Nevertheless, the use of additional instruments may be optimal from the perspective of mean squared error even if they are slightly invalid; a small bias in estimation may be a price worth paying for a larger reduction in variance. For this purpose, we consider a method for "focused" instrument selection whereby genetic variants are selected to minimise the estimated asymptotic mean squared error of causal effect estimates. In a setting of many weak and locally invalid instruments, we propose a novel strategy to construct confidence intervals for post-selection focused estimators that guards against the worst case loss in asymptotic coverage. In empirical applications to: (i) validate lipid drug targets; and (ii) investigate vitamin D effects on a wide range of outcomes, our findings suggest that the optimal selection of instruments does not involve only a small number of biologically-justified instruments, but also many potentially invalid instruments.
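The bias-variance argument in the abstract can be made concrete with the standard decomposition MSE = bias² + variance. The sketch below is only an illustration of that trade-off with made-up numbers; the paper's criterion is an estimated asymptotic mean squared error, which is not reproduced here, and the function names are hypothetical.

```python
def mse(bias, variance):
    """Mean squared error of an estimator: squared bias plus variance."""
    return bias ** 2 + variance

def prefer_larger_set(bias_large, var_large, var_small):
    """True if a larger instrument set (possibly biased, but less variable)
    has lower MSE than a smaller set assumed to be unbiased."""
    return mse(bias_large, var_large) < mse(0.0, var_small)

# Hypothetical numbers: adding slightly invalid instruments cuts the variance
# from 0.04 to 0.01 at the cost of a bias of 0.05.
small_bias_case = prefer_larger_set(bias_large=0.05, var_large=0.01, var_small=0.04)

# With a larger bias of 0.2 the same variance reduction no longer pays off.
large_bias_case = prefer_larger_set(bias_large=0.2, var_large=0.01, var_small=0.04)

print(small_bias_case, large_bias_case)
```

In the first case the larger set has MSE 0.0125 against 0.04 for the smaller set; in the second the squared bias alone (0.04) already matches the smaller set's entire MSE.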

Annals of Applied Statistics, Volume 18, Issue 2. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7615940/pdf/

Citations: 0

A BAYESIAN MACHINE LEARNING APPROACH FOR ESTIMATING HETEROGENEOUS SURVIVOR CAUSAL EFFECTS: APPLICATIONS TO A CRITICAL CARE TRIAL.

IF 1.8 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY

Annals of Applied Statistics

Pub Date : 2024-03-01 Epub Date: 2024-01-31 DOI: 10.1214/23-aoas1792

Xinyuan Chen, Michael O Harhay, Guangyu Tong, Fan Li

Assessing heterogeneity in the effects of treatments has become increasingly popular in the field of causal inference and carries important implications for clinical decision-making. While extensive literature exists for studying treatment effect heterogeneity when outcomes are fully observed, there has been limited development in tools for estimating heterogeneous causal effects when patient-centered outcomes are truncated by a terminal event, such as death. Due to mortality occurring during study follow-up, the outcomes of interest are unobservable, undefined, or not fully observed for many participants in which case principal stratification is an appealing framework to draw valid causal conclusions. Motivated by the Acute Respiratory Distress Syndrome Network (ARDSNetwork) ARDS respiratory management (ARMA) trial, we developed a flexible Bayesian machine learning approach to estimate the average causal effect and heterogeneous causal effects among the always-survivors stratum when clinical outcomes are subject to truncation. We adopted Bayesian additive regression trees (BART) to flexibly specify separate mean models for the potential outcomes and latent stratum membership. In the analysis of the ARMA trial, we found that the low tidal volume treatment had an overall benefit for participants sustaining acute lung injuries on the outcome of time to returning home but substantial heterogeneity in treatment effects among the always-survivors, driven most strongly by biologic sex and the alveolar-arterial oxygen gradient at baseline (a physiologic measure of lung function and degree of hypoxemia). These findings illustrate how the proposed methodology could guide the prognostic enrichment of future trials in the field.

Annals of Applied Statistics, Volume 18, Issue 1, pp. 350-374. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10919396/pdf/

Citations: 0

A SIMPLE AND FLEXIBLE TEST OF SAMPLE EXCHANGEABILITY WITH APPLICATIONS TO STATISTICAL GENOMICS.

IF 1.8 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY

Annals of Applied Statistics

Pub Date : 2024-03-01 Epub Date: 2024-01-31 DOI: 10.1214/23-aoas1817

Alan J Aw, Jeffrey P Spence, Yun S Song

In scientific studies involving analyses of multivariate data, basic but important questions often arise for the researcher: Is the sample exchangeable, meaning that the joint distribution of the sample is invariant to the ordering of the units? Are the features independent of one another, or perhaps the features can be grouped so that the groups are mutually independent? In statistical genomics, these considerations are fundamental to downstream tasks such as demographic inference and the construction of polygenic risk scores. We propose a non-parametric approach, which we call the V test, to address these two questions, namely, a test of sample exchangeability given dependency structure of features, and a test of feature independence given sample exchangeability. Our test is conceptually simple, yet fast and flexible. It controls the Type I error across realistic scenarios, and handles data of arbitrary dimensions by leveraging large-sample asymptotics. Through extensive simulations and a comparison against unsupervised tests of stratification based on random matrix theory, we find that our test compares favorably in various scenarios of interest. We apply the test to data from the 1000 Genomes Project, demonstrating how it can be employed to assess exchangeability of the genetic sample, or find optimal linkage disequilibrium (LD) splits for downstream analysis. For exchangeability assessment, we find that removing rare variants can substantially increase the $p$-value of the test statistic. For optimal LD splitting, the V test reports different optimal splits than previous approaches not relying on hypothesis testing. Software for our methods is available in R (CRAN: flintyR) and Python (PyPI: flintyPy).
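To make the exchangeability question concrete, here is a generic permutation test on a binary sample-by-feature matrix. It is a sketch under stated assumptions, not the flintyR/flintyPy implementation: it assumes the features are mutually independent, uses the variance of pairwise Hamming distances as the statistic (an illustrative choice), and relies on resampling rather than the large-sample asymptotics the abstract mentions. All function names are hypothetical.

```python
import numpy as np

def pairwise_hamming_variance(X):
    """Variance of the Hamming distances between all pairs of rows of a 0/1 matrix."""
    X = np.asarray(X, dtype=float)
    # Entry (i, j) counts the features where rows i and j differ
    D = X @ (1.0 - X).T + (1.0 - X) @ X.T
    upper = np.triu_indices(X.shape[0], k=1)
    return D[upper].var()

def exchangeability_pvalue(X, n_perm=199, seed=0):
    """Permutation p-value. With independent features, shuffling each column
    separately leaves the joint distribution unchanged if the samples are exchangeable."""
    rng = np.random.default_rng(seed)
    observed = pairwise_hamming_variance(X)
    exceed = 0
    for _ in range(n_perm):
        # Shuffle every feature column independently across samples
        Xp = np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])
        if pairwise_hamming_variance(Xp) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)

# Simulated stratified sample: two groups with very different allele frequencies
rng = np.random.default_rng(1)
pop_a = rng.binomial(1, 0.1, size=(20, 200))
pop_b = rng.binomial(1, 0.9, size=(20, 200))
stratified = np.vstack([pop_a, pop_b])

p_value = exchangeability_pvalue(stratified)
print(p_value)
```

Population structure makes between-group distances much larger than within-group ones, which inflates the variance of pairwise distances relative to the permuted data, so a stratified sample should yield a small p-value.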

Annals of Applied Statistics, Volume 18, Issue 1, pp. 858-881. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11115382/pdf/

Citations: 0

USING SIMULTANEOUS REGRESSION CALIBRATION TO STUDY THE EFFECT OF MULTIPLE ERROR-PRONE EXPOSURES ON DISEASE RISK UTILIZING BIOMARKERS DEVELOPED FROM A CONTROLLED FEEDING STUDY.

IF 1.3 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY

Annals of Applied Statistics

Pub Date : 2024-03-01 Epub Date: 2024-01-31 DOI: 10.1214/23-aoas1782

Yiwen Zhang, Ran Dai, Ying Huang, Ross Prentice, Cheng Zheng

Systematic measurement error in self-reported data creates important challenges in association studies between dietary intakes and chronic disease risks, especially when multiple dietary components are studied jointly. The joint regression calibration method has been developed for measurement error correction when objectively measured biomarkers are available for all dietary components of interest. Unfortunately, objectively measured biomarkers are only available for very few dietary components, which limits the application of the joint regression calibration method. Recently, for single dietary components, controlled feeding studies have been performed to develop new biomarkers for many more dietary components. However, it is unclear whether the biomarkers separately developed for single dietary components are valid for joint calibration. In this paper, we show that biomarkers developed for single dietary components cannot be used for joint regression calibration. We propose new methods to utilize controlled feeding studies to develop valid biomarkers for joint regression calibration to estimate the association between multiple dietary components simultaneously with the disease of interest. Asymptotic distribution theory for the proposed estimators is derived. Extensive simulations are performed to study the finite sample performance of the proposed estimators. We apply our methods to examine the joint effects of sodium and potassium intakes on cardiovascular disease incidence using the Women's Health Initiative cohort data. We identify positive associations between sodium intake and cardiovascular diseases as well as negative associations between potassium intake and cardiovascular disease.

Annals of Applied Statistics, Volume 18, Issue 1, pp. 125-143. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10836829/pdf/

Citations: 0

LATENT SUBGROUP IDENTIFICATION IN IMAGE-ON-SCALAR REGRESSION.

IF 1.8, Zone 4 (Mathematics), Q2 STATISTICS & PROBABILITY

Annals of Applied Statistics

Pub Date : 2024-03-01 Epub Date: 2024-01-31 DOI: 10.1214/23-aoas1797

Zikai Lin, Yajuan Si, Jian Kang

Image-on-scalar regression has been a popular approach to modeling the association between brain activities and scalar characteristics in neuroimaging research. The associations could be heterogeneous across individuals in the population, as indicated by recent large-scale neuroimaging studies, for example, the Adolescent Brain Cognitive Development (ABCD) Study. The ABCD data can inform our understanding of heterogeneous associations and how to leverage the heterogeneity and tailor interventions to increase the number of youths who benefit. It is of great interest to identify subgroups of individuals from the population such that: (1) within each subgroup the brain activities have homogeneous associations with the clinical measures; (2) across subgroups the associations are heterogeneous; and (3) the group allocation depends on individual characteristics. Existing image-on-scalar regression methods and clustering methods cannot directly achieve this goal. We propose a latent subgroup image-on-scalar regression model (LASIR) to analyze large-scale, multisite neuroimaging data with diverse sociodemographics. LASIR introduces the latent subgroup for each individual and group-specific, spatially varying effects, with an efficient stochastic expectation maximization algorithm for inferences. We demonstrate that LASIR outperforms existing alternatives for subgroup identification of brain activation patterns with functional magnetic resonance imaging data via comprehensive simulations and applications to the ABCD study. We have released our reproducible code for public use, with the software package available on GitHub.

Volume 18, Issue 1, pp. 468-486. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11156244/pdf/

Citations: 0

ANOPOW FOR REPLICATED NONSTATIONARY TIME SERIES IN EXPERIMENTS.

IF 1.8, Zone 4 (Mathematics), Q2 STATISTICS & PROBABILITY

Annals of Applied Statistics

Pub Date : 2024-03-01 Epub Date: 2024-01-31 DOI: 10.1214/23-aoas1791

Zeda Li, Yu Ryan Yue, Scott A Bruce

We propose a novel analysis of power (ANOPOW) model for analyzing replicated nonstationary time series commonly encountered in experimental studies. Based on a locally stationary ANOPOW Cramér spectral representation, the proposed model can be used to compare the second-order time-varying frequency patterns among different groups of time series and to estimate group effects as functions of both time and frequency. Formulated in a Bayesian framework, independent two-dimensional second-order random walk (RW2D) priors are assumed on each of the time-varying functional effects for flexible and adaptive smoothing. A piecewise stationary approximation of the nonstationary time series is used to obtain localized estimates of time-varying spectra. Posterior distributions of the time-varying functional group effects are then obtained via integrated nested Laplace approximations (INLA) at a low computational cost. The large-sample distribution of local periodograms can be appropriately utilized to improve estimation accuracy since INLA allows modeling of data with various types of distributions. The usefulness of the proposed model is illustrated through two real data applications: analyses of seismic signals and pupil diameter time series in children with attention deficit hyperactivity disorder. Simulation studies, Supplementary Materials (Li, Yue and Bruce, 2023a), and R code (Li, Yue and Bruce, 2023b) for this article are also available.
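
The piecewise stationary approximation mentioned in the abstract can be illustrated with a small sketch. This is not the authors' ANOPOW model: it shows only the idea of cutting a nonstationary series into blocks and computing one periodogram per block, with no Bayesian smoothing, RW2D priors, or INLA. The function name `local_periodograms`, the equal-length blocking, and the toy signal are assumptions made for illustration.

```python
import numpy as np


def local_periodograms(x, n_blocks):
    """Return one periodogram per equal-length block of the series x.

    Hypothetical helper for illustration; not taken from the paper's R code.
    """
    x = np.asarray(x, dtype=float)
    block_len = len(x) // n_blocks
    # Drop leftover samples so that every block has the same length.
    blocks = x[: block_len * n_blocks].reshape(n_blocks, block_len)
    # Remove each block's mean so the zero frequency does not dominate.
    blocks = blocks - blocks.mean(axis=1, keepdims=True)
    # Periodogram: squared modulus of the DFT, scaled by the block length.
    power = np.abs(np.fft.rfft(blocks, axis=1)) ** 2 / block_len
    freqs = np.fft.rfftfreq(block_len)  # in cycles per sample
    return freqs, power


# Toy nonstationary series: the dominant frequency changes halfway through.
t = np.arange(512)
x = np.where(t < 256, np.sin(2 * np.pi * 0.0625 * t), np.sin(2 * np.pi * 0.25 * t))

freqs, power = local_periodograms(x, n_blocks=4)
# Frequency with the largest power in each block.
peak_freqs = freqs[power.argmax(axis=1)]
print(peak_freqs)
```

In this toy case the peak frequency moves from 0.0625 to 0.25 cycles per sample between the second and third blocks, which is the kind of time-varying spectral pattern a model of this type is meant to compare across groups.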

Volume 18, Issue 1, pp. 328-349. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10906746/pdf/

Citations: 0

LAND-USE FILTERING FOR NONSTATIONARY SPATIAL PREDICTION OF COLLECTIVE EFFICACY IN AN URBAN ENVIRONMENT.

IF 1.8, Zone 4 (Mathematics), Q2 STATISTICS & PROBABILITY

Annals of Applied Statistics

Pub Date : 2024-03-01 Epub Date: 2024-01-31 DOI: 10.1214/23-aoas1813

J Brandon Carter, Christopher R Browning, Bethany Boettner, Nicolo Pinchak, Catherine A Calder

Collective efficacy, the capacity of communities to exert social control toward the realization of their shared goals, is a foundational concept in the urban sociology and neighborhood effects literature. Traditionally, empirical studies of collective efficacy use large sample surveys to estimate collective efficacy of different neighborhoods within an urban setting. Such studies have demonstrated an association between collective efficacy and local variation in community violence, educational achievement, and health. Unlike traditional collective efficacy measurement strategies, the Adolescent Health and Development in Context (AHDC) Study implemented a new approach, obtaining spatially referenced, place-based ratings of collective efficacy from a representative sample of individuals residing in Columbus, OH. In this paper we introduce a novel nonstationary spatial model for interpolation of the AHDC collective efficacy ratings across the study area, which leverages administrative data on land use. Our constructive model specification strategy involves dimension expansion of a latent spatial process and the use of a filter defined by the land-use partition of the study region to connect the latent multivariate spatial process to the observed ordinal ratings of collective efficacy. Careful consideration is given to the issues of parameter identifiability, computational efficiency of an MCMC algorithm for model fitting, and fine-scale spatial prediction of collective efficacy.

Volume 18, Issue 1, pp. 794-818. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11146085/pdf/

Citations: 0

COMPOSITE SCORES FOR TRANSPLANT CENTER EVALUATION: A NEW INDIVIDUALIZED EMPIRICAL NULL METHOD.

IF 1.3, Zone 4 (Mathematics), Q2 STATISTICS & PROBABILITY

Annals of Applied Statistics

Pub Date : 2024-03-01 Epub Date: 2024-01-31 DOI: 10.1214/23-aoas1809

Nicholas Hartman, Joseph M Messana, Jian Kang, Abhijit S Naik, Tempie H Shearon, Kevin He

Risk-adjusted quality measures are used to evaluate healthcare providers with respect to national norms while controlling for factors beyond their control. Existing healthcare provider profiling approaches typically assume that the between-provider variation in these measures is entirely due to meaningful differences in quality of care. However, in practice, much of the between-provider variation will be due to trivial fluctuations in healthcare quality, or unobservable confounding risk factors. If these additional sources of variation are not accounted for, conventional methods will disproportionately identify larger providers as outliers, even though their departures from the national norms may not be "extreme" or clinically meaningful. Motivated by efforts to evaluate the quality of care provided by transplant centers, we develop a composite evaluation score based on a novel individualized empirical null method, which robustly accounts for overdispersion due to unobserved risk factors, models the marginal variance of standardized scores as a function of the effective sample size, and only requires the use of publicly-available center-level statistics. The evaluations of United States kidney transplant centers based on the proposed composite score are substantially different from those based on conventional methods. Simulations show that the proposed empirical null approach more accurately classifies centers in terms of quality of care, compared to existing methods.
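
The general idea behind an empirical null can be sketched in a few lines. This is a generic robust version and not the individualized method proposed in the paper: it does not model the variance as a function of effective sample size and does not build a composite score. The function name `empirical_null_flags`, the median/MAD estimators, the 1.96 cutoff, and the simulated z-scores are all assumptions made for illustration.

```python
import numpy as np


def empirical_null_flags(z, threshold=1.96):
    """Re-standardize z-scores against a null estimated from their central bulk.

    Hypothetical helper for illustration; not the paper's estimator.
    """
    z = np.asarray(z, dtype=float)
    center = np.median(z)
    # 1.4826 * MAD is a consistent estimate of the standard deviation
    # when the bulk of the scores is normally distributed.
    scale = 1.4826 * np.median(np.abs(z - center))
    z_adj = (z - center) / scale
    return z_adj, np.abs(z_adj) > threshold


# Simulated provider z-scores: overdispersed bulk (sd = 2) plus two true outliers.
rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(0.0, 2.0, size=500), [12.0, -11.0]])

z_adj, flagged = empirical_null_flags(z)
# Flags under the theoretical N(0, 1) null, for comparison.
naive_flagged = np.abs(z) > 1.96
print(naive_flagged.sum(), flagged.sum())
```

Under the theoretical N(0, 1) null, overdispersion alone pushes many ordinary providers past the cutoff. Rescaling by the estimated null spread flags far fewer of them while still flagging the two planted outliers.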

Volume 18, Issue 1, pp. 729-748. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11395314/pdf/

Citations: 0