
Annals of Applied Statistics: Latest Articles

JOINT MODELING FOR LEARNING DECISION-MAKING DYNAMICS IN BEHAVIORAL EXPERIMENTS.
IF 1.4 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2025-12-01 | Epub Date: 2025-12-05 | DOI: 10.1214/25-aoas2112
Yuan Bian, Xingche Guo, Yuanjia Wang

Major depressive disorder (MDD), a leading cause of disability and mortality, is associated with reward-processing abnormalities and concentration issues. Motivated by the probabilistic reward task from the Establishing Moderators and Biosignatures of Antidepressant Response in Clinical Care (EMBARC) study, we propose a novel framework that integrates the reinforcement learning (RL) model and drift-diffusion model (DDM) to jointly analyze reward-based decision-making with response times. To account for emerging evidence suggesting that decision-making may alternate between multiple interleaved strategies, we model latent state switching using a hidden Markov model (HMM). In the engaged state, decisions follow an RL-DDM, simultaneously capturing reward processing, decision dynamics, and temporal structure. In contrast, in the lapsed state, decision-making is modeled using a simplified DDM, where specific parameters are fixed to approximate random guessing with equal probability. The proposed method is implemented using a computationally efficient generalized expectation-maximization (EM) algorithm with forward-backward procedures. Through extensive numerical studies, we demonstrate that our proposed method outperforms competing approaches across various reward-generating distributions, under both strategy-switching and non-switching scenarios, as well as in the presence of input perturbations. When applied to the EMBARC study, our framework reveals that MDD patients exhibit lower overall engagement than healthy controls and experience longer responses when they do engage. Additionally, we show that neuroimaging measures of brain activities are associated with decision-making characteristics in the engaged state but not in the lapsed state, providing evidence of brain-behavior association specific to the engaged state.
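
The forward-backward procedure mentioned in the abstract can be illustrated with a generic two-state hidden Markov model. The sketch below is not the authors' implementation: it assumes the per-trial likelihoods under each state are already available (in the paper they would come from the RL-DDM and the simplified DDM; here they are placeholder numbers), and the names `forward_backward`, `init`, `trans`, and `emis` are hypothetical. It returns the posterior probability of each state on each trial, which is the quantity an EM E-step would need.

```python
import math
from typing import List, Sequence, Tuple


def forward_backward(
    init: Sequence[float],
    trans: Sequence[Sequence[float]],
    emis: Sequence[Sequence[float]],
) -> Tuple[List[List[float]], float]:
    """Scaled forward-backward pass for a discrete-state HMM.

    init[k]     : P(state_1 = k)
    trans[i][j] : P(state_{t+1} = j | state_t = i)
    emis[t][k]  : likelihood of trial t's observation under state k
    Returns (gamma, loglik) with gamma[t][k] = P(state_t = k | all trials).
    """
    n_trials, n_states = len(emis), len(init)
    states = range(n_states)

    # Forward pass: alpha[t] is the filtered state distribution,
    # scales[t] is the normalizing constant for trial t.
    alpha: List[List[float]] = []
    scales: List[float] = []
    for t in range(n_trials):
        if t == 0:
            raw = [init[k] * emis[0][k] for k in states]
        else:
            raw = [
                emis[t][j] * sum(alpha[t - 1][i] * trans[i][j] for i in states)
                for j in states
            ]
        c = sum(raw)
        scales.append(c)
        alpha.append([v / c for v in raw])

    # Backward pass, rescaled with the same constants.
    beta = [[1.0] * n_states for _ in range(n_trials)]
    for t in range(n_trials - 2, -1, -1):
        beta[t] = [
            sum(trans[i][j] * emis[t + 1][j] * beta[t + 1][j] for j in states)
            / scales[t + 1]
            for i in states
        ]

    # Smoothed state probabilities and the log-likelihood of the sequence.
    gamma = [[alpha[t][k] * beta[t][k] for k in states] for t in range(n_trials)]
    loglik = sum(math.log(c) for c in scales)
    return gamma, loglik


# Hypothetical numbers: state 0 = engaged, state 1 = lapsed.
init = [0.8, 0.2]
trans = [[0.9, 0.1], [0.3, 0.7]]
emis = [[0.6, 0.2], [0.1, 0.4], [0.5, 0.3]]  # placeholder per-trial likelihoods
gamma, loglik = forward_backward(init, trans, emis)
```

Rescaling at every trial keeps the recursion numerically stable over long trial sequences, which is why the log-likelihood is accumulated from the scaling constants rather than from raw products.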

Annals of Applied Statistics, vol. 19, no. 4, pp. 3372-3393. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12814034/pdf/
Citations: 0
MULTI-OBJECT DATA INTEGRATION IN THE STUDY OF PRIMARY PROGRESSIVE APHASIA.
IF 1.4 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2025-12-01 | Epub Date: 2025-12-05 | DOI: 10.1214/25-aoas2071
Rene Gutierrez, Aaron Scheffler, Rajarshi Guhaniyogi, Maria Luisa Gorno-Tempini, Maria Luisa Mandelli, Giovanni Battistella

This article focuses on a multi-modal imaging data application in which structural/anatomical information from gray matter (GM) and brain connectivity information, in the form of a brain connectome network from functional magnetic resonance imaging (fMRI), are available for a number of subjects with different degrees of primary progressive aphasia (PPA), a neurodegenerative disorder (ND), as measured through a speech rate measure on motor speech loss. The clinical/scientific goal of this study is to identify brain regions of interest significantly related to the speech rate measure, in order to gain insight into ND patterns. Viewing the brain connectome network and GM images as objects, we develop an integrated object response regression framework of network and GM images on the speech rate measure. A novel integrated prior formulation is proposed on network and structural image coefficients in order to exploit network information of the brain connectome while leveraging the interconnections between the two objects. The principled Bayesian framework allows the characterization of uncertainty in ascertaining whether a region is actively related to the speech rate measure. Our framework yields new insights into the relationship of brain regions associated with PPA, offering a deeper understanding of neurodegenerative patterns of PPA. The supplementary file adds details about posterior computation and additional empirical results.

Annals of Applied Statistics, vol. 19, no. 4, pp. 3282-3303. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12707422/pdf/
Citations: 0
SUPERVISED LEARNING OF OUTCOME-RELEVANT ITEMS FROM A QUESTIONNAIRE VIA MIXED INTEGER OPTIMIZATION.
IF 1.4 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2025-12-01 | Epub Date: 2025-12-05 | DOI: 10.1214/25-AOAS2093
Leyao Zhang, Wen Wang, Mengtong Hu, Alan P Baptist, Peng Wang, Peter X K Song

Questionnaires are among the oldest and most widely used instruments in practice to measure variables relevant to traits of interest that cannot be easily measured by physical devices, for example, depression. In many clinical settings, the scope of an existing questionnaire is often unfit to apply to a new study population whose underlying characteristics are different from those of the original population used for the questionnaire's development and/or validation. Motivated by a cohort study of elderly asthma patients, we aim to examine associations between clinical outcomes and quality of life (QoL) measured by a QoL questionnaire. To increase comparability, we consider a supervised learning method to identify a subset of questions whose summary score is strongly associated with a specific clinical outcome under investigation. The resultant set of selected items gives an optimal summary metric of the questionnaire, which improves both statistical power and clinical interpretation. Our item extraction procedure is built upon the best subset algorithm implemented via mixed integer programming, which enjoys both a theoretical guarantee of selection consistency and flexibility in handling nonresponse missing data. Moreover, estimation uncertainty is analyzed by means of noise perturbation. Our methodology is first evaluated by extensive simulation studies with comparisons to existing methods and then applied to derive tailored QoL scores adaptive to two clinical outcomes, a lung function measure (FEV1) and the asthma control test (ACT), respectively, among elderly people with persistent asthma.
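
The idea of selecting a subset of items whose summary score tracks an outcome can be shown on a toy scale. The sketch below is only an illustration under stated assumptions: it replaces the paper's mixed integer programming with exhaustive enumeration (feasible only for a handful of items), it assumes the summary score is the plain sum of the selected items, and it uses absolute Pearson correlation as the association criterion. The function names and the data are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def best_item_subset(
    items: Sequence[Sequence[float]], outcome: Sequence[float], k: int
) -> Tuple[Tuple[int, ...], float]:
    """Enumerate all size-k item subsets and keep the one whose sum score
    has the largest absolute correlation with the outcome."""
    n_items = len(items[0])
    best_subset: Tuple[int, ...] = ()
    best_r = -1.0
    for subset in combinations(range(n_items), k):
        score = [sum(row[j] for j in subset) for row in items]
        r = abs(pearson(score, outcome))
        if r > best_r:
            best_subset, best_r = subset, r
    return best_subset, best_r


# Hypothetical responses: 6 respondents x 4 binary items.
items: List[List[int]] = [
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
]
# Toy outcome constructed as item 0 + item 2, so those items should be selected.
outcome = [row[0] + row[2] for row in items]
subset, r = best_item_subset(items, outcome, k=2)
```

Enumeration visits every subset of the requested size, so its cost grows combinatorially with the number of items; that growth is the usual motivation for an optimization-based formulation such as the one the abstract describes.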

Annals of Applied Statistics, vol. 19, no. 4, pp. 3157-3178. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12869357/pdf/
Citations: 0
TREE-REGULARIZED BAYESIAN LATENT CLASS ANALYSIS FOR IMPROVING WEAKLY SEPARATED DIETARY PATTERN SUBTYPING IN SMALL-SIZED SUBPOPULATIONS.
IF 1.4 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2025-12-01 | Epub Date: 2025-12-05 | DOI: 10.1214/25-aoas2067
Mengbing Li, Briana Stephenson, Zhenke Wu

Dietary patterns synthesize multiple related diet components, which can be used by nutrition researchers to examine diet-disease relationships. Latent class models (LCMs) have been used to derive dietary patterns from dietary intake assessment, where each class profile represents the probabilities of exposure to a set of diet components. However, LCM-derived dietary patterns can exhibit strong similarities, or weak separation, resulting in numerical and inferential instabilities that challenge scientific interpretation. This issue is exacerbated in small-sized subpopulations. To address these issues, we provide a simple solution that empowers LCMs to improve dietary pattern estimation. We develop a tree-regularized Bayesian LCM that shares statistical strength between dietary patterns to make better estimates using limited data. This is achieved via a Dirichlet diffusion tree process that specifies a prior distribution for the unknown tree over classes. Dietary patterns that share proximity to one another in the tree are shrunk toward ancestral dietary patterns a priori, with the degree of shrinkage varying across prespecified food groups. Using dietary intake data from the Hispanic Community Health Study/Study of Latinos, we apply the proposed approach to a sample of 496 U.S. adults of South American ethnic background to identify and compare dietary patterns.
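
To make "class profile" and "weak separation" concrete, the sketch below computes posterior class membership in a plain latent class model with binary exposures. It is a simplified illustration, not the tree-regularized model of the paper: exposures are assumed binary, the class weights and profiles are taken as known, and all numbers and names (`class_posterior`, `separated`, `weak`) are hypothetical.

```python
from typing import List, Sequence


def class_posterior(
    x: Sequence[int],
    weights: Sequence[float],
    profiles: Sequence[Sequence[float]],
) -> List[float]:
    """Posterior P(class k | x) for a latent class model with binary items.

    weights[k]     : prior probability of class k
    profiles[k][j] : probability that a member of class k is exposed to item j
    x[j]           : observed exposure (1 or 0) to item j
    """
    joint = []
    for pi_k, theta_k in zip(weights, profiles):
        lik = pi_k
        for x_j, th in zip(x, theta_k):
            lik *= th if x_j == 1 else (1.0 - th)
        joint.append(lik)
    total = sum(joint)
    return [v / total for v in joint]


x = [1, 1, 0]  # one person's exposure to three hypothetical diet components
weights = [0.5, 0.5]

# Well-separated class profiles: the posterior is decisive.
separated = class_posterior(x, weights, [[0.9, 0.8, 0.1], [0.2, 0.3, 0.7]])

# Weakly separated profiles: the posterior stays close to the prior.
weak = class_posterior(x, weights, [[0.6, 0.5, 0.4], [0.5, 0.5, 0.5]])
```

When profiles are nearly identical, the data barely move the posterior away from the prior weights, which is one way to see why weakly separated patterns are hard to estimate from small samples.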

Annals of Applied Statistics, vol. 19, no. 4, pp. 3003-3022. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12867110/pdf/
Citations: 0
FAST VARIABLE SELECTION FOR DISTRIBUTIONAL REGRESSION WITH APPLICATION TO CONTINUOUS GLUCOSE MONITORING DATA.
IF 1.4 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2025-09-01 | Epub Date: 2025-08-28 | DOI: 10.1214/25-aoas2038
Alexander Coulter, Rashmi N Aurora, Naresh M Punjabi, Irina Gaynanova

With the growing prevalence of diabetes and the associated public health burden, it is crucial to identify modifiable factors that could improve patients' glycemic control. In this work, we seek to examine associations between medication usage, concurrent comorbidities, and glycemic control, utilizing data from continuous glucose monitors (CGMs). CGMs provide high-frequency interstitial glucose measurements, but reducing data to simple statistical summaries is common in clinical studies, resulting in substantial information loss. Recent advancements in the Fréchet regression framework make it possible to use more information by treating the full distributional representation of CGM data as the response, while sparsity regularization enables variable selection. However, the methodology does not scale to large datasets. Crucially, rigorous inference is not possible because the asymptotic behavior of the underlying estimates is unknown, while the application of resampling-based inference methods is computationally infeasible. We develop a new algorithm for sparse distributional regression by deriving a new explicit characterization of the gradient and Hessian of the underlying objective function, while also utilizing rotations on the sphere to perform feasible updates. The updated method achieves speedups of up to 10,000-fold or more over the original approach, opening the door for applying sparse distributional regression to large-scale datasets and enabling previously unattainable resampling-based inference. We combine our algorithm with stability selection to perform variable selection inference on CGM data from patients with type 2 diabetes and obstructive sleep apnea. We find a significant association between sulfonylurea medication and glucose variability without evidence of association with glucose mean. We also find that overnight oxygen desaturation variability has a stronger association with glucose regulation than overall oxygen desaturation levels.
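
One common way to treat a set of CGM readings as a distribution rather than a summary statistic is to represent it by its quantile function and compare distributions with the 2-Wasserstein distance. Whether the paper uses exactly this representation is an assumption here; the sketch below only illustrates the idea, with hypothetical glucose values and hypothetical function names.

```python
import math
from typing import List, Sequence


def quantile_function(values: Sequence[float], probs: Sequence[float]) -> List[float]:
    """Empirical quantiles with linear interpolation between order statistics."""
    xs = sorted(values)
    n = len(xs)
    out = []
    for p in probs:
        pos = p * (n - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(xs[lo] * (1.0 - frac) + xs[hi] * frac)
    return out


def wasserstein2(a: Sequence[float], b: Sequence[float], n_grid: int = 101) -> float:
    """Approximate 2-Wasserstein distance between two one-dimensional samples,
    computed as the root mean squared difference of their quantile functions
    on an evenly spaced probability grid."""
    probs = [i / (n_grid - 1) for i in range(n_grid)]
    qa = quantile_function(a, probs)
    qb = quantile_function(b, probs)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(qa, qb)) / n_grid)


# Hypothetical glucose readings (mg/dL) from two monitoring periods.
period_a = [95.0, 110.0, 140.0, 180.0, 120.0, 100.0]
period_b = [v + 10.0 for v in period_a]  # same shape, shifted upward by 10

d_self = wasserstein2(period_a, period_a)
d_shift = wasserstein2(period_a, period_b)
```

Unlike a comparison of means alone, this distance also responds to changes in spread or tail behavior, which is the kind of information a single summary statistic discards.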

Annals of Applied Statistics, vol. 19, no. 3, pp. 2105-2128. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12700301/pdf/
Citations: 0
MIXED MODELING APPROACH FOR CHARACTERIZING THE GENETIC EFFECTS IN A LONGITUDINAL PHENOTYPE.
IF 1.4 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2025-09-01 | Epub Date: 2025-08-28 | DOI: 10.1214/25-aoas2033
Pei Zhang, Paul S Albert, Hyokyoung G Hong

Approaches for estimating genetic effects at the individual level often focus on analyzing phenotypes at a single time point, with less attention given to longitudinal phenotypes. This paper introduces a mixed modeling approach that includes both genetic and individual-specific random effects, and is designed to estimate genetic effects on both the baseline and slope for a longitudinal trajectory. The inclusion of genetic effects on both baseline and slope, combined with the crossed structure of genetic and individual-specific random effects, creates complex dependencies across repeated measurements for all subjects. These complexities necessitate the development of novel estimation procedures for parameter estimation and individual-specific predictions of genetic effects on both baseline and slope. We employ an Average Information Restricted Maximum Likelihood (AI-ReML) algorithm to estimate the variance components corresponding to genetic and individual-specific effects for the baseline levels and rates of change for a longitudinal phenotype. The algorithm is used to characterize the prostate-specific antigen (PSA) trajectories for participants who remained prostate cancer-free in the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial. Understanding genetic and individual-specific variation in this population will provide insights for determining the role of genetics in cancer screening. Our results reveal significant genetic contributions to both the initial PSA levels and their progression over time, highlighting the role of these genetic factors in the variability of PSA across unaffected individuals. We show how genetic factors can be used to identify individuals prone to high baseline PSA values and increasing PSA trajectories among individuals who are prostate cancer-free. In turn, we can identify groups of individuals who have a high probability of falsely screening positive for prostate cancer using well-established cutoffs for early detection based on the level and rate of change in this biomarker. The results demonstrate the importance of incorporating genetic factors for monitoring PSA for more accurate prostate cancer detection.
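
A model of the kind described, with genetic and individual-specific random effects on both baseline and slope, might be written as follows. This is a sketch under assumptions, not the paper's exact specification: the symbols, the use of a genetic relationship matrix $K$, and the omission of intercept-slope covariances are all choices made here for illustration.

```latex
% Illustrative notation (assumed, not taken from the paper):
%   y_{ij}           phenotype (e.g. PSA) of individual i at visit j, observed at time t_{ij}
%   g_{0i}, g_{1i}   genetic random effects on baseline and slope
%   b_{0i}, b_{1i}   individual-specific random effects on baseline and slope
%   K                n x n genetic relationship matrix (assumed known)
\begin{align*}
y_{ij} &= (\beta_0 + g_{0i} + b_{0i}) + (\beta_1 + g_{1i} + b_{1i})\, t_{ij} + \varepsilon_{ij},
  \qquad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2), \\
(g_{01}, \dots, g_{0n})^\top &\sim N(0, \sigma_{g0}^2 K), \qquad
(g_{11}, \dots, g_{1n})^\top \sim N(0, \sigma_{g1}^2 K), \\
b_{0i} &\sim N(0, \sigma_{b0}^2), \qquad
b_{1i} \sim N(0, \sigma_{b1}^2) \quad \text{independently across } i.
\end{align*}
```

In this notation the variance components $\sigma_{g0}^2$, $\sigma_{g1}^2$, $\sigma_{b0}^2$, and $\sigma_{b1}^2$ are the quantities a restricted maximum likelihood procedure would estimate; because $K$ links different individuals, the repeated measurements of all subjects become mutually dependent.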

Annals of Applied Statistics, vol. 19, no. 3, pp. 2070-2087. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12395449/pdf/
Citations: 0
BAYESIAN LEARNING OF CLINICALLY MEANINGFUL SEPSIS PHENOTYPES IN NORTHERN TANZANIA.
IF 1.4 | Zone 4 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2025-09-01 | Epub Date: 2025-08-28 | DOI: 10.1214/25-aoas2045
Alexander Dombowsky, David B Dunson, Deng B Madut, Matthew P Rubach, Amy H Herring

Sepsis is a life-threatening condition caused by a dysregulated host response to infection. Recently, researchers have hypothesized that sepsis consists of a heterogeneous spectrum of distinct subtypes, motivating several studies to identify clusters of sepsis patients that correspond to subtypes, with the long-term goal of using these clusters to design subtype-specific treatments. Therefore, clinicians rely on clusters having a concrete medical interpretation, usually corresponding to clinically meaningful regions of the sample space that have a concrete implication to practitioners. In this article, we propose Clustering Around Meaningful Regions (CLAMR), a Bayesian clustering approach that explicitly models the medical interpretation of each cluster center. CLAMR favors clusterings that can be summarized via meaningful feature values, leading to medically significant sepsis patient clusters. We also provide details on measuring the effect of each feature on the clustering using Bayesian hypothesis tests, so one can assess what features are relevant for cluster interpretation. Our focus is on clustering sepsis patients from Moshi, Tanzania, where patients are younger and the prevalence of HIV infection is higher than in previous sepsis subtyping cohorts.

Annals of Applied Statistics 19(3): 2193-2217. Published 2025-09-01. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12422288/pdf/
BAYESIAN DIFFERENTIAL CAUSAL DIRECTED ACYCLIC GRAPHS FOR OBSERVATIONAL ZERO-INFLATED COUNTS WITH AN APPLICATION TO TWO-SAMPLE SINGLE-CELL DATA.
IF 1.4 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2025-09-01 Epub Date: 2025-08-28 DOI: 10.1214/25-aoas2042
Junsouk Choi, Robert S Chapkin, Yang Ni

Observational zero-inflated count data arise in a wide range of areas such as genomics. One of the common research questions is to identify causal relationships by learning the structure of a sparse directed acyclic graph (DAG). While structure learning of DAGs has been an active research area, existing methods do not adequately account for excessive zeros and therefore are not suitable for modeling zero-inflated count data. Moreover, it is often interesting to study differences in the causal networks for data collected from two experimental groups (control vs treatment). To explicitly account for zero-inflation and identify differential causal networks, we propose a novel Bayesian differential zero-inflated negative binomial DAG (DAG0) model. We prove that the causal relationships under the proposed DAG0 are fully identifiable from purely observational, cross-sectional data, using a general proof technique that is applicable beyond the proposed model. Bayesian inference based on parallel-tempered Markov chain Monte Carlo is developed to efficiently explore the multi-modal posterior landscape. We demonstrate the utility of the proposed DAG0 by comparing it with state-of-the-art alternative methods through extensive simulations. An application in a single-cell RNA-sequencing dataset generated under two experimental groups finds some interesting results that appear to be consistent with existing knowledge. A user-friendly R package that implements DAG0 is available at https://github.com/junsoukchoi/BayesDAG0.git.
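
To make the phrase "zero-inflated negative binomial" in the abstract above concrete, here is a minimal sketch of the textbook ZINB probability mass function. It is not taken from the paper or from the BayesDAG0 package: the abstract does not give the authors' parameterization, so the function names `nb_pmf` and `zinb_pmf` and the parameters `pi` (extra-zero probability), `r` (size), and `p` (success probability) are assumptions chosen for illustration.

```python
import math


def nb_pmf(y, r, p):
    """Negative binomial pmf with size r > 0 and success probability 0 < p < 1.

    P(Y = y) = Gamma(y + r) / (y! * Gamma(r)) * p**r * (1 - p)**y
    Computed on the log scale for numerical stability.
    """
    log_pmf = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(p) + y * math.log1p(-p)
    )
    return math.exp(log_pmf)


def zinb_pmf(y, pi, r, p):
    """Zero-inflated NB: with probability pi the count is a structural zero,
    otherwise it is drawn from the negative binomial."""
    base = (1.0 - pi) * nb_pmf(y, r, p)
    return pi + base if y == 0 else base


# The zero-inflated model puts more mass at zero than the plain NB does.
print(nb_pmf(0, 2.0, 0.5), zinb_pmf(0, 0.3, 2.0, 0.5))
```

A plain negative binomial fitted to data with many structural zeros has to stretch its dispersion to absorb them, which is one way to read the abstract's remark that existing methods do not adequately account for excessive zeros.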

Annals of Applied Statistics 19(3): 1908-1930. Published 2025-09-01. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12395422/pdf/
AVERAGED PREDICTION MODELS (APM): IDENTIFYING CAUSAL EFFECTS IN CONTROLLED PRE-POST SETTINGS WITH APPLICATION TO GUN POLICY.
IF 1.4 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2025-09-01 Epub Date: 2025-08-28 DOI: 10.1214/25-aoas2011
Thomas Leavitt, Laura A Hatfield

To investigate causal impacts, many researchers use controlled pre-post designs that compare over-time differences between a population exposed to a policy change and an unexposed comparison group. However, researchers using these designs often disagree about the "correct" specification of the causal model, perhaps most notably in analyses to identify the effects of gun policies on crime. To help settle these model specification debates, we propose a general identification framework that unifies a variety of models researchers use in practice. In this framework, which nests "brand name" designs like Difference-in-Differences as special cases, we use models to predict untreated outcomes and then correct the treated group's predictions using the comparison group's observed prediction errors. Our point identifying assumption is that treated and comparison groups would have equal prediction errors (in expectation) under no treatment. To choose among candidate models, we propose a data-driven procedure based on models' robustness to violations of this point identifying assumption. Our selection procedure averages over candidate models, weighting by each model's posterior probability of being the most robust given its differential average prediction errors in the pre-period. This approach offers a way out of debates over the "correct" model by choosing on robustness instead and has the desirable property of being feasible in the "locked box" of pre-intervention data only. We apply our methodology to the gun policy debate, focusing specifically on Missouri's 2007 repeal of its permit-to-purchase law, and provide an R package (apm) for implementation.
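
The correction step described in the abstract above (predict untreated outcomes, then subtract the comparison group's prediction error from the treated group's) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' `apm` R package: the function names, the two toy prediction models, and the numbers are all hypothetical, and the model-averaging step is left out because the abstract does not specify it in enough detail.

```python
from statistics import mean


def predict_last(pre_outcomes):
    """Hypothetical model 1: carry the last pre-period outcome forward."""
    return pre_outcomes[-1]


def predict_mean(pre_outcomes):
    """Hypothetical model 2: forecast with the pre-period mean."""
    return mean(pre_outcomes)


def corrected_effect(treated_pre, treated_post, comparison_pre, comparison_post, predict):
    """Treated group's prediction error minus the comparison group's.

    If both groups would have had equal expected prediction errors without
    treatment, this difference estimates the effect on the treated group.
    """
    treated_error = treated_post - predict(treated_pre)
    comparison_error = comparison_post - predict(comparison_pre)
    return treated_error - comparison_error


# Toy group-level average outcomes (made up for illustration).
treated_pre, treated_post = [10.0, 12.0], 15.0
comparison_pre, comparison_post = [8.0, 9.0], 10.0

did_like = corrected_effect(treated_pre, treated_post, comparison_pre, comparison_post, predict_last)
mean_based = corrected_effect(treated_pre, treated_post, comparison_pre, comparison_post, predict_mean)
print(did_like, mean_based)
```

With `predict_last`, the estimate reduces to the two-period Difference-in-Differences contrast, (15 - 12) - (10 - 9) = 2, which illustrates how such a design can be nested as a special case. Swapping in `predict_mean` changes the estimate to 2.5, which is the kind of model-dependence the abstract's selection procedure is meant to address.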

Annals of Applied Statistics 19(3): 1826-1846. Published 2025-09-01. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12633725/pdf/
SURROGATE SELECTION OVERSAMPLES EXPANDED T CELL CLONOTYPES.
IF 1.4 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2025-09-01 Epub Date: 2025-08-28 DOI: 10.1214/25-aoas2032
Peng Yu, Yumin Lian, Elliot Xie, Cindy L Zuleger, Richard J Albertini, Mark R Albertini, Michael A Newton

Surrogate selection is an experimental design that, without sequencing any DNA, can restrict a sample of cells to those carrying certain genomic mutations. In immunological disease studies, this design may provide a relatively easy approach to enrich a lymphocyte sample with cells relevant to the disease response because the emergence of neutral mutations is associated with the proliferation history of clonal subpopulations. A statistical analysis of clonotype sizes provides a structured, quantitative perspective on this useful property of surrogate selection. Our model specification couples within-clonotype birth-death processes with an exchangeable model across clonotypes. Beyond enrichment questions about the surrogate selection design, our framework enables a study of sampling properties of elementary sample diversity statistics; it also points to new statistics that may usefully measure the burden of somatic genomic alterations associated with clonal expansion. We examine statistical properties of immunological samples governed by the coupled model specification, and we illustrate calculations in surrogate selection studies of melanoma and in single-cell genomic studies of T cell repertoires.
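
As a rough illustration of what a within-clonotype birth-death process looks like, the sketch below simulates the size of a single clone with a Gillespie-style loop. It assumes a linear birth-death process with constant per-cell rates, which the abstract does not confirm; the function name `simulate_clone_size` and all parameter values are hypothetical, and the coupling across clonotypes described in the abstract is not modeled here.

```python
import random


def simulate_clone_size(birth_rate, death_rate, t_end, rng, n0=1):
    """Simulate a linear birth-death process and return the clone size at t_end.

    Each of the n cells divides at rate birth_rate and dies at rate death_rate,
    so the waiting time to the next event is exponential with rate
    n * (birth_rate + death_rate).
    """
    per_cell_rate = birth_rate + death_rate
    n, t = n0, 0.0
    while n > 0 and per_cell_rate > 0:
        t += rng.expovariate(n * per_cell_rate)
        if t > t_end:
            break  # the next event would happen after the observation time
        if rng.random() < birth_rate / per_cell_rate:
            n += 1  # one cell divides
        else:
            n -= 1  # one cell dies
    return n


rng = random.Random(0)
sizes = [simulate_clone_size(1.0, 0.5, 2.0, rng) for _ in range(200)]
extinct = sum(1 for s in sizes if s == 0)
print(f"{extinct} of {len(sizes)} simulated clones went extinct; largest has {max(sizes)} cells")
```

Even with identical rates, simulated clones end up with very different sizes, and many go extinct, which gives some intuition for why the distribution of clonotype sizes is a natural object of study.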

Annals of Applied Statistics 19(3): 1884-1907. Published 2025-09-01. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12481847/pdf/