Annals of Applied Statistics: Latest Articles

BAYESIAN INFERENCE AND DYNAMIC PREDICTION FOR MULTIVARIATE LONGITUDINAL AND SURVIVAL DATA.
IF 1.3 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2023-09-01 Epub Date: 2023-09-07 DOI: 10.1214/23-aoas1733
Haotian Zou, Donglin Zeng, Luo Xiao, Sheng Luo

Alzheimer's disease (AD) is a complex neurological disorder impairing multiple domains such as cognition and daily functions. To better understand the disease and its progression, many AD research studies collect multiple longitudinal outcomes that are strongly predictive of the onset of AD dementia. We propose a joint model based on a multivariate functional mixed model framework (referred to as MFMM-JM) that simultaneously models the multiple longitudinal outcomes and the time to dementia onset. We develop six functional forms to fully investigate the complex association between longitudinal outcomes and dementia onset. Moreover, we use Bayesian methods for statistical inference and develop a dynamic prediction framework that provides accurate personalized predictions of disease progression based on new subject-specific data. We apply the proposed MFMM-JM to two large ongoing AD studies, the Alzheimer's Disease Neuroimaging Initiative (ADNI) and the National Alzheimer's Coordinating Center (NACC), and identify the functional forms with the best predictive performance. Our method is also validated by extensive simulation studies with five settings.

Annals of Applied Statistics, vol. 17, no. 3, pp. 2574-2595. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10500582/pdf/
Citations: 0
PROBABILISTIC LEARNING OF TREATMENT TREES IN CANCER.
IF 1.8 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2023-09-01 Epub Date: 2023-09-07 DOI: 10.1214/22-aoas1696
Tsung-Hung Yao, Zhenke Wu, Karthik Bharath, Jinju Li, Veerabhadran Baladandayuthapani

Accurate identification of synergistic treatment combinations and their underlying biological mechanisms is critical across many disease domains, especially cancer. In translational oncology research, preclinical systems such as patient-derived xenografts (PDX) have emerged as a unique study design evaluating multiple treatments administered to samples from the same human tumor implanted into genetically identical mice. In this paper, we propose a novel Bayesian probabilistic tree-based framework for PDX data to investigate the hierarchical relationships between treatments by inferring treatment cluster trees, referred to as treatment trees (Rx-tree). The framework motivates a new metric of mechanistic similarity between two or more treatments accounting for inherent uncertainty in tree estimation; treatments with a high estimated similarity have potentially high mechanistic synergy. Building upon Dirichlet Diffusion Trees, we derive a closed-form marginal likelihood encoding the tree structure, which facilitates computationally efficient posterior inference via a new two-stage algorithm. Simulation studies demonstrate superior performance of the proposed method in recovering the tree structure and treatment similarities. Our analyses of a recently collated PDX dataset produce treatment similarity estimates that show a high degree of concordance with known biological mechanisms across treatments in five different cancers. More importantly, we uncover new and potentially effective combination therapies that confer synergistic regulation of specific downstream biological pathways for future clinical investigations. Our accompanying code, data, and shiny application for visualization of results are available at: https://github.com/bayesrx/RxTree.

Annals of Applied Statistics, vol. 17, no. 3, pp. 1884-1908. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10501503/pdf/nihms-1857187.pdf
Citations: 0
IDENTIFICATION OF IMMUNE RESPONSE COMBINATIONS ASSOCIATED WITH HETEROGENEOUS INFECTION RISK IN THE IMMUNE CORRELATES ANALYSIS OF HIV VACCINE STUDIES.
IF 1.8 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2023-06-01 Epub Date: 2023-05-01 DOI: 10.1214/22-aoas1665
Chaeryon Kang, Ying Huang

In HIV vaccine/prevention research, probing into the vaccine-induced immune responses that can help predict the risk of HIV infection provides valuable information for the development of vaccine regimens. Previous correlate analysis of the Thai vaccine trial aided the discovery of interesting immune correlates related to the risk of developing an HIV infection. The present study aimed to identify the combinations of immune responses associated with the heterogeneous infection risk. We explored a "change-plane" via combination of a subset of immune responses that could help separate vaccine recipients into two heterogeneous subgroups in terms of the association between immune responses and the risk of developing infection. Additionally, we developed a new variable selection algorithm through a penalized likelihood approach to investigate a parsimonious marker combination for the change-plane. The resulting marker combinations can serve as candidate correlates of protection and can be used for predicting the protective effect of the vaccine against HIV infection. The application of the proposed statistical approach to the Thai trial has been presented, wherein the marker combinations were explored among several immune responses and antigens.

Annals of Applied Statistics, vol. 17, no. 2, pp. 1199-1219. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10312353/pdf/
Citations: 0
CO-CLUSTERING OF SPATIALLY RESOLVED TRANSCRIPTOMIC DATA.
IF 1.8 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2023-06-01 Epub Date: 2023-05-01 DOI: 10.1214/22-aoas1677
Andrea Sottosanti, Davide Risso

Spatial transcriptomics is a groundbreaking technology that allows the measurement of the activity of thousands of genes in a tissue sample and maps where the activity occurs. This technology has enabled the study of the spatial variation of the genes across the tissue. Comprehending gene functions and interactions in different areas of the tissue is of great scientific interest, as it might lead to a deeper understanding of several key biological mechanisms, such as cell-cell communication or tumor-microenvironment interaction. To do so, one can group cells of the same type and genes that exhibit similar expression patterns. However, adequate statistical tools that exploit the previously unavailable spatial information to more coherently group cells and genes are still lacking. In this work, we introduce SpaRTaCo, a new statistical model that clusters the spatial expression profiles of the genes according to a partition of the tissue. This is accomplished by performing a co-clustering, i.e., inferring the latent block structure of the data and inducing two types of clustering: of the genes, using their expression across the tissue, and of the image areas, using the gene expression in the spots where the RNA is collected. Our proposed methodology is validated with a series of simulation experiments and its usefulness in responding to specific biological questions is illustrated with an application to a human brain tissue sample processed with the 10X-Visium protocol.

Annals of Applied Statistics, vol. 17, no. 2, pp. 1444-1468. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10552783/pdf/
Citations: 0
BAYESIAN ANALYSIS FOR IMBALANCED POSITIVE-UNLABELLED DIAGNOSIS CODES IN ELECTRONIC HEALTH RECORDS.
IF 1.8 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2023-06-01 DOI: 10.1214/22-AOAS1666
Ru Wang, Ye Liang, Zhuqi Miao, Tieming Liu

With the increasing availability of electronic health records (EHR), health data analysts and researchers have made significant progress in developing predictive inference and algorithms. However, EHR data are notoriously noisy because of missing and inaccurate inputs, even though the information is abundant. One serious problem is that only a small portion of patients in the database have confirmatory diagnoses, while many other patients remain undiagnosed because they did not comply with the recommended examinations. This phenomenon leads to a so-called positive-unlabelled situation in which the labels are extremely imbalanced. In this paper, we propose a model-based approach to classify the unlabelled patients by using a Bayesian finite mixture model. We also discuss the label switching issue for the imbalanced data and propose a consensus Monte Carlo approach to address the imbalance issue and improve computational efficiency simultaneously. Simulation studies show that our proposed model-based approach outperforms existing positive-unlabelled learning algorithms. The proposed method is applied to the Cerner EHR for detecting diabetic retinopathy (DR) patients using laboratory measurements. With only 3% confirmatory diagnoses in the EHR database, we estimate the actual DR prevalence to be 25%, which coincides with reported findings in the medical literature.
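
The gap between the 3% of patients with confirmatory diagnoses and the estimated 25% prevalence can be unpacked with simple arithmetic. The sketch below is not from the paper: it assumes, as in a standard positive-unlabelled setup, that every labelled patient is a true positive, and the function and key names are illustrative.

```python
def pu_summary(labelled_frac: float, prevalence: float) -> dict:
    """Back-of-envelope quantities for a positive-unlabelled dataset.

    Assumption (illustrative): every labelled record is a true positive,
    so the labelled fraction can never exceed the true prevalence.
    """
    if not (0 < labelled_frac <= prevalence <= 1) or labelled_frac >= 1:
        raise ValueError("need 0 < labelled_frac <= prevalence <= 1 and labelled_frac < 1")
    # Share of true positives that actually carry a confirmatory label.
    label_rate_among_positives = labelled_frac / prevalence
    # Share of the unlabelled pool that is positive but undiagnosed.
    hidden_positive_rate = (prevalence - labelled_frac) / (1 - labelled_frac)
    return {
        "label_rate_among_positives": label_rate_among_positives,
        "hidden_positive_rate": hidden_positive_rate,
    }


# Figures quoted in the abstract: 3% labelled, 25% estimated prevalence.
summary = pu_summary(0.03, 0.25)
print(summary)
```

Under these assumptions, roughly one in eight true positives carries a label, and a little under a quarter of the unlabelled patients would be undiagnosed positives, which is why the classification problem is so imbalanced.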

Annals of Applied Statistics, vol. 17, no. 2, pp. 1220-1238. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10156089/pdf/nihms-1852796.pdf
Citations: 0
Robust joint modelling of left-censored longitudinal data and survival data with application to HIV vaccine studies.
IF 1.8 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2023-06-01 Epub Date: 2023-05-01 DOI: 10.1214/22-aoas1656
Tingting Yu, Lang Wu, Jin Qiu, Peter B Gilbert

In jointly modelling longitudinal and survival data, the longitudinal data may be complex in the sense that they may contain outliers and may be left censored. Motivated by an HIV vaccine study, we propose a robust method for joint models of longitudinal and survival data, where the outliers in longitudinal data are addressed using a multivariate t-distribution for b-outliers and an M-estimator for e-outliers. We also propose a computationally efficient method for approximate likelihood inference. The proposed method is evaluated by simulation studies. Based on the proposed models and method, we analyze the HIV vaccine data and find a strong association between longitudinal biomarkers and the risk of HIV infection.
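
To make "left censored" concrete: a measurement below a detection limit contributes the probability of falling below that limit to the likelihood, rather than a density value. The sketch below illustrates this with a plain normal model, which is a simplifying assumption and not the robust t-distribution or M-estimator approach described in the abstract; all names are illustrative.

```python
import math


def normal_logpdf(y: float, mu: float, sigma: float) -> float:
    """Log density of N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def normal_logcdf(y: float, mu: float, sigma: float) -> float:
    """Log of P(Y <= y) for Y ~ N(mu, sigma^2), via the complementary error function."""
    z = (y - mu) / sigma
    return math.log(0.5 * math.erfc(-z / math.sqrt(2)))


def left_censored_loglik(values, censored, mu: float, sigma: float) -> float:
    """Log-likelihood for independent normal data with left censoring.

    values[i] is the observed measurement, or the detection limit when
    censored[i] is True (the true value is only known to lie below it).
    """
    total = 0.0
    for y, is_censored in zip(values, censored):
        if is_censored:
            total += normal_logcdf(y, mu, sigma)  # P(Y <= limit)
        else:
            total += normal_logpdf(y, mu, sigma)  # density at the observed value
    return total


# Two observed values and one reading below a detection limit of 0.0.
print(left_censored_loglik([1.2, 0.4, 0.0], [False, False, True], mu=0.5, sigma=1.0))
```

Note that `normal_logcdf` can fail for detection limits far below the mean, where the tail probability underflows to zero; a production implementation would use a numerically stable log-CDF.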

Annals of Applied Statistics, vol. 17, no. 2, pp. 1017-1037. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10312337/pdf/
Citations: 0
DYNAMIC RISK PREDICTION TRIGGERED BY INTERMEDIATE EVENTS USING SURVIVAL TREE ENSEMBLES.
IF 1.8 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2023-06-01 Epub Date: 2023-05-01 DOI: 10.1214/22-aoas1674
Yifei Sun, Sy Han Chiou, Colin O Wu, Meghan McGarry, Chiung-Yu Huang

With the availability of massive amounts of data from electronic health records and registry databases, incorporating time-varying patient information to improve risk prediction has attracted great attention. To exploit the growing amount of predictor information over time, we develop a unified framework for landmark prediction using survival tree ensembles, where an updated prediction can be performed when new information becomes available. Compared to conventional landmark prediction with fixed landmark times, our methods allow the landmark times to be subject-specific and triggered by an intermediate clinical event. Moreover, the nonparametric approach circumvents the thorny issue of model incompatibility at different landmark times. In our framework, both the longitudinal predictors and the event time outcome are subject to right censoring, and thus existing tree-based approaches cannot be directly applied. To tackle the analytical challenges, we propose a risk-set-based ensemble procedure by averaging martingale estimating equations from individual trees. Extensive simulation studies are conducted to evaluate the performance of our methods. The methods are applied to the Cystic Fibrosis Foundation Patient Registry (CFFPR) data to perform dynamic prediction of lung disease in cystic fibrosis patients and to identify important prognosis factors.

Annals of Applied Statistics, vol. 17, no. 2, pp. 1375-1397. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10241448/pdf/nihms-1846847.pdf
Citations: 0
INDIVIDUALIZED RISK ASSESSMENT OF PREOPERATIVE OPIOID USE BY INTERPRETABLE NEURAL NETWORK REGRESSION.
IF 1.8 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2023-03-01 DOI: 10.1214/22-aoas1634
Yuming Sun, Jian Kang, Chad Brummett, Yi Li

Preoperative opioid use has been reported to be associated with higher preoperative opioid demand, worse postoperative outcomes, and increased postoperative healthcare utilization and expenditures. Understanding the risk of preoperative opioid use helps establish patient-centered pain management. In the field of machine learning, the deep neural network (DNN) has emerged as a powerful means for risk assessment because of its superb prediction power; however, black-box algorithms may make the results less interpretable than statistical models. Bridging the gap between the statistical and machine learning fields, we propose a novel Interpretable Neural Network Regression (INNER), which combines the strengths of statistical and DNN models. We use the proposed INNER to conduct individualized risk assessment of preoperative opioid use. Intensive simulations and an analysis of 34,186 patients expecting surgery in the Analgesic Outcomes Study (AOS) show that the proposed INNER can not only accurately predict preoperative opioid use from preoperative characteristics, as a DNN does, but can also estimate the patient-specific odds of opioid use without pain and the odds ratio of opioid use for a unit increase in the reported overall body pain, leading to more straightforward interpretations of the tendency to use opioids than a DNN. Our results identify the patient characteristics that are strongly associated with opioid use and are largely consistent with previous findings, providing evidence that INNER is a useful tool for individualized risk assessment of preoperative opioid use.
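
The two interpretable quantities named in the abstract, the odds of opioid use without pain and the odds ratio per unit increase in pain, fit naturally into a logistic form with patient-specific coefficients. The sketch below is one reading of that description and not the authors' implementation: `alpha` and `beta` stand in for values that a network would output for a given patient, and the numbers used are made up.

```python
import math


def baseline_odds(alpha: float) -> float:
    """Odds of opioid use at pain score 0, given a patient-specific intercept."""
    return math.exp(alpha)


def odds_ratio_per_unit_pain(beta: float) -> float:
    """Multiplicative change in odds for a one-unit increase in pain."""
    return math.exp(beta)


def opioid_use_probability(alpha: float, beta: float, pain: float) -> float:
    """P(opioid use) under logit(p) = alpha + beta * pain (assumed model form)."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * pain)))


# Hypothetical patient: even odds without pain, odds doubling per pain unit.
alpha, beta = 0.0, math.log(2.0)
print(baseline_odds(alpha), odds_ratio_per_unit_pain(beta))
print(opioid_use_probability(alpha, beta, pain=1.0))
```

The appeal of this form is that each patient gets a two-number summary that reads like a conventional logistic regression, even if the mapping from characteristics to `alpha` and `beta` is learned by a flexible model.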

Annals of Applied Statistics, vol. 17, no. 1, pp. 434-453. Published 2023-03-01. DOI: https://doi.org/10.1214/22-aoas1634. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10065608/pdf/nihms-1836641.pdf
Citations: 0
TOPOLOGICAL LEARNING FOR BRAIN NETWORKS.
IF 1.3 Zone 4 Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2023-03-01 Epub Date: 2023-01-24 DOI: 10.1214/22-aoas1633
Tananun Songdechakraiwut, Moo K Chung

This paper proposes a novel topological learning framework that integrates networks of different sizes and topologies through persistent homology. This challenging task is made possible by the introduction of a computationally efficient topological loss. The proposed loss bypasses the intrinsic computational bottleneck associated with matching networks. We validate the method in extensive statistical simulations to assess its effectiveness in discriminating networks with different topologies. The method is further demonstrated in a twin brain imaging study, where we determine whether brain networks are genetically heritable. The challenge here lies in overlaying the topologically different functional brain networks obtained from resting-state functional MRI onto the template structural brain network obtained through diffusion MRI.
Annals of Applied Statistics, vol. 17, no. 1, pp. 403-433. Published 2023-03-01. DOI: 10.1214/22-aoas1633. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9997114/pdf/nihms-1868875.pdf
Citations: 0
MODELING CELL POPULATIONS MEASURED BY FLOW CYTOMETRY WITH COVARIATES USING SPARSE MIXTURE OF REGRESSIONS.
IF 1.8 Zone 4 Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2023-03-01 DOI: 10.1214/22-aoas1631
Sangwon Hyun, Mattias Rolf Cape, Francois Ribalet, Jacob Bien

The ocean is filled with microscopic algae, called phytoplankton, which together are responsible for as much photosynthesis as all plants on land combined. Our ability to predict their response to the warming ocean relies on understanding how the dynamics of phytoplankton populations are influenced by changes in environmental conditions. One powerful technique for studying these dynamics is flow cytometry, which measures the optical properties of thousands of individual cells per second. Today, oceanographers can collect flow cytometry data in real time onboard a moving ship, giving them fine-scale resolution of the distribution of phytoplankton across thousands of kilometers. One of the current challenges is to understand how these small- and large-scale variations relate to environmental conditions such as nutrient availability, temperature, light, and ocean currents. In this paper we propose a novel sparse mixture of multivariate regressions model to estimate the time-varying phytoplankton subpopulations while simultaneously identifying the specific environmental covariates that are predictive of the observed changes to these subpopulations. We demonstrate the usefulness and interpretability of the approach using both synthetic data and real observations collected on an oceanographic cruise conducted in the northeast Pacific in the spring of 2017.
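To make the phrase "sparse mixture of regressions" concrete, the sketch below is a simplified, one-dimensional illustration in Python. Each cluster's mean and mixing probability depend linearly on covariates, and a soft-thresholding operator shows how a lasso-type penalty can set coefficients to exactly zero. The Gaussian components, the softmax link, the lasso penalty, and all function and parameter names are assumptions for illustration; the abstract specifies none of them, and the authors' multivariate model and estimation procedure may differ.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def responsibilities(y, x, alpha, beta, sigma):
    """Posterior cluster probabilities for one cell measurement y.

    x:     covariate vector at the time the cell was measured.
    alpha: per-cluster coefficients for the mixing probabilities (assumed softmax link).
    beta:  per-cluster coefficients for the cluster means (assumed linear).
    sigma: per-cluster standard deviations (assumed Gaussian components).
    """
    k_clusters = len(sigma)
    probs = softmax([dot(x, alpha[k]) for k in range(k_clusters)])
    weighted = []
    for k in range(k_clusters):
        z = (y - dot(x, beta[k])) / sigma[k]
        density = math.exp(-0.5 * z * z) / (sigma[k] * math.sqrt(2.0 * math.pi))
        weighted.append(probs[k] * density)
    total = sum(weighted)  # assumed positive; extreme outliers could underflow
    return [w / total for w in weighted]


def soft_threshold(value, lam):
    """Lasso proximal step: shrink toward zero, zeroing small coefficients."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


if __name__ == "__main__":
    x = [1.0]                       # intercept-only covariate vector
    alpha = [[0.0], [0.0]]          # equal mixing probabilities
    beta = [[0.0], [10.0]]          # cluster means 0 and 10
    print(responsibilities(0.1, x, alpha, beta, sigma=[1.0, 1.0]))
    print([soft_threshold(b, 1.0) for b in (3.0, -0.5, 1.0)])
```

In a sketch like this, a covariate whose coefficients are all thresholded to zero has no influence on the cluster means or mixing probabilities, which is the sense in which sparsity can point to the covariates that matter.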
Annals of Applied Statistics, vol. 17, no. 1, pp. 357-377. Published 2023-03-01. DOI: https://doi.org/10.1214/22-aoas1631. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10360992/pdf/nihms-1917146.pdf
Citations: 4