Biostatistics: Latest Articles

Unveiling Schizophrenia: a study with generalized functional linear mixed model via the investigation of functional random effects.

IF 1.8 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxae049

Rongxiang Rui, Wei Xiong, Jianxin Pan, Maozai Tian

Previous studies have identified attenuated pre-speech activity and speech sound suppression in individuals with Schizophrenia, with similar patterns observed in basic tasks entailing button-pressing to perceive a tone. However, it remains unclear whether these patterns are uniform across individuals or vary from person to person. Motivated by electroencephalographic (EEG) data from a Schizophrenia study, we develop a generalized functional linear mixed model (GFLMM) for repeated measurements by incorporating subject-specific functional random effects associated with multiple functional predictors. To assess the significance of these functional effects, we employ two different multivariate functional principal component analysis methods, which transform the GFLMM into a conventional generalized linear mixed model, thereby facilitating its implementation with standard software. Furthermore, we introduce a cutting-edge testing approach utilizing working responses to detect both subject-specific and predictor-specific functional random effects. Monte Carlo simulation studies demonstrate the effectiveness of our proposed testing method. Application of the proposed methods to the Schizophrenia data reveals significant subject-specific effects of human brain activity in the frontal zone (Fz) and the central zone (Cz), providing valuable insights into the potential variations among individuals, from healthy controls to those diagnosed with Schizophrenia.

Citations: 0

Incorporating prior information in gene expression network-based cancer heterogeneity analysis.

Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxae028

Rong Li, Shaodong Xu, Yang Li, Zuojian Tang, Di Feng, James Cai, Shuangge Ma

Cancer is molecularly heterogeneous, with seemingly similar patients having different molecular landscapes and accordingly different clinical behaviors. In recent studies, gene expression networks have been shown as more effective/informative for cancer heterogeneity analysis than some simpler measures. Gene interconnections can be classified as "direct" and "indirect," where the latter can be caused by shared genomic regulators (such as transcription factors, microRNAs, and other regulatory molecules) and other mechanisms. It has been suggested that incorporating the regulators of gene expressions in network analysis and focusing on the direct interconnections can lead to a deeper understanding of the more essential gene interconnections. Such analysis can be seriously challenged by the large number of parameters (jointly caused by network analysis, incorporation of regulators, and heterogeneity) and often weak signals. To effectively tackle this problem, we propose incorporating prior information contained in the published literature. A key challenge is that such prior information can be partial or even wrong. We develop a two-step procedure that can flexibly accommodate different levels of prior information quality. Simulation demonstrates the effectiveness of the proposed approach and its superiority over relevant competitors. In the analysis of a breast cancer dataset, findings different from the alternatives are made, and the identified sample subgroups have important clinical differences.

Citations: 0

A marginal structural model for normal tissue complication probability.

Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxae019

Thai-Son Tang, Zhihui Liu, Ali Hosni, John Kim, Olli Saarela

The goal of radiation therapy for cancer is to deliver prescribed radiation dose to the tumor while minimizing dose to the surrounding healthy tissues. To evaluate treatment plans, the dose distribution to healthy organs is commonly summarized as dose-volume histograms (DVHs). Normal tissue complication probability (NTCP) modeling has centered around making patient-level risk predictions with features extracted from the DVHs, but few have considered adapting a causal framework to evaluate the safety of alternative treatment plans. We propose causal estimands for NTCP based on deterministic and stochastic interventions, as well as propose estimators based on marginal structural models that impose bivariable monotonicity between dose, volume, and toxicity risk. The properties of these estimators are studied through simulations, and their use is illustrated in the context of radiotherapy treatment of anal canal cancer patients.
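
As a rough illustration of the dose-volume histogram summary mentioned in the abstract, the sketch below computes a cumulative DVH from per-voxel doses. It assumes equal-volume voxels, and the function name `cumulative_dvh`, the dose values, and the thresholds are all made up for illustration; it does not reproduce the authors' estimators or their monotonicity constraints.

```python
def cumulative_dvh(voxel_doses, thresholds):
    """Fraction of organ volume receiving at least each threshold dose.

    Assumes equal-volume voxels, so volume fractions equal voxel-count fractions.
    """
    n = len(voxel_doses)
    if n == 0:
        raise ValueError("need at least one voxel")
    return [sum(1 for d in voxel_doses if d >= t) / n for t in thresholds]


# Hypothetical doses (Gy) to eight equal-volume voxels of one healthy organ.
doses = [2.0, 5.0, 12.0, 18.0, 25.0, 31.0, 40.0, 44.0]
dvh = cumulative_dvh(doses, thresholds=[0, 10, 20, 30, 40])
print(dvh)  # volume fraction at or above each dose threshold
```

By construction the curve is non-increasing in dose, which is the kind of shape information an NTCP model built on DVH features has to respect.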

Citations: 0

Recoverability of causal effects under presence of missing data: a longitudinal case study.

Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxae044

Anastasiia Holovchak, Helen McIlleron, Paolo Denti, Michael Schomaker

Missing data in multiple variables is a common issue. We investigate the applicability of the framework of graphical models for handling missing data to a complex longitudinal pharmacological study of children with HIV treated with an efavirenz-based regimen as part of the CHAPAS-3 trial. Specifically, we examine whether the causal effects of interest, defined through static interventions on multiple continuous variables, can be recovered (estimated consistently) from the available data only. So far, no general algorithms are available to decide on recoverability, and decisions have to be made on a case-by-case basis. We emphasize the sensitivity of recoverability to even the smallest changes in the graph structure, and present recoverability results for three plausible missingness-directed acyclic graphs (m-DAGs) in the CHAPAS-3 study, informed by clinical knowledge. Furthermore, we propose the concept of a "closed missingness mechanism": if missing data are generated based on this mechanism, an available case analysis is admissible for consistent estimation of any statistical or causal estimand, even if data are missing not at random. Both simulations and theoretical considerations demonstrate how, in the assumed MNAR setting of our study, a complete or available case analysis can be superior to multiple imputation, and estimation results vary depending on the assumed missingness DAG. Our analyses demonstrate an innovative application of missingness DAGs to complex longitudinal real-world data, while highlighting the sensitivity of the results with respect to the assumed causal model.

Citations: 0

Bipartite interference and air pollution transport: estimating health effects of power plant interventions.

Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxae051

Corwin Zigler, Vera Liu, Fabrizia Mealli, Laura Forastiere

Evaluating air quality interventions is confronted with the challenge of interference since interventions at a particular pollution source likely impact air quality and health at distant locations, and air quality and health at any given location are likely impacted by interventions at many sources. The structure of interference in this context is dictated by complex atmospheric processes governing how pollution emitted from a particular source is transformed and transported across space and can be cast with a bipartite structure reflecting the two distinct types of units: (i) interventional units on which treatments are applied or withheld to change pollution emissions; and (ii) outcome units on which outcomes of primary interest are measured. We propose new estimands for bipartite causal inference with interference that construe two components of treatment: a "key-associated" (or "individual") treatment and an "upwind" (or "neighborhood") treatment. Estimation is carried out using a covariate adjustment approach based on a joint propensity score. A reduced-complexity atmospheric model characterizes the structure of the interference network by modeling the movement of air parcels through time and space. The new methods are deployed to evaluate the effectiveness of installing flue-gas desulfurization scrubbers on 472 coal-burning power plants (the interventional units) in reducing Medicare hospitalizations among 21,577,552 Medicare beneficiaries residing across 25,553 ZIP codes in the United States (the outcome units).
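
To make the two treatment components more concrete, here is a minimal sketch of how a "key-associated" and an "upwind" treatment could be derived from a bipartite influence matrix. The abstract does not give the exact definitions, so everything here is an assumption made for illustration: the key source is taken to be the most influential source for each outcome unit, the upwind treatment is an influence-weighted share of treated sources among the rest, and the names `bipartite_treatments`, `weights`, and `treated` are hypothetical.

```python
def bipartite_treatments(weights, treated):
    """Return (key_treatment, upwind_treatment) for each outcome unit.

    weights[i][j]: hypothetical influence of source j on outcome unit i.
    treated[j]:    1 if source j received the intervention, else 0.
    Assumption: the key source is the most influential one; the upwind
    treatment is the influence-weighted treated share of all other sources.
    """
    results = []
    for row in weights:
        key = max(range(len(row)), key=lambda j: row[j])
        others = [j for j in range(len(row)) if j != key]
        total = sum(row[j] for j in others)
        upwind = sum(row[j] * treated[j] for j in others) / total if total > 0 else 0.0
        results.append((treated[key], upwind))
    return results


# Two outcome units (e.g. ZIP codes) and three sources (e.g. power plants).
weights = [[0.5, 0.375, 0.125],
           [0.125, 0.125, 0.75]]
treated = [1, 0, 1]
exposures = bipartite_treatments(weights, treated)
print(exposures)
```

In the application described above, the influence matrix would come from the reduced-complexity atmospheric model rather than from hand-picked numbers.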

Citations: 0

Random forest for dynamic risk prediction of recurrent events: a pseudo-observation approach.

Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxaf007

Abigail Loe, Susan Murray, Zhenke Wu

Recurrent events are common in clinical, healthcare, social, and behavioral studies, yet methods for dynamic risk prediction of these events are limited. To overcome some long-standing challenges in analyzing censored recurrent event data, a recent regression analysis framework constructs a censored longitudinal dataset consisting of times to the first recurrent event in multiple pre-specified follow-up windows of length $\tau$ (XMT models). Traditional regression models struggle with nonlinear and multiway interactions, with success depending on the skill of the statistical programmer. With a staggering number of potential predictors being generated from genetic, -omic, and electronic health records sources, machine learning approaches such as the random forest regression are growing in popularity, as they can nonparametrically incorporate information from many predictors with nonlinear and multiway interactions involved in prediction. In this article, we (i) develop a random forest approach for dynamically predicting probabilities of remaining event-free during a subsequent $\tau$-duration follow-up period from a reconstructed censored longitudinal data set, (ii) modify the XMT regression approach to predict these same probabilities, subject to the limitations that traditional regression models typically have, and (iii) demonstrate how to incorporate patient-specific history of recurrent events for prediction in settings where this information may be partially missing. We show the increased ability of our random forest algorithm for predicting the probability of remaining event-free over a $\tau$-duration follow-up window when compared to our modified XMT method for prediction in settings where association between predictors and recurrent event outcomes is complex in nature. We also show the importance of incorporating past recurrent event history in prediction algorithms when event times are correlated within a subject. The proposed random forest algorithm is demonstrated using recurrent exacerbation data from the trial of Azithromycin for the Prevention of Exacerbations of Chronic Obstructive Pulmonary Disease.
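
The censored longitudinal layout described in the abstract (time to the first recurrent event within each pre-specified follow-up window of length tau) can be sketched roughly as follows. This is only an illustration of the data reshaping step under simple assumptions; the function name `window_records`, the window start times, and the example subject are hypothetical, and the pseudo-observation and random forest steps of the paper are not shown.

```python
def window_records(event_times, censor_time, starts, tau):
    """Build one (start, time, observed) record per follow-up window.

    time     = min(gap to first event after start, gap to censoring, tau)
    observed = True when an event is seen inside the window.
    Windows that start at or after the censoring time are dropped.
    """
    records = []
    for s in starts:
        if s >= censor_time:
            continue
        gaps = [e - s for e in event_times if s < e <= censor_time]
        to_event = min(gaps) if gaps else float("inf")
        to_censor = censor_time - s
        time = min(to_event, to_censor, tau)
        observed = to_event <= min(to_censor, tau)
        records.append((s, time, observed))
    return records


# Hypothetical subject: events at months 4 and 15, censored at month 20.
recs = window_records(event_times=[4.0, 15.0], censor_time=20.0,
                      starts=[0, 6, 12, 18], tau=6.0)
for rec in recs:
    print(rec)
```

A record with `observed` False and `time` equal to tau means the subject is known to have stayed event-free for the whole window, whereas a shorter time with `observed` False means the window was cut off by censoring.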

Citations: 0

Shared parameter modeling of longitudinal data allowing for possibly informative visiting process and terminal event.

Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxae041

Christos Thomadakis, Loukia Meligkotsidou, Nikos Pantazis, Giota Touloumi

Joint modeling of longitudinal and time-to-event data, particularly through shared parameter models (SPMs), is a common approach for handling longitudinal marker data with an informative terminal event. A critical but often neglected assumption in this context is that the visiting/observation process is noninformative, depending solely on past marker values and visit times. When this assumption fails, the visiting process becomes informative, potentially resulting in biased SPM estimates. Existing methods generally rely on a conditional independence assumption, positing that the marker model, visiting process, and time-to-event model are independent given shared or correlated random effects. Moreover, they are typically built on an intensity-based visiting process using calendar time. This study introduces a unified approach for jointly modeling a normally distributed marker, the visiting process, and time-to-event data in the form of competing risks. Our model conditions on the history of observed marker values, prior visit times, the marker's random effects, and possibly a frailty term independent of the random effects. While our approach aligns with the shared-parameter framework, it does not presume conditional independence between the processes. Additionally, the visiting process can be defined on either a gap time scale, via proportional hazard models, or a calendar time scale, via proportional intensity models. Through extensive simulation studies, we assess the performance of our proposed methodology. We demonstrate that disregarding an informative visiting process can yield significantly biased marker estimates. However, misspecification of the visiting process can also lead to biased estimates. The gap time formulation exhibits greater robustness compared to the intensity-based model when the visiting process is misspecified. In general, enriching the visiting process with prior visit history enhances performance. We further apply our methodology to real longitudinal data from HIV, where visit frequency varies substantially among individuals.

Citations: 0

Estimation and inference for causal spillover effects in egocentric-network randomized trials in the presence of network membership misclassification.

Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxaf009

Ariel Chao, Donna Spiegelman, Ashley Buchanan, Laura Forastiere

To leverage peer influence and increase population behavioral changes, behavioral interventions often rely on peer-based strategies. A common study design that assesses such strategies is the egocentric-network randomized trial (ENRT), where index participants receive a behavioral training and are encouraged to disseminate information to their peers. Under this design, a crucial estimand of interest is the Average Spillover Effect (ASpE), which measures the impact of the intervention on participants who do not receive it, but whose outcomes may be affected by others who do. The assessment of the ASpE relies on assumptions about, and correct measurement of, interference sets within which individuals may influence one another's outcomes. It can be challenging to properly specify interference sets, such as networks in ENRTs, and when mismeasured, intervention effects estimated by existing methods will be biased. In studies where social networks play an important role in disease transmission or behavior change, correcting ASpE estimates for bias due to network misclassification is critical for accurately evaluating the full impact of interventions. We combined measurement error and causal inference methods to bias-correct the ASpE estimate for network misclassification in ENRTs, when surrogate networks are recorded in place of true ones, and validation data that relate the misclassified to the true networks are available. We investigated finite sample properties of our methods in an extensive simulation study and illustrated our methods in the HIV Prevention Trials Network (HPTN) 037 study.
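
For orientation, the uncorrected version of the spillover contrast can be sketched as a difference in mean outcomes among non-index network members between intervention and control networks. This naive estimator assumes network membership is recorded correctly, which is exactly the assumption the paper relaxes; the bias correction using validation data is not shown, and the function name `naive_aspe` and the records are hypothetical.

```python
from statistics import mean


def naive_aspe(records):
    """Naive spillover contrast among non-index network members.

    records: (network_treated, is_index, outcome) tuples, where
    network_treated is 1 if the member's index participant was randomized
    to the intervention. Assumes memberships are measured without error.
    """
    treated = [y for z, is_index, y in records if z == 1 and not is_index]
    control = [y for z, is_index, y in records if z == 0 and not is_index]
    return mean(treated) - mean(control)


# Hypothetical binary outcomes; index participants are excluded from the contrast.
records = [
    (1, True, 1), (1, False, 1), (1, False, 1), (1, False, 0), (1, False, 1),
    (0, True, 0), (0, False, 0), (0, False, 1), (0, False, 0), (0, False, 0),
]
estimate = naive_aspe(records)
print(estimate)
```

If some recorded members do not truly belong to the network (or true members are missed), this contrast mixes exposed and unexposed individuals, which is the source of the bias the abstract describes.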

Citations: 0

Functional quantile principal component analysis.

IF 1.8 Zone 3 Mathematics Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxae040

Álvaro Méndez-Civieta, Ying Wei, Keith M Diaz, Jeff Goldsmith

This paper introduces functional quantile principal component analysis (FQPCA), a dimensionality reduction technique that extends the concept of functional principal component analysis (FPCA) to the examination of participant-specific quantile curves. Our approach borrows strength across participants to estimate patterns in quantiles, and uses participant-level data to estimate loadings on those patterns. As a result, FQPCA is able to capture shifts in the scale and distribution of data that affect participant-level quantile curves, and is also a robust methodology suitable for dealing with outliers, heteroscedastic data, or skewed data. The need for such methodology is exemplified by physical activity data collected using wearable devices. Participants often differ in the timing and intensity of physical activity behaviors, and capturing information beyond the participant-level expected value curves produced by FPCA is necessary for a robust quantification of diurnal patterns of activity. We illustrate our methods using accelerometer data from the National Health and Nutrition Examination Survey, and produce participant-level 10%, 50%, and 90% quantile curves over 24 h of activity. The proposed methodology is supported by simulation results, and is available as an R package.
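
To make the idea of participant-level quantile curves concrete, here is a minimal sketch that computes pointwise empirical 10%, 50%, and 90% quantile curves for a single participant from several days of activity on a common 24-hour grid. This is only a naive per-participant baseline, not FQPCA: it does not borrow strength across participants or estimate shared components. The function names and the simulated data are hypothetical.

```python
import math
import random


def empirical_quantile(values, p):
    """Empirical p-quantile, linearly interpolating between order statistics."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def quantile_curves(days, probs=(0.1, 0.5, 0.9)):
    """Pointwise quantile curves for one participant.

    days: list of equally long lists, one per observed day, holding activity
          values on a common time grid.
    Returns {p: curve}, where curve[t] is the p-quantile across days at time t.
    """
    n_grid = len(days[0])
    return {
        p: [empirical_quantile([day[t] for day in days], p) for t in range(n_grid)]
        for p in probs
    }


# Hypothetical toy data: 7 days x 24 hourly values with right-skewed noise.
random.seed(0)
days = [
    [max(0.0, 10.0 * math.sin(math.pi * t / 24) + random.expovariate(0.5))
     for t in range(24)]
    for _ in range(7)
]

curves = quantile_curves(days)
for p in (0.1, 0.5, 0.9):
    print(p, [round(v, 1) for v in curves[p][:4]])
```

With only a handful of days per participant, pointwise quantiles like these are noisy, especially in the tails. This is one motivation for pooling information across participants, as the abstract describes.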

Citations: 0

Understanding the opioid syndemic in North Carolina: A novel approach to modeling and identifying factors.

IF 1.8 Zone 3 Mathematics Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxae052

Eva Murphy, David Kline, Kathleen L Egan, Kathryn E Lancaster, William C Miller, Lance A Waller, Staci A Hepler

The opioid epidemic is a significant public health challenge in North Carolina, but limited data restrict our understanding of its complexity. Examining trends and relationships among different outcomes believed to reflect opioid misuse provides an alternative perspective for understanding the opioid epidemic. We use a Bayesian dynamic spatial factor model to capture the interrelated dynamics within six different county-level outcomes, such as illicit opioid overdose deaths, emergency department visits related to drug overdose, treatment counts for opioid use disorder, patients receiving prescriptions for buprenorphine, and newly diagnosed cases of acute and chronic hepatitis C virus and human immunodeficiency virus. We design the factor model to yield meaningful interactions among predefined subsets of these outcomes, causing a departure from the conventional lower triangular structure in the loadings matrix and leading to familiar identifiability issues. To address this challenge, we propose a novel approach that involves decomposing the loadings matrix within a Markov chain Monte Carlo algorithm, allowing us to estimate the loadings and factors uniquely. As a result, we gain a better understanding of the spatio-temporal dynamics of the opioid epidemic in North Carolina.

Citations: 0