Pub Date: 2026-01-20; DOI: 10.1093/biostatistics/kxaf051
Bonnie B Smith, Abhirup Datta, Brian Caffo
A number of domains in biomedical research use data with a large number of predictors all representing the same type of measurement. Often, an important summary is the within-person distribution of these predictors. Here we focus on settings where the mean relationship between outcome and predictors is fully captured by this distribution and, more generally, on problems where the goal is to learn a mapping that is invariant under permutations of the input vector. We compare unstructured neural networks, which do not explicitly incorporate the permutation invariance property, versus networks that we call ordered predictors neural networks. We show in simulations that the unstructured deep learning approach can yield higher prediction errors, compared to the approach that explicitly leverages the invariance to simplify the learning task. Additionally, in the context of neural Bayes estimation, in which neural networks are used to construct point estimators, we show that ordered predictors neural networks can yield substantially more precise estimators. We therefore recommend that, when permutation invariance is known or suspected to hold, investigators use a learning or statistical modeling approach that can leverage the invariance, rather than an unstructured deep learning approach.
Title: Shortcomings of deep learning for distributional predictors: a note. Journal: Biostatistics, vol. 27, no. 1. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12815889/pdf/
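The invariance idea in the abstract above can be illustrated with a minimal sketch. It assumes that "ordered predictors" means sorting each subject's predictor vector before it enters the network; the abstract does not give the architecture, so the function names, layer sizes, and random weights below are hypothetical and only demonstrate that sorting makes the output invariant to permutations of the input.

```python
import numpy as np

def order_predictors(x):
    """Sort each row so every permutation of a subject's predictors maps to the same vector."""
    return np.sort(np.asarray(x, dtype=float), axis=-1)

def tiny_network(x, w1, b1, w2, b2):
    """One hidden layer with ReLU activation; weights are illustrative, not trained."""
    hidden = np.maximum(0.0, x @ w1 + b1)
    return hidden @ w2 + b2

rng = np.random.default_rng(0)
p, h = 6, 4  # number of predictors and hidden units (arbitrary choices)
w1, b1 = rng.normal(size=(p, h)), rng.normal(size=h)
w2, b2 = rng.normal(size=(h, 1)), rng.normal(size=1)

x = rng.normal(size=(3, p))   # three subjects, p predictors each
x_perm = x[:, ::-1]           # same values per subject, columns reversed

# Sorting first gives identical inputs, hence identical outputs.
out_ordered = tiny_network(order_predictors(x), w1, b1, w2, b2)
out_ordered_perm = tiny_network(order_predictors(x_perm), w1, b1, w2, b2)
print(np.allclose(out_ordered, out_ordered_perm))
```

An unstructured network fed `x` and `x_perm` directly has no such guarantee and would have to learn the invariance from data.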
Pub Date: 2026-01-20; DOI: 10.1093/biostatistics/kxaf049
Alvin Sheng, Brian J Reich, Ana-Maria Staicu, Santhoshi N Krishnan, Arvind Rao, Timothy L Frankel
Recent advances in multiplex imaging have enabled researchers to locate different types of cells within a tissue sample. This is especially relevant for tumor immunology, as clinical regimes corresponding to different stages of disease or responses to treatment may manifest as different spatial arrangements of tumor and immune cells. Spatial point pattern modeling can be used to partition multiplex tissue images according to these regimes. To this end, we propose a two-stage approach: first, local intensities and pair correlation functions are estimated from the spatial point pattern of cells within each image, and the pair correlation functions are reduced in dimension via spectral decomposition of the covariance function. Second, the estimates are clustered in a Bayesian hierarchical model with spatially-dependent cluster labels. The clusters correspond to regimes of interest that are present across subjects; the cluster labels segment the spatial point patterns according to those regimes. Through Markov Chain Monte Carlo sampling, we jointly estimate and quantify uncertainty in the cluster assignment and spatial characteristics of each cluster. Simulations demonstrate the performance of the method, and it is applied to a set of multiplex immunofluorescence images of diseased pancreatic tissue.
Title: A two-stage approach for segmenting spatial point patterns applied to multiplex imaging. Journal: Biostatistics, vol. 27, no. 1. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12815891/pdf/
Pub Date: 2026-01-20; DOI: 10.1093/biostatistics/kxaf052
Jessie K Edwards, Stephen R Cole, Paul N Zivich, Benjamin Ackerman, Sonia Napravnik, Heather Henderson, Timothy Lash, Bonnie E Shook-Sa
Mortality risk estimated from studies that ascertain date of death through linkage to vital statistics registries may be subject to outcome measurement error. As a result, some deaths among study participants may not be captured, some study participants who are alive may be falsely categorized as deceased, and some deaths may be recorded at incorrect times, leading to bias in estimates of mortality risk and survival. Here, we illustrate an extension of the Rogan-Gladen estimator to account for outcome measurement error in risk and survival functions in settings with right censoring. As a motivating application, we consider and account for outcome measurement error that could be induced by incomplete and/or incorrect linkage to death registries when estimating mortality risk among people entering care for HIV in the University of North Carolina Center for AIDS Research HIV Clinical Cohort between 2001 and 2022. A series of simulation studies demonstrates that the approach performed well even when participants selected into the validation study were at higher mortality risk than the main study. The proposed approach may be parameterized using internal or external validation data or used as a form of quantitative bias analysis.
Title: Risk functions with outcome measurement error. Journal: Biostatistics, vol. 27, no. 1. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12815892/pdf/
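For orientation, the classic Rogan-Gladen correction that the abstract above extends can be sketched as follows. This is only the standard single-proportion form, corrected risk = (observed risk + specificity - 1) / (sensitivity + specificity - 1); the authors' extension to risk and survival functions under right censoring is not reproduced here, and the function name and example values are hypothetical.

```python
def rogan_gladen(observed_risk, sensitivity, specificity):
    """Correct an observed proportion for outcome misclassification (classic form)."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    corrected = (observed_risk + specificity - 1.0) / denom
    # Truncate to [0, 1], since the raw correction can fall outside that range.
    return min(1.0, max(0.0, corrected))

# Hypothetical values: 90% of deaths are captured by linkage, no false deaths.
corrected = rogan_gladen(observed_risk=0.18, sensitivity=0.90, specificity=1.00)
print(round(corrected, 3))
```

With these inputs a true risk of 0.20 would produce an observed risk of 0.20 × 0.90 = 0.18, and the correction recovers 0.20.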
Pub Date: 2025-12-23; DOI: 10.1093/biostatistics/kxaf050
Huaqing Jin, Fei Jiang
Alzheimer's disease (AD) is a progressive, chronic neurodegenerative disorder affecting millions worldwide. A new clinical magnetoencephalography (MEG) study was conducted to identify neural activity biomarkers and key brain regions in AD. Traditional methods for analyzing MEG data, which typically extract features from power spectral density, suffer from information loss. Furthermore, functional regression with variable selection tends to produce non-robust results, making it less ideal for drawing reliable scientific conclusions. To address these challenges, we propose a high-dimensional hypothesis testing (HDHT) framework for functional covariates and introduce a rigorous inference process to support scientific conclusions. We establish the theoretical properties of the HDHT framework and validate its performance through simulation studies. Applying the HDHT framework to the AD MEG data, we identify 19 important regions associated with cognitive functions that align with established AD pathophysiology. These findings suggest that the non-invasive MEG can be a potential low-risk and low-toxicity modality for monitoring neurodegenerative progression.
Title: High-dimensional inference for functional regression with an application to the Alzheimer's disease magnetoencephalography study. Journal: Biostatistics, vol. 26, no. 1. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12728160/pdf/
Pub Date: 2024-12-31; DOI: 10.1093/biostatistics/kxaf007
Abigail Loe, Susan Murray, Zhenke Wu
Recurrent events are common in clinical, healthcare, social, and behavioral studies, yet methods for dynamic risk prediction of these events are limited. To overcome some long-standing challenges in analyzing censored recurrent event data, a recent regression analysis framework constructs a censored longitudinal dataset consisting of times to the first recurrent event in multiple pre-specified follow-up windows of length $\tau$ (XMT models). Traditional regression models struggle with nonlinear and multiway interactions, with success depending on the skill of the statistical programmer. With a staggering number of potential predictors being generated from genetic, -omic, and electronic health records sources, machine learning approaches such as the random forest regression are growing in popularity, as they can nonparametrically incorporate information from many predictors with nonlinear and multiway interactions involved in prediction. In this article, we (i) develop a random forest approach for dynamically predicting probabilities of remaining event-free during a subsequent $\tau$-duration follow-up period from a reconstructed censored longitudinal data set, (ii) modify the XMT regression approach to predict these same probabilities, subject to the limitations that traditional regression models typically have, and (iii) demonstrate how to incorporate patient-specific history of recurrent events for prediction in settings where this information may be partially missing. We show the increased ability of our random forest algorithm for predicting the probability of remaining event-free over a $\tau$-duration follow-up window when compared to our modified XMT method for prediction in settings where association between predictors and recurrent event outcomes is complex in nature. We also show the importance of incorporating past recurrent event history in prediction algorithms when event times are correlated within a subject. The proposed random forest algorithm is demonstrated using recurrent exacerbation data from the trial of Azithromycin for the Prevention of Exacerbations of Chronic Obstructive Pulmonary Disease.
Title: Random forest for dynamic risk prediction of recurrent events: a pseudo-observation approach. Journal: Biostatistics, vol. 26, no. 1.
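The windowed data construction mentioned in the abstract above (times to the first recurrent event in pre-specified follow-up windows of length tau) might look roughly like the sketch below. The exact XMT construction is not given in the abstract, so the censoring rule, the handling of events at a window's start, and all names here are assumptions made for illustration; the random forest and pseudo-observation steps are not shown.

```python
def first_event_in_windows(event_times, censor_time, window_starts, tau):
    """Return (window_start, observed_time, event_indicator) for each usable window.

    observed_time is measured from the window start and capped at tau or at the
    subject's remaining follow-up, whichever is shorter (an assumed convention).
    """
    events = sorted(event_times)
    rows = []
    for start in window_starts:
        if start >= censor_time:
            break  # subject is no longer under observation
        later = [t for t in events if start < t <= censor_time]
        horizon = min(tau, censor_time - start)
        if later and later[0] - start <= horizon:
            rows.append((start, later[0] - start, 1))  # first event seen in window
        else:
            rows.append((start, horizon, 0))           # event-free up to the horizon
    return rows

# Hypothetical subject: events at times 2.0 and 7.5, follow-up ends at 10.0.
rows = first_event_in_windows([2.0, 7.5], 10.0, [0.0, 3.0, 6.0, 9.0], tau=4.0)
for row in rows:
    print(row)
```

Each subject contributes one row per window, which is what makes the reconstructed data set longitudinal.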
Pub Date: 2024-12-31; DOI: 10.1093/biostatistics/kxae019
Thai-Son Tang, Zhihui Liu, Ali Hosni, John Kim, Olli Saarela
The goal of radiation therapy for cancer is to deliver the prescribed radiation dose to the tumor while minimizing dose to the surrounding healthy tissues. To evaluate treatment plans, the dose distribution to healthy organs is commonly summarized as dose-volume histograms (DVHs). Normal tissue complication probability (NTCP) modeling has centered around making patient-level risk predictions with features extracted from the DVHs, but few have considered adapting a causal framework to evaluate the safety of alternative treatment plans. We propose causal estimands for NTCP based on deterministic and stochastic interventions, together with estimators based on marginal structural models that impose bivariable monotonicity between dose, volume, and toxicity risk. The properties of these estimators are studied through simulations, and their use is illustrated in the context of radiotherapy treatment of anal canal cancer patients.
Title: A marginal structural model for normal tissue complication probability. Journal: Biostatistics. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823140/pdf/
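As a small illustration of the dose-volume histogram summary described in the abstract above, the sketch below computes a cumulative DVH: for each dose level, the fraction of an organ's volume receiving at least that dose. It assumes equal-volume voxels and uses made-up dose values; the causal estimands and marginal structural models of the paper are not implemented.

```python
import numpy as np

def cumulative_dvh(voxel_doses, dose_grid):
    """Fraction of (equal-volume) voxels receiving at least each dose in dose_grid."""
    doses = np.asarray(voxel_doses, dtype=float)
    grid = np.asarray(dose_grid, dtype=float)
    # Compare every grid dose with every voxel dose, then average over voxels.
    return (doses[None, :] >= grid[:, None]).mean(axis=1)

# Hypothetical organ with five voxels (doses in Gy).
dvh = cumulative_dvh([0.0, 10.0, 20.0, 30.0, 40.0], [0.0, 15.0, 30.0, 45.0])
print(dvh)
```

The resulting curve is non-increasing in dose, which is the kind of structure a monotonicity constraint between dose, volume, and risk can exploit.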
Previous studies have identified attenuated pre-speech activity and speech sound suppression in individuals with Schizophrenia, with similar patterns observed in basic tasks entailing button-pressing to perceive a tone. However, it remains unclear whether these patterns are uniform across individuals or vary from person to person. Motivated by electroencephalographic (EEG) data from a Schizophrenia study, we develop a generalized functional linear mixed model (GFLMM) for repeated measurements by incorporating subject-specific functional random effects associated with multiple functional predictors. To assess the significance of these functional effects, we employ two different multivariate functional principal component analysis methods, which transform the GFLMM into a conventional generalized linear mixed model, thereby facilitating its implementation with standard software. Furthermore, we introduce a cutting-edge testing approach utilizing working responses to detect both subject-specific and predictor-specific functional random effects. Monte Carlo simulation studies demonstrate the effectiveness of our proposed testing method. Application of the proposed methods to the Schizophrenia data reveals significant subject-specific effects of human brain activity in the frontal zone (Fz) and the central zone (Cz), providing valuable insights into the potential variations among individuals, from healthy controls to those diagnosed with Schizophrenia.
Title: Unveiling Schizophrenia: a study with generalized functional linear mixed model via the investigation of functional random effects. Authors: Rongxiang Rui, Wei Xiong, Jianxin Pan, Maozai Tian. Pub Date: 2024-12-31; DOI: 10.1093/biostatistics/kxae049. Journal: Biostatistics, vol. 26, no. 1.
Pub Date: 2024-12-31; DOI: 10.1093/biostatistics/kxaf011
Iuliana Ciocănea-Teodorescu, Erin E Gabriel, Arvid Sjölander
For a comprehensive understanding of the effect of a given treatment on an outcome of interest, quantification of individual treatment heterogeneity is essential, alongside estimation of the average causal effect. However, even in randomized controlled trials, quantities such as the probability of benefit or the probability of harm are not identifiable, since multiple potential outcomes cannot be observed simultaneously for the same individual. We propose a sensitivity analysis for the probability of benefit in randomized controlled trial settings with a binary treatment and a binary outcome, by quantifying the deviation from conditional independence of the two potential outcomes, given a set of measured prognostic baseline covariates. We do this using a marginal sensitivity analysis parameter that does not depend on the number or complexity of the measured covariates. We provide a guide to estimation and interpretation, and illustrate our method in simulations, as well as using a real data example from a randomized controlled trial studying the effect of umbilical vein oxytocin administration on the need for manual removal of the placenta during birth.
Title: Sensitivity analysis for the probability of benefit in randomized controlled trials with a binary treatment and a binary outcome. Journal: Biostatistics, vol. 26, no. 1. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12129078/pdf/
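The non-identifiability noted in the abstract above can be made concrete with a short sketch. It assumes Y = 1 is the favorable outcome, so that benefit means Y(1) = 1 and Y(0) = 0, and it works marginally rather than conditionally on covariates. It computes the standard Fréchet-Hoeffding bounds on the probability of benefit and the value implied if the two potential outcomes were independent; the authors' sensitivity parameter is not reproduced, and the function name and inputs are hypothetical.

```python
def benefit_bounds(p_treated, p_control):
    """Bounds on P(Y(1)=1, Y(0)=0) from the two marginal success probabilities.

    Returns (lower, independent, upper), where `independent` is the value
    obtained if the potential outcomes were independent.
    """
    lower = max(0.0, p_treated - p_control)
    upper = min(p_treated, 1.0 - p_control)
    independent = p_treated * (1.0 - p_control)
    return lower, independent, upper

# Hypothetical trial: 70% success under treatment, 50% under control.
lower, independent, upper = benefit_bounds(0.7, 0.5)
print(lower, independent, upper)
```

The width of the interval between `lower` and `upper` shows how much the randomized data alone leave undetermined, and a sensitivity analysis explores where within that range the truth may plausibly lie.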
Pub Date: 2024-12-31; DOI: 10.1093/biostatistics/kxaf009
Ariel Chao, Donna Spiegelman, Ashley Buchanan, Laura Forastiere
To leverage peer influence and increase population behavioral changes, behavioral interventions often rely on peer-based strategies. A common study design that assesses such strategies is the egocentric-network randomized trial (ENRT), where index participants receive a behavioral training and are encouraged to disseminate information to their peers. Under this design, a crucial estimand of interest is the Average Spillover Effect (ASpE), which measures the impact of the intervention on participants who do not receive it, but whose outcomes may be affected by others who do. The assessment of the ASpE relies on assumptions about, and correct measurement of, interference sets within which individuals may influence one another's outcomes. It can be challenging to properly specify interference sets, such as networks in ENRTs, and when mismeasured, intervention effects estimated by existing methods will be biased. In studies where social networks play an important role in disease transmission or behavior change, correcting ASpE estimates for bias due to network misclassification is critical for accurately evaluating the full impact of interventions. We combined measurement error and causal inference methods to bias-correct the ASpE estimate for network misclassification in ENRTs, when surrogate networks are recorded in place of true ones, and validation data that relate the misclassified to the true networks are available. We investigated finite sample properties of our methods in an extensive simulation study and illustrated our methods in the HIV Prevention Trials Network (HPTN) 037 study.
Title: Estimation and inference for causal spillover effects in egocentric-network randomized trials in the presence of network membership misclassification. Journal: Biostatistics 26(1). Impact factor: 1.8. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11955068/pdf/
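As a rough illustration of the Average Spillover Effect described in the abstract above (kxaf009), the sketch below computes a plain difference in mean outcomes among non-index network members, comparing networks whose index participant was randomized to the intervention with networks whose index was randomized to control. It assumes network membership is recorded correctly and that networks do not overlap; it is not the bias-corrected estimator the authors propose. The record fields (`is_index`, `index_treated`, `outcome`) and the toy data are hypothetical.

```python
from statistics import mean


def average_spillover_effect(records):
    """Naive ASpE contrast: mean outcome of non-index members in
    intervention networks minus that of non-index members in control
    networks. Assumes correctly recorded, non-overlapping networks."""
    treated = [r["outcome"] for r in records
               if not r["is_index"] and r["index_treated"]]
    control = [r["outcome"] for r in records
               if not r["is_index"] and not r["index_treated"]]
    if not treated or not control:
        raise ValueError("need network members in both trial arms")
    return mean(treated) - mean(control)


def member(index_treated, outcome, is_index=False):
    # Hypothetical record layout, used only for this illustration.
    return {"is_index": is_index,
            "index_treated": index_treated,
            "outcome": float(outcome)}


# Toy data: index participants are present but excluded from the contrast.
records = (
    [member(True, 1, is_index=True), member(False, 0, is_index=True)]
    + [member(True, y) for y in (1, 1, 0, 1)]   # intervention networks
    + [member(False, y) for y in (0, 1, 0, 0)]  # control networks
)

aspe = average_spillover_effect(records)
print(f"naive ASpE estimate: {aspe:.2f}")
```

If recorded membership differs from true membership, some outcomes end up in the wrong arm of this contrast, which is the source of bias that the abstract says the authors correct using validation data.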
Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxaf044
Sang Kyu Lee, Seonjin Kim, Mi-Ok Kim, Katherine L Grantz, Hyokyoung G Hong
Addressing health disparities across demographic groups remains a critical challenge in public health, with significant gaps in understanding how these disparities evolve over time. This paper extends the traditional Peters-Belson decomposition to a longitudinal setting, focusing on the role of a single explanatory variable, referred to as a modifier, that captures complex interactions with other covariates. The proposed method partitions disparities into 3 components: (i) the portion associated with differences in the conditional distribution of covariates, evaluated under a common distribution of the modifier across groups; (ii) the portion arising from differences in the distribution of the modifier and its interactions with other covariates; and (iii) the unexplained disparity not accounted for by observed covariates. Rather than aggregating the first 2 components into one "explained disparity," the proposed method allows for a separate characterization of temporal patterns in disparities, distinguishing those that are unassociated with the modifier from those that are associated with it. We illustrate the method using a fetal growth study, examining disparities in fetal development trajectories across racial and ethnic groups during pregnancy.
Title: Decomposition of longitudinal disparities: an application to the fetal growth-singletons study. Journal: Biostatistics 26(1). Impact factor: 2.0. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12701353/pdf/
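For orientation, the traditional cross-sectional Peters-Belson decomposition that the abstract above (kxaf044) extends can be sketched as follows. A regression is fitted in the reference group only, then used to predict outcomes for the comparison group; the gap between observed and predicted comparison-group means is the unexplained disparity, and the remainder is explained by covariate differences. This is the classical two-part version with a single covariate, not the authors' longitudinal three-part method; the function names and the toy data are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("covariate has no variation in the reference group")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def peters_belson(x_ref, y_ref, x_cmp, y_cmp):
    """Two-part decomposition of the mean outcome gap (reference minus
    comparison) using a model fitted in the reference group only."""
    intercept, slope = fit_line(x_ref, y_ref)
    # Outcomes the comparison group would be expected to have if the
    # reference group's covariate-outcome relationship applied to them.
    predicted_cmp = [intercept + slope * xi for xi in x_cmp]
    total = mean(y_ref) - mean(y_cmp)
    unexplained = mean(predicted_cmp) - mean(y_cmp)
    explained = total - unexplained
    return {"total": total, "explained": explained, "unexplained": unexplained}


# Toy data (hypothetical): one covariate, one outcome per person.
x_ref, y_ref = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
x_cmp, y_cmp = [1.0, 2.0, 2.0, 3.0], [1.0, 3.0, 3.0, 5.0]

result = peters_belson(x_ref, y_ref, x_cmp, y_cmp)
for part, value in result.items():
    print(f"{part}: {value:.2f}")
```

According to the abstract, the proposed method further splits the explained part into a portion unassociated with the modifier and a portion associated with it, and tracks all components over time; that refinement is not attempted here.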