Pub Date: 2024-12-31
DOI: 10.1093/biostatistics/kxaf045
Zhe Chen, Siyu Heng, Asa Tapley, Stephen De Rosa, Bo Zhang
A key objective in vaccine studies is to evaluate vaccine-induced immunogenicity and determine whether participants have mounted a response to the vaccine. Cellular immune responses are essential for assessing vaccine-induced immunogenicity, and single-cell assays, such as intracellular cytokine staining (ICS) and B-cell phenotyping (BCP), are commonly employed to profile individual immune cell phenotypes and the cytokines they produce after stimulation. In this article, we introduce a novel statistical framework for identifying vaccine responders using ICS data collected before and after vaccination. This framework incorporates paired control data to account for potential unintended variations between assay runs, such as batch effects, that could lead to misclassification of participants as vaccine responders or non-responders. To formally use paired control data to account for assay variation across different time points (i.e., before and after vaccination), our proposed framework calculates and reports two $P$-values, both adjusting for paired control data but in distinct ways: (i) the maximally adjusted $P$-value, which applies the most conservative adjustment to the unadjusted $P$-value, ensuring validity over all plausible batch effects consistent with the paired control samples' data, and (ii) the minimally adjusted $P$-value, which imposes only the minimal adjustment to the unadjusted $P$-value, such that the adjusted $P$-value cannot be falsified by the paired control samples' data. Together, the minimally and maximally adjusted $P$-values offer a balanced approach to managing Type I error rates and statistical power in the presence of batch effects. We apply this framework to analyze ICS data collected at baseline and 4 weeks post-vaccination from the COVID-19 Prevention Network (CoVPN) 3008 study.
Our analysis helps address two clinical questions: (i) which participants exhibited evidence of an incident Omicron infection between baseline and 4 weeks after receiving the final dose of the primary vaccination series, and (ii) which participants showed vaccine-induced T cell responses against the Omicron BA.4/5 Spike protein.
Title: Determining vaccine responders in the presence of baseline immunity using single-cell assays and paired control samples. Journal: Biostatistics, vol. 26, no. 1.
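The maximally adjusted $P$-value is described as a worst case over all batch effects consistent with the paired controls. The sketch below illustrates only that worst-case idea, under simplifying assumptions that are ours rather than the authors': responder evidence is a one-sided Wald z-test on the proportion of cytokine-positive cells after versus before vaccination, a batch effect is a hypothetical additive shift `delta` in that proportion, and the set of plausible shifts is supplied by hand instead of being derived from paired control samples. The minimally adjusted $P$-value is not reproduced here.

```python
import math


def one_sided_p(x_post, n_post, x_pre, n_pre, delta=0.0):
    """One-sided Wald z-test of H0: p_post - p_pre <= delta.

    x_* are counts of cytokine-positive cells and n_* the numbers of cells
    analysed at each visit. delta is a hypothetical additive batch shift in
    the positive-cell proportion between assay runs (0 = no batch effect).
    Assumes both observed proportions lie strictly between 0 and 1.
    """
    p_post, p_pre = x_post / n_post, x_pre / n_pre
    se = math.sqrt(p_post * (1 - p_post) / n_post + p_pre * (1 - p_pre) / n_pre)
    z = ((p_post - p_pre) - delta) / se
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail standard normal probability


def worst_case_p(x_post, n_post, x_pre, n_pre, plausible_shifts):
    """Largest p-value over a set of batch shifts treated as plausible."""
    return max(one_sided_p(x_post, n_post, x_pre, n_pre, d) for d in plausible_shifts)


# Hypothetical counts and a hand-picked set of plausible shifts.
unadjusted = one_sided_p(120, 50_000, 60, 50_000)
worst = worst_case_p(120, 50_000, 60, 50_000, [-0.0003, 0.0, 0.0005])
print(f"unadjusted p = {unadjusted:.2g}, worst-case p = {worst:.2g}")
```

Because the p-value increases with `delta`, the worst case is attained at the largest plausible shift; a set containing 0 can therefore only make the call more conservative than the unadjusted test.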
Pub Date: 2024-12-31
DOI: 10.1093/biostatistics/kxaf044
Sang Kyu Lee, Seonjin Kim, Mi-Ok Kim, Katherine L Grantz, Hyokyoung G Hong
Addressing health disparities across demographic groups remains a critical challenge in public health, with significant gaps in understanding how these disparities evolve over time. This paper extends the traditional Peters-Belson decomposition to a longitudinal setting, focusing on the role of a single explanatory variable, referred to as a modifier, that captures complex interactions with other covariates. The proposed method partitions disparities into 3 components: (i) the portion associated with differences in the conditional distribution of covariates, evaluated under a common distribution of the modifier across groups; (ii) the portion arising from differences in the distribution of the modifier and its interactions with other covariates; and (iii) the unexplained disparity not accounted for by observed covariates. Rather than aggregating the first 2 components into one "explained disparity," the proposed method allows for a separate characterization of temporal patterns in disparities, distinguishing those that are unassociated with the modifier from those that are associated with it. We illustrate the method using a fetal growth study, examining disparities in fetal development trajectories across racial and ethnic groups during pregnancy.
Title: Decomposition of longitudinal disparities: an application to the fetal growth-singletons study. Journal: Biostatistics, vol. 26, no. 1. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12701353/pdf/
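For readers unfamiliar with the starting point, the traditional cross-sectional Peters-Belson decomposition fits an outcome model in a reference group and applies it to the comparison group's covariates. The minimal sketch below assumes a single covariate and ordinary least squares, and uses the sign convention total = reference mean minus comparison mean. It shows only the classic two-part split (explained versus unexplained), not the longitudinal three-part decomposition with a modifier proposed in the paper.

```python
from statistics import fmean


def ols_fit(x, y):
    """Simple linear regression y = b0 + b1 * x by least squares."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1


def peters_belson(x_ref, y_ref, x_cmp, y_cmp):
    """Classic two-part Peters-Belson decomposition of a mean difference.

    The reference-group model predicts what the comparison group's mean
    outcome would be if it followed the reference covariate-outcome relation.
    """
    b0, b1 = ols_fit(x_ref, y_ref)
    predicted_cmp = fmean(b0 + b1 * xi for xi in x_cmp)
    total = fmean(y_ref) - fmean(y_cmp)
    explained = fmean(y_ref) - predicted_cmp    # due to covariate differences
    unexplained = predicted_cmp - fmean(y_cmp)  # not accounted for by covariates
    return total, explained, unexplained


# Toy data: the explained and unexplained parts sum to the total disparity.
print(peters_belson([1, 2, 3, 4], [2, 4, 6, 8], [0, 1, 2], [0, 1, 2]))
```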
Pub Date: 2024-12-31
DOI: 10.1093/biostatistics/kxae052
Eva Murphy, David Kline, Kathleen L Egan, Kathryn E Lancaster, William C Miller, Lance A Waller, Staci A Hepler
The opioid epidemic is a significant public health challenge in North Carolina, but limited data restrict our understanding of its complexity. Examining trends and relationships among different outcomes believed to reflect opioid misuse provides an alternative perspective on the opioid epidemic. We use a Bayesian dynamic spatial factor model to capture the interrelated dynamics within six county-level outcomes, namely illicit opioid overdose deaths, emergency department visits related to drug overdose, treatment counts for opioid use disorder, patients receiving prescriptions for buprenorphine, and newly diagnosed cases of acute and chronic hepatitis C virus and of human immunodeficiency virus. We design the factor model to yield meaningful interactions among predefined subsets of these outcomes, which departs from the conventional lower triangular structure of the loadings matrix and leads to well-known identifiability issues. To address this challenge, we propose a novel approach that decomposes the loadings matrix within a Markov chain Monte Carlo algorithm, allowing us to estimate the loadings and factors uniquely. As a result, we gain a better understanding of the spatio-temporal dynamics of the opioid epidemic in North Carolina.
Title: Understanding the opioid syndemic in North Carolina: A novel approach to modeling and identifying factors. Journal: Biostatistics, vol. 26, no. 1. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823283/pdf/
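The identifiability issue mentioned for factor models can be seen directly: for any orthogonal matrix $R$, the loadings $\Lambda R$ and factors $R^\top F$ reproduce exactly the same fitted values $\Lambda F$, so the data alone cannot distinguish them. The sketch below uses hypothetical loadings and factor values for three outcomes and two factors; it shows why a constraint such as a lower triangular loadings matrix is conventionally imposed (the rotation destroys the zero in the first row), and does not reproduce the paper's MCMC decomposition.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def rotation(theta):
    """2 x 2 orthogonal rotation matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


# Hypothetical lower triangular loadings (3 outcomes x 2 factors)
# and factor values at 4 time points.
loadings = [[1.0, 0.0], [0.5, 0.8], [0.3, 1.2]]
factors = [[0.2, -1.0, 0.4, 1.1], [0.7, 0.1, -0.3, 0.9]]

R = rotation(0.6)
rotated_loadings = matmul(loadings, R)            # Lambda R
rotated_factors = matmul(transpose(R), factors)   # R^T F

original = matmul(loadings, factors)
rotated = matmul(rotated_loadings, rotated_factors)
print(rotated_loadings[0])  # first row is no longer of the form [x, 0]
```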
Pub Date: 2024-12-31
DOI: 10.1093/biostatistics/kxaf004
Yixi Xu, Yi Zhao
This study introduces a mediation analysis framework for settings in which the mediator is a graph. A Gaussian covariance graph model is assumed for graph representation. Causal estimands and assumptions are discussed under this representation. With a covariance matrix as the mediator, a low-rank representation is introduced and parametric mediation models are considered under the structural equation modeling framework. Assuming Gaussian random errors, likelihood-based estimators are introduced to simultaneously identify the low-rank representation and causal parameters. An efficient computational algorithm is proposed and asymptotic properties of the estimators are investigated. The performance of the proposed approach is evaluated via simulation studies. Applying the approach to a resting-state fMRI study, we identify a brain network within which functional connectivity mediates the sex difference in performance on a motor task.
Title: Mediation analysis with graph mediator. Journal: Biostatistics, vol. 26, no. 1. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11979487/pdf/
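The structural-equation logic behind parametric mediation models can be illustrated in its simplest scalar form. The sketch below is a deliberately simplified analogue, not the paper's method: it uses one scalar mediator instead of a covariance matrix, linear models without exposure-mediator interaction, and simulated data with hypothetical coefficients (a = 0.8, b = 0.5). Under those assumptions the indirect effect is the product of the exposure-to-mediator slope and the mediator-to-outcome slope, the latter obtained here by residualizing on the exposure (Frisch-Waugh).

```python
import random
from statistics import fmean


def slope_intercept(x, y):
    """Least-squares intercept and slope of y on a single regressor x."""
    mx, my = fmean(x), fmean(y)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1


def residuals(x, y):
    b0, b1 = slope_intercept(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def indirect_effect(x, m, y):
    """Product-of-coefficients indirect effect a * b for linear models."""
    _, a = slope_intercept(x, m)  # exposure -> mediator
    # Coefficient of m in y ~ x + m equals the slope of y-residuals on
    # m-residuals, both residualized on x (Frisch-Waugh).
    _, b = slope_intercept(residuals(x, m), residuals(x, y))
    return a * b


# Simulated data with a hypothetical binary exposure (e.g. sex).
rng = random.Random(1)
n = 5000
x = [rng.randint(0, 1) for _ in range(n)]
m = [0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [0.3 * xi + 0.5 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]
est = indirect_effect(x, m, y)
print(f"estimated indirect effect: {est:.3f} (true value 0.4)")
```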
Pub Date: 2024-12-31
DOI: 10.1093/biostatistics/kxaf034
Dirk Douwes-Schultz, Alexandra M Schmidt, Laís Picinini Freitas, Marilia Sá Carvalho
Univariate zero-inflated models are increasingly being used to account for excess zeros in spatio-temporal infectious disease counts. However, the multivariate case is challenging due to the need to account for correlations across space, time and disease in both the count and zero-inflated components of the model. We are interested in comparing the transmission dynamics of several co-circulating infectious diseases across space and time, where some of the diseases can be absent for long periods. We first assume there is a baseline disease that is well-established and always present in the region. The other diseases switch between periods of presence and absence in each area through a series of coupled Markov chains, which account for long periods of disease absence, disease interactions and disease spread from neighboring areas. Since we are mainly interested in comparing the diseases, we assume the cases of the present diseases in an area jointly follow an autoregressive multinomial model. We use the multinomial model to investigate whether there are associations between certain factors, such as temperature, and differences in the transmission intensity of the diseases. Inference is performed using efficient Bayesian Markov chain Monte Carlo methods based on jointly sampling all unknown presence indicators. We apply the model to spatio-temporal counts of dengue, Zika, and chikungunya cases in Rio de Janeiro, during the first triple epidemic there.
Title: Markov switching zero-inflated space-time multinomial models for comparing multiple infectious diseases. Journal: Biostatistics, vol. 26, no. 1. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12596980/pdf/
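The presence/absence switching can be pictured as a two-state Markov chain per area whose transition probabilities depend on the neighbouring areas. The sketch below is a hypothetical, stripped-down version for a single non-baseline disease: the logit-scale parameters `a0`, `a1`, `b0` are invented for illustration, emergence becomes more likely as the share of neighbours with the disease grows, and the disease-interaction terms and the multinomial count model described in the abstract are omitted.

```python
import math
import random


def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))


def transition_prob(prev_state, neighbour_share, a0=-2.0, a1=3.0, b0=2.0):
    """P(disease present at t | state at t-1) in one area.

    prev_state: 1 if the disease was present in the area at t-1, else 0.
    neighbour_share: fraction of neighbouring areas where it was present at t-1.
    a0, a1, b0: hypothetical logit-scale parameters for emergence (which
    becomes more likely as the disease spreads in neighbouring areas) and
    persistence.
    """
    if prev_state == 1:
        return logistic(b0)
    return logistic(a0 + a1 * neighbour_share)


def step(present, neighbours, rng):
    """Draw presence indicators at time t for all areas from those at t-1."""
    new = []
    for area, state in enumerate(present):
        nb = neighbours[area]
        share = sum(present[j] for j in nb) / len(nb) if nb else 0.0
        new.append(1 if rng.random() < transition_prob(state, share) else 0)
    return new


# Four areas on a line; the disease starts in area 0 only.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
rng = random.Random(42)
states = [1, 0, 0, 0]
for t in range(5):
    states = step(states, neighbours, rng)
    print(t + 1, states)
```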
Pub Date: 2024-12-31
DOI: 10.1093/biostatistics/kxaf033
Emily Somerset, Justin J Slater, Patrick E Brown
We introduce a hierarchical Bayesian framework for reconstructing epidemic curves using under-reported case counts and wastewater data. Our approach models wastewater signals as differentiable Gaussian processes, enabling inference on their relative growth rates, which are used to define a wastewater-based reproduction rate. These estimates are incorporated into a binomially thinned Poisson autoregressive model for case counts using a modular inference strategy. We apply this framework to reconstruct the Covid-19 epidemic curve in Toronto, validating our model through out-of-sample forecasts and comparisons with independent serosurvey-based cumulative incidence estimates. We also apply the framework to New Zealand's Covid-19 data to reconstruct its epidemic curve and demonstrate improvements over an existing joint model for wastewater and case data. A key advantage of our framework, highlighted in this comparison, is that it does not rely on pre-specified constant parameters, allowing the model to better adapt to evolving pandemic conditions.
Title: Wastewater-based reproduction rates for epidemic curve reconstruction. Journal: Biostatistics, vol. 26, no. 1. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12533577/pdf/
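A binomially thinned Poisson model rests on a standard property: if the true count is $N \sim \text{Poisson}(\lambda)$ and each case is reported independently with probability $\rho$, the reported count is $\text{Poisson}(\rho\lambda)$. The minimal sketch below assumes a hypothetical first-order autoregression for true infections, $N_t \sim \text{Poisson}(R_t N_{t-1})$, with the reproduction path `r_path` supplied directly rather than estimated from wastewater Gaussian processes, and reported cases $Y_t \mid N_t \sim \text{Binomial}(N_t, \rho)$.

```python
import math
import random


def poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for small means)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def thinned_draw(lam, rho, rng):
    """Draw true count N ~ Poisson(lam), then reported Y | N ~ Binomial(N, rho)."""
    n_true = poisson(lam, rng)
    return sum(rng.random() < rho for _ in range(n_true))


def simulate(r_path, rho, n0, rng):
    """Hypothetical autoregression N_t ~ Poisson(R_t * N_{t-1}),
    with reported cases Y_t ~ Binomial(N_t, rho)."""
    true_counts, reported = [], []
    prev = n0
    for r in r_path:
        n_t = poisson(r * prev, rng)
        true_counts.append(n_t)
        reported.append(sum(rng.random() < rho for _ in range(n_t)))
        prev = n_t
    return true_counts, reported


rng = random.Random(3)
true_counts, reported = simulate([1.2, 1.1, 1.0, 0.9, 0.8], rho=0.3, n0=20, rng=rng)
print(true_counts, reported)
```

Reconstructing the epidemic curve amounts to inverting this process: inferring the latent `true_counts` and the reporting probability from the observed `reported` series.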
Pub Date: 2024-12-31
DOI: 10.1093/biostatistics/kxae040
Álvaro Méndez-Civieta, Ying Wei, Keith M Diaz, Jeff Goldsmith
This paper introduces functional quantile principal component analysis (FQPCA), a dimensionality reduction technique that extends the concept of functional principal component analysis (FPCA) to the examination of participant-specific quantile curves. Our approach borrows strength across participants to estimate patterns in quantiles, and uses participant-level data to estimate loadings on those patterns. As a result, FQPCA is able to capture shifts in the scale and distribution of data that affect participant-level quantile curves, and is also a robust methodology suitable for data with outliers, heteroscedasticity, or skewness. The need for such methodology is exemplified by physical activity data collected using wearable devices. Participants often differ in the timing and intensity of physical activity behaviors, and capturing information beyond the participant-level expected value curves produced by FPCA is necessary for a robust quantification of diurnal patterns of activity. We illustrate our methods using accelerometer data from the National Health and Nutrition Examination Survey, and produce participant-level 10%, 50%, and 90% quantile curves over 24 h of activity. The proposed methodology is supported by simulation results, and is available as an R package.
Title: Functional quantile principal component analysis. Journal: Biostatistics. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823270/pdf/
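Quantile-based methods such as FQPCA are built on the check (pinball) loss $\rho_\tau(u) = u\,(\tau - \mathbf{1}\{u < 0\})$, whose minimiser over a constant is a sample $\tau$-quantile. The minimal sketch below shows only that building block, searching over the observed values as candidates; it is not an implementation of FQPCA, which estimates functional quantile components and participant-level loadings.

```python
def check_loss(u, tau):
    """Quantile (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def constant_quantile(y, tau):
    """Minimise the summed check loss over candidate values taken from the data."""
    return min(y, key=lambda q: sum(check_loss(yi - q, tau) for yi in y))


data = list(range(1, 12))           # 1, 2, ..., 11
print(constant_quantile(data, 0.9))  # a 90% sample quantile of the data
print(constant_quantile(data, 0.5))  # the sample median
```

Replacing the constant with a smooth participant-specific curve, and the squared error of FPCA with this loss, is what lets quantile methods target the 10%, 50%, or 90% level rather than the mean.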
Pub Date: 2024-12-31
DOI: 10.1093/biostatistics/kxaf035
Luka Kovačević, Weishi Chen, Helen Barnett, Thomas Jaki, Pavel Mozgunov
Phase I clinical trials are essential to bringing novel therapies from chemical development to widespread use. Traditional approaches to dose-finding in Phase I trials, such as the '3 + 3' method and the continual reassessment method (CRM), provide a principled approach for escalating across dose levels. However, these methods cannot incorporate uncertainty about the dose-toxicity ordering, which arises in combination drug trials. In this setting, dose levels vary across multiple drugs simultaneously, leading to multiple possible dose-toxicity orderings. The CRM for partial ordering (POCRM) extends the CRM to these settings by allowing for multiple dose-toxicity orderings. In this work, it is shown that the POCRM is vulnerable to 'estimation incoherency', whereby toxicity estimates shift in an illogical way, threatening patient safety and undermining clinician trust in dose-finding models. To address this, the Bayesian model-averaged POCRM (BMA-POCRM) is formalized. BMA-POCRM uses Bayesian model averaging to take all possible orderings into account simultaneously, reducing the frequency of estimation incoherencies. We derive novel theoretical guarantees on the estimation coherency of the POCRM and BMA-POCRM. The effectiveness of BMA-POCRM in drug combination settings is demonstrated through a specific instance of estimation incoherency in the POCRM and through simulation studies. The results highlight the improved safety and accuracy, and the reduced occurrence of estimation incoherency, in trials applying the BMA-POCRM relative to the POCRM.
Title: Bayesian model averaging for partial ordering continual reassessment methods. Journal: Biostatistics, vol. 26, no. 1. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12538209/pdf/
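The model-averaging step can be sketched as follows, under assumptions that are ours and only meant to convey the idea: each candidate ordering is encoded as a re-arranged skeleton, toxicity follows the one-parameter power ("empiric") CRM model $p_d = s_d^{\exp(a)}$, the parameter $a$ has a normal prior with standard deviation $\sqrt{1.34}$ (a common CRM default), orderings receive equal prior weight, and integrals over $a$ are approximated on a grid. Each ordering's marginal likelihood gives its posterior weight, and toxicity estimates are averaged with those weights instead of committing to a single ordering. Dose-assignment and stopping rules are not included, and the skeletons and data are hypothetical.

```python
import math


def power_model_likelihood(a, skeleton, n_tox, n_pat):
    """Binomial likelihood under the power model p_d = skeleton_d ** exp(a)."""
    lik = 1.0
    for s, y, n in zip(skeleton, n_tox, n_pat):
        p = s ** math.exp(a)
        lik *= p ** y * (1.0 - p) ** (n - y)
    return lik


def bma_pocrm(orderings_skeletons, n_tox, n_pat, prior_sd=math.sqrt(1.34), grid_size=401):
    """Posterior ordering weights and model-averaged toxicity estimates.

    orderings_skeletons: one skeleton per candidate ordering, giving the prior
    toxicity guess for every dose combination under that ordering.
    """
    grid = [-4 * prior_sd + 8 * prior_sd * i / (grid_size - 1) for i in range(grid_size)]
    step = grid[1] - grid[0]
    # Unnormalised normal prior; the constant is shared and cancels in the weights.
    prior = [math.exp(-0.5 * (a / prior_sd) ** 2) for a in grid]
    marginals, estimates = [], []
    for skeleton in orderings_skeletons:
        post = [pr * power_model_likelihood(a, skeleton, n_tox, n_pat) for a, pr in zip(grid, prior)]
        z = sum(post) * step  # (unnormalised) marginal likelihood of this ordering
        marginals.append(z)
        # Posterior mean toxicity at each dose combination under this ordering.
        estimates.append([sum(w * s ** math.exp(a) for w, a in zip(post, grid)) * step / z
                          for s in skeleton])
    total = sum(marginals)
    weights = [m / total for m in marginals]
    averaged = [sum(w * est[d] for w, est in zip(weights, estimates)) for d in range(len(n_tox))]
    return weights, averaged


# Hypothetical example: 3 dose combinations and 2 candidate orderings.
skeletons = [
    [0.10, 0.20, 0.30],  # ordering A: combo 0 < combo 1 < combo 2
    [0.10, 0.30, 0.20],  # ordering B: combo 0 < combo 2 < combo 1
]
weights, averaged = bma_pocrm(skeletons, n_tox=[0, 2, 0], n_pat=[3, 3, 3])
print(weights, averaged)
```

With 2 of 3 toxicities at combination 1 and none at combination 2, ordering B (which ranks combination 1 above combination 2) receives more weight, yet ordering A still contributes to the averaged estimates rather than being discarded.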
In developing risk prediction models for specific diseases, it is essential to evaluate the calibration performance of the prediction model. Various methods have been proposed to assess the calibration of prediction models, but it has been pointed out that conventional methods based on the model's predicted probabilities are insufficient to detect miscalibration. A further problem is that no method has yet been established for evaluating calibration with respect to continuous variables of interest. We therefore propose two methods to evaluate calibration with respect to the variable of interest: the variable-based probabilistic calibration plot (VPC-Plot), a visual assessment, and the variable-based probabilistic calibration error (VPCE), a corresponding evaluation metric. We conducted theoretical and simulation studies to investigate the properties and effectiveness of the proposed methods. These studies demonstrated that the proposed methods can detect miscalibration by evaluating calibration based on the variable of interest, even when conventional methods fail to do so. To demonstrate their usefulness in real-world data analysis, we evaluated diabetes prediction models developed using the national health insurance database for Osaka, Japan. We show that the proposed methods can identify miscalibration with respect to a key covariate in a diabetes prediction model.
Variable-based probabilistic calibration with binary outcome.
Hiroe Seto, Shuji Kitora, Asuka Oyama, Hiroshi Toki, Ryohei Yamamoto, Michio Yamamoto
Biostatistics 26(1). Pub Date: 2024-12-31 DOI: 10.1093/biostatistics/kxaf026
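The idea of checking calibration along a variable of interest, rather than only along the predicted probability, can be illustrated with a simple binned comparison. The sketch below is not the authors' VPC-Plot or VPCE, whose exact definitions are not given in this abstract; it is a hypothetical stand-in (the helper `binned_calibration_by_variable` and all simulated data are assumptions) that groups observations into quantile bins of a continuous covariate and compares the observed event rate with the mean predicted risk in each bin.

```python
import numpy as np

def binned_calibration_by_variable(v, y, p, n_bins=10):
    """Hypothetical helper (not the authors' VPCE): within quantile bins of a
    continuous variable v, compare the observed event rate of the binary
    outcome y with the mean predicted probability p."""
    v, y, p = (np.asarray(a, dtype=float) for a in (v, y, p))
    # Quantile edges of v; np.unique drops duplicate edges caused by ties.
    edges = np.unique(np.quantile(v, np.linspace(0.0, 1.0, n_bins + 1)))
    # Bin index 0..len(edges)-2; the maximum of v is clipped into the last bin.
    idx = np.clip(np.searchsorted(edges, v, side="right") - 1, 0, len(edges) - 2)
    centers, observed, predicted, counts = [], [], [], []
    for b in range(len(edges) - 1):
        mask = idx == b
        if not mask.any():
            continue
        centers.append(np.median(v[mask]))
        observed.append(y[mask].mean())
        predicted.append(p[mask].mean())
        counts.append(mask.sum())
    observed, predicted, counts = map(np.array, (observed, predicted, counts))
    # Count-weighted mean absolute gap between observed and predicted risk.
    gap = float(np.sum(counts * np.abs(observed - predicted)) / counts.sum())
    return np.array(centers), observed, predicted, gap

# Simulated example: true risk is U-shaped in v (all settings hypothetical).
rng = np.random.default_rng(0)
n = 20000
v = rng.uniform(-2.0, 2.0, n)
true_p = 1.0 / (1.0 + np.exp(-(-1.0 + 0.8 * v**2)))
y = rng.binomial(1, true_p)

# A model that ignores v but matches the overall event rate exactly.
p_const = np.full(n, y.mean())
# A check grouped by predicted probability sees a single group and no gap.
gap_by_p = abs(y.mean() - p_const.mean())

_, obs, pred, gap_const = binned_calibration_by_variable(v, y, p_const)
_, _, _, gap_true = binned_calibration_by_variable(v, y, true_p)
print(f"gap by p: {gap_by_p:.4f}, by v (constant model): {gap_const:.3f}, "
      f"by v (true risk): {gap_true:.3f}")
```

Because the constant predictor has a single distinct value equal to the overall event rate, a check that groups only by predicted probability cannot flag it, while the per-bin gaps along v are large.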
Pub Date: 2024-12-31 DOI: 10.1093/biostatistics/kxaf017
Danielle Demateis, Sandra India-Aldana, Robert O Wright, Rosalind J Wright, Andrea Baccarelli, Elena Colicino, Ander Wilson, Kayleigh P Keller
Epidemiological evidence supports an association between exposure to air pollution during pregnancy and both birth and child health outcomes. Typically, such associations are estimated by regressing an outcome on daily or weekly measures of exposure during pregnancy using a distributed lag model. However, these associations may be modified by multiple factors. We propose a distributed lag interaction model with index modification that allows the effect of a functional predictor to be modified by a weighted average of multiple modifiers. Our model simultaneously estimates the modifier index weights and the exposure-time-response function via a spline cross-basis in a Bayesian hierarchical framework. Through simulations, we show that our model outperforms competing methods when there are multiple modifiers of unknown importance. We applied the proposed method to a Colorado birth cohort to estimate the association between birth weight and air pollution modified by a neighborhood-vulnerability index, and to a Mexican birth cohort to estimate the association between birthing-parent cardiometabolic endpoints and air pollution modified by a birthing-parent lifetime stress index.
Distributed lag interaction model with index modification.
Biostatistics 26(1). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12205949/pdf/
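A minimal sketch of the model structure, under simplifying assumptions that are not taken from the paper: a continuous outcome, weekly exposures x_it on a rescaled time grid, two modifiers combined into an index E_i = a z_i1 + (1 - a) z_i2 with 0 <= a <= 1, and a lag curve that changes linearly with the index, beta(t, E) = f0(t) + E f1(t). The lag curve is expanded in a truncated-power cubic spline basis and crossed with the modifier basis [1, E], giving a cross-basis design. In place of the authors' Bayesian hierarchical estimation, the weight is estimated here by profile least squares over a grid, a swapped-in technique chosen only for brevity. The function names (`tp_spline_basis`, `design_matrix`, `profile_weights`) and all simulation settings are hypothetical.

```python
import numpy as np

def tp_spline_basis(t, knots, degree=3):
    """Truncated-power spline basis (including an intercept column) at t."""
    t = np.asarray(t, dtype=float)
    cols = [t**d for d in range(degree + 1)]
    cols += [np.clip(t - k, 0.0, None) ** degree for k in knots]
    return np.column_stack(cols)

def design_matrix(X, Z, w, B_lag):
    """Cross-basis design for a distributed lag model whose lag curve is
    modified linearly by the index E = Z @ w (hypothetical simplification)."""
    E = Z @ w                                   # modifier index: weighted average
    G = X @ B_lag                               # (n, L): exposures on the lag basis
    H = np.column_stack([np.ones_like(E), E])   # modifier basis [1, E]
    # Row-wise Kronecker product kron(h_i, g_i), shape (n, 2 * L).
    cross = (H[:, :, None] * G[:, None, :]).reshape(len(E), -1)
    # Intercept, modifier main effect, then the cross-basis columns.
    return np.column_stack([np.ones_like(E), E, cross])

def profile_weights(X, Z, y, B_lag, grid):
    """Profile least squares over simplex weights w = (a, 1 - a)."""
    rss = []
    for a in grid:
        D = design_matrix(X, Z, np.array([a, 1.0 - a]), B_lag)
        coef, *_ = np.linalg.lstsq(D, y, rcond=None)
        rss.append(np.sum((y - D @ coef) ** 2))
    return grid[int(np.argmin(rss))], np.array(rss)

# Simulated example (all settings hypothetical).
rng = np.random.default_rng(1)
n, T = 2000, 20
t = np.linspace(0.0, 1.0, T)                 # gestational time rescaled to [0, 1]
X = rng.normal(size=(n, T))                  # weekly exposures
Z = rng.normal(size=(n, 2))                  # two candidate modifiers
a_true = 0.7
E = Z @ np.array([a_true, 1.0 - a_true])
f0 = 0.5 * np.sin(np.pi * t)                 # lag curve when E = 0
f1 = np.exp(-(((t - 0.3) / 0.15) ** 2))      # change in lag curve per unit of E
y = X @ f0 + E * (X @ f1) + rng.normal(size=n)

B_lag = tp_spline_basis(t, knots=[0.25, 0.5, 0.75])
grid = np.linspace(0.0, 1.0, 21)
a_hat, rss = profile_weights(X, Z, y, B_lag, grid)
print(f"estimated weight on modifier 1: {a_hat:.2f}")
```

A one-dimensional grid keeps the example short; with more modifiers the weights live on a higher-dimensional simplex, where a grid quickly becomes impractical, and the authors instead estimate the weights jointly with the exposure-time-response function in a Bayesian hierarchical framework.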