Pub Date : 2024-12-31DOI: 10.1093/biostatistics/kxae030
Serge Aleshin-Guendel, Jon Wakefield
The under-5 mortality rate (U5MR), a critical health indicator, is typically estimated from household surveys in lower and middle income countries. Spatio-temporal disaggregation of household survey data can lead to highly variable estimates of U5MR, necessitating the usage of smoothing models which borrow information across space and time. The assumptions of common smoothing models may be unrealistic when certain time periods or regions are expected to have shocks in mortality relative to their neighbors, which can lead to oversmoothing of U5MR estimates. In this paper, we develop a spatial and temporal smoothing approach based on Gaussian Markov random field models which incorporate knowledge of these expected shocks in mortality. We demonstrate the potential for these models to improve upon alternatives not incorporating knowledge of expected shocks in a simulation study. We apply these models to estimate U5MR in Rwanda at the national level from 1985 to 2019, a time period which includes the Rwandan civil war and genocide.
{"title":"Adaptive Gaussian Markov random fields for child mortality estimation.","authors":"Serge Aleshin-Guendel, Jon Wakefield","doi":"10.1093/biostatistics/kxae030","DOIUrl":"10.1093/biostatistics/kxae030","url":null,"abstract":"<p><p>The under-5 mortality rate (U5MR), a critical health indicator, is typically estimated from household surveys in lower and middle income countries. Spatio-temporal disaggregation of household survey data can lead to highly variable estimates of U5MR, necessitating the usage of smoothing models which borrow information across space and time. The assumptions of common smoothing models may be unrealistic when certain time periods or regions are expected to have shocks in mortality relative to their neighbors, which can lead to oversmoothing of U5MR estimates. In this paper, we develop a spatial and temporal smoothing approach based on Gaussian Markov random field models which incorporate knowledge of these expected shocks in mortality. We demonstrate the potential for these models to improve upon alternatives not incorporating knowledge of expected shocks in a simulation study. We apply these models to estimate U5MR in Rwanda at the national level from 1985 to 2019, a time period which includes the Rwandan civil war and genocide.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141894969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-31DOI: 10.1093/biostatistics/kxae018
Fan Bu, Allison E Aiello, Alexander Volfovsky, Jason Xu
We develop a stochastic epidemic model progressing over dynamic networks, where infection rates are heterogeneous and may vary with individual-level covariates. The joint dynamics are modeled as a continuous-time Markov chain such that disease transmission is constrained by the contact network structure, and network evolution is in turn influenced by individual disease statuses. To accommodate partial epidemic observations commonly seen in real-world data, we propose a stochastic EM algorithm for inference, introducing key innovations that include efficient conditional samplers for imputing missing infection and recovery times which respect the dynamic contact network. Experiments on both synthetic and real datasets demonstrate that our inference method can accurately and efficiently recover model parameters and provide valuable insight at the presence of unobserved disease episodes in epidemic data.
{"title":"Stochastic EM algorithm for partially observed stochastic epidemics with individual heterogeneity.","authors":"Fan Bu, Allison E Aiello, Alexander Volfovsky, Jason Xu","doi":"10.1093/biostatistics/kxae018","DOIUrl":"10.1093/biostatistics/kxae018","url":null,"abstract":"<p><p>We develop a stochastic epidemic model progressing over dynamic networks, where infection rates are heterogeneous and may vary with individual-level covariates. The joint dynamics are modeled as a continuous-time Markov chain such that disease transmission is constrained by the contact network structure, and network evolution is in turn influenced by individual disease statuses. To accommodate partial epidemic observations commonly seen in real-world data, we propose a stochastic EM algorithm for inference, introducing key innovations that include efficient conditional samplers for imputing missing infection and recovery times which respect the dynamic contact network. Experiments on both synthetic and real datasets demonstrate that our inference method can accurately and efficiently recover model parameters and provide valuable insight at the presence of unobserved disease episodes in epidemic data.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141903694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-31DOI: 10.1093/biostatistics/kxae038
Changwoo J Lee, Elaine Symanski, Amal Rammah, Dong Hun Kang, Philip K Hopke, Eun Sug Park
Accounting for exposure measurement errors has been recognized as a crucial problem in environmental epidemiology for over two decades. Bayesian hierarchical models offer a coherent probabilistic framework for evaluating associations between environmental exposures and health effects, which take into account exposure measurement errors introduced by uncertainty in the estimated exposure as well as spatial misalignment between the exposure and health outcome data. While two-stage Bayesian analyses are often regarded as a good alternative to fully Bayesian analyses when joint estimation is not feasible, there has been minimal research on how to properly propagate uncertainty from the first-stage exposure model to the second-stage health model, especially in the case of a large number of participant locations along with spatially correlated exposures. We propose a scalable two-stage Bayesian approach, called a sparse multivariate normal (sparse MVN) prior approach, based on the Vecchia approximation for assessing associations between exposure and health outcomes in environmental epidemiology. We compare its performance with existing approaches through simulation. Our sparse MVN prior approach shows comparable performance with the fully Bayesian approach, which is a gold standard but is impossible to implement in some cases. We investigate the association between source-specific exposures and pollutant (nitrogen dioxide [NO2])-specific exposures and birth weight of full-term infants born in 2012 in Harris County, Texas, using several approaches, including the newly developed method.
{"title":"A scalable two-stage Bayesian approach accounting for exposure measurement error in environmental epidemiology.","authors":"Changwoo J Lee, Elaine Symanski, Amal Rammah, Dong Hun Kang, Philip K Hopke, Eun Sug Park","doi":"10.1093/biostatistics/kxae038","DOIUrl":"10.1093/biostatistics/kxae038","url":null,"abstract":"<p><p>Accounting for exposure measurement errors has been recognized as a crucial problem in environmental epidemiology for over two decades. Bayesian hierarchical models offer a coherent probabilistic framework for evaluating associations between environmental exposures and health effects, which take into account exposure measurement errors introduced by uncertainty in the estimated exposure as well as spatial misalignment between the exposure and health outcome data. While two-stage Bayesian analyses are often regarded as a good alternative to fully Bayesian analyses when joint estimation is not feasible, there has been minimal research on how to properly propagate uncertainty from the first-stage exposure model to the second-stage health model, especially in the case of a large number of participant locations along with spatially correlated exposures. We propose a scalable two-stage Bayesian approach, called a sparse multivariate normal (sparse MVN) prior approach, based on the Vecchia approximation for assessing associations between exposure and health outcomes in environmental epidemiology. We compare its performance with existing approaches through simulation. Our sparse MVN prior approach shows comparable performance with the fully Bayesian approach, which is a gold standard but is impossible to implement in some cases. We investigate the association between source-specific exposures and pollutant (nitrogen dioxide [NO2])-specific exposures and birth weight of full-term infants born in 2012 in Harris County, Texas, using several approaches, including the newly developed method.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823266/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142378644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Converging evidence indicates that the heterogeneity of cognitive profiles may arise through detectable alternations in brain functional connectivity. Despite an unprecedented opportunity to uncover neurobiological subtypes through clustering or subtyping analyses on multi-state functional connectivity, few existing approaches are applicable to accommodate the network topology and unique biological architecture. To address this issue, we propose an innovative Bayesian nonparametric network-variate clustering analysis to uncover subgroups of individuals with homogeneous brain functional network patterns under multiple cognitive states. In light of the existing neuroscience literature, we assume there are unknown state-specific modular structures within functional connectivity. Concurrently, we identify informative network features essential for defining subtypes. To further facilitate practical use, we develop a computationally efficient variational inference algorithm to approximate posterior inference with satisfactory estimation accuracy. Extensive simulations show the superiority of our method. We apply the method to the Adolescent Brain Cognitive Development (ABCD) study, and identify neurodevelopmental subtypes and brain sub-network phenotypes under each state to signal neurobiological heterogeneity, suggesting promising directions for further exploration and investigation in neuroscience.
{"title":"Bayesian subtyping for multi-state brain functional connectome with application on preadolescent brain cognition.","authors":"Tianqi Chen, Hongyu Zhao, Chichun Tan, Todd Constable, Sarah Yip, Yize Zhao","doi":"10.1093/biostatistics/kxae045","DOIUrl":"10.1093/biostatistics/kxae045","url":null,"abstract":"<p><p>Converging evidence indicates that the heterogeneity of cognitive profiles may arise through detectable alternations in brain functional connectivity. Despite an unprecedented opportunity to uncover neurobiological subtypes through clustering or subtyping analyses on multi-state functional connectivity, few existing approaches are applicable to accommodate the network topology and unique biological architecture. To address this issue, we propose an innovative Bayesian nonparametric network-variate clustering analysis to uncover subgroups of individuals with homogeneous brain functional network patterns under multiple cognitive states. In light of the existing neuroscience literature, we assume there are unknown state-specific modular structures within functional connectivity. Concurrently, we identify informative network features essential for defining subtypes. To further facilitate practical use, we develop a computationally efficient variational inference algorithm to approximate posterior inference with satisfactory estimation accuracy. Extensive simulations show the superiority of our method. We apply the method to the Adolescent Brain Cognitive Development (ABCD) study, and identify neurodevelopmental subtypes and brain sub-network phenotypes under each state to signal neurobiological heterogeneity, suggesting promising directions for further exploration and investigation in neuroscience.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823269/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142830270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-31DOI: 10.1093/biostatistics/kxae021
Xiaoyue Xi, Hélène Ruffieux
Bayesian graphical models are powerful tools to infer complex relationships in high dimension, yet are often fraught with computational and statistical challenges. If exploited in a principled way, the increasing information collected alongside the data of primary interest constitutes an opportunity to mitigate these difficulties by guiding the detection of dependence structures. For instance, gene network inference may be informed by the use of publicly available summary statistics on the regulation of genes by genetic variants. Here we present a novel Gaussian graphical modeling framework to identify and leverage information on the centrality of nodes in conditional independence graphs. Specifically, we consider a fully joint hierarchical model to simultaneously infer (i) sparse precision matrices and (ii) the relevance of node-level information for uncovering the sought-after network structure. We encode such information as candidate auxiliary variables using a spike-and-slab submodel on the propensity of nodes to be hubs, which allows hypothesis-free selection and interpretation of a sparse subset of relevant variables. As efficient exploration of large posterior spaces is needed for real-world applications, we develop a variational expectation conditional maximization algorithm that scales inference to hundreds of samples, nodes and auxiliary variables. We illustrate and exploit the advantages of our approach in simulations and in a gene network study which identifies hub genes involved in biological pathways relevant to immune-mediated diseases.
贝叶斯图模型是推断高维度复杂关系的强大工具,但在计算和统计方面往往充满挑战。如果以有原则的方式加以利用,那么随着主要兴趣数据的收集而不断增加的信息,就有机会通过指导依赖结构的检测来减轻这些困难。例如,基因网络推断可以利用公开的基因变异调控汇总统计数据。在这里,我们提出了一种新颖的高斯图建模框架,用于识别和利用条件独立图中节点的中心性信息。具体来说,我们考虑了一个完全联合的分层模型,以同时推断 (i) 稀疏精度矩阵和 (ii) 节点级信息对揭示所需的网络结构的相关性。我们使用一个关于节点成为枢纽的倾向的尖峰-板块子模型,将这些信息编码为候选辅助变量,从而可以无假设地选择和解释相关变量的稀疏子集。由于现实世界的应用需要对大型后验空间进行有效探索,我们开发了一种变分期望条件最大化算法,可将推理扩展到数百个样本、节点和辅助变量。我们在模拟和基因网络研究中说明并利用了我们方法的优势,该研究确定了与免疫介导疾病相关的生物通路中的枢纽基因。
{"title":"A modeling framework for detecting and leveraging node-level information in Bayesian network inference.","authors":"Xiaoyue Xi, Hélène Ruffieux","doi":"10.1093/biostatistics/kxae021","DOIUrl":"10.1093/biostatistics/kxae021","url":null,"abstract":"<p><p>Bayesian graphical models are powerful tools to infer complex relationships in high dimension, yet are often fraught with computational and statistical challenges. If exploited in a principled way, the increasing information collected alongside the data of primary interest constitutes an opportunity to mitigate these difficulties by guiding the detection of dependence structures. For instance, gene network inference may be informed by the use of publicly available summary statistics on the regulation of genes by genetic variants. Here we present a novel Gaussian graphical modeling framework to identify and leverage information on the centrality of nodes in conditional independence graphs. Specifically, we consider a fully joint hierarchical model to simultaneously infer (i) sparse precision matrices and (ii) the relevance of node-level information for uncovering the sought-after network structure. We encode such information as candidate auxiliary variables using a spike-and-slab submodel on the propensity of nodes to be hubs, which allows hypothesis-free selection and interpretation of a sparse subset of relevant variables. As efficient exploration of large posterior spaces is needed for real-world applications, we develop a variational expectation conditional maximization algorithm that scales inference to hundreds of samples, nodes and auxiliary variables. We illustrate and exploit the advantages of our approach in simulations and in a gene network study which identifies hub genes involved in biological pathways relevant to immune-mediated diseases.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823055/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141452158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-31DOI: 10.1093/biostatistics/kxae025
Yihan Bao, Lauren Bell, Elizabeth Williamson, Claire Garnett, Tianchen Qian
Micro-randomized trials are commonly conducted for optimizing mobile health interventions such as push notifications for behavior change. In analyzing such trials, causal excursion effects are often of primary interest, and their estimation typically involves inverse probability weighting (IPW). However, in a micro-randomized trial, additional treatments can often occur during the time window over which an outcome is defined, and this can greatly inflate the variance of the causal effect estimator because IPW would involve a product of numerous weights. To reduce variance and improve estimation efficiency, we propose two new estimators using a modified version of IPW, which we call "per-decision IPW." The second estimator further improves efficiency using the projection idea from the semiparametric efficiency theory. These estimators are applicable when the outcome is binary and can be expressed as the maximum of a series of sub-outcomes defined over sub-intervals of time. We establish the estimators' consistency and asymptotic normality. Through simulation studies and real data applications, we demonstrate substantial efficiency improvement of the proposed estimator over existing estimators. The new estimators can be used to improve the precision of primary and secondary analyses for micro-randomized trials with binary outcomes.
{"title":"Estimating causal effects for binary outcomes using per-decision inverse probability weighting.","authors":"Yihan Bao, Lauren Bell, Elizabeth Williamson, Claire Garnett, Tianchen Qian","doi":"10.1093/biostatistics/kxae025","DOIUrl":"10.1093/biostatistics/kxae025","url":null,"abstract":"<p><p>Micro-randomized trials are commonly conducted for optimizing mobile health interventions such as push notifications for behavior change. In analyzing such trials, causal excursion effects are often of primary interest, and their estimation typically involves inverse probability weighting (IPW). However, in a micro-randomized trial, additional treatments can often occur during the time window over which an outcome is defined, and this can greatly inflate the variance of the causal effect estimator because IPW would involve a product of numerous weights. To reduce variance and improve estimation efficiency, we propose two new estimators using a modified version of IPW, which we call \"per-decision IPW.\" The second estimator further improves efficiency using the projection idea from the semiparametric efficiency theory. These estimators are applicable when the outcome is binary and can be expressed as the maximum of a series of sub-outcomes defined over sub-intervals of time. We establish the estimators' consistency and asymptotic normality. Through simulation studies and real data applications, we demonstrate substantial efficiency improvement of the proposed estimator over existing estimators. The new estimators can be used to improve the precision of primary and secondary analyses for micro-randomized trials with binary outcomes.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141794123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Progress in neuroscience has provided unprecedented opportunities to advance our understanding of brain alterations and their correspondence to phenotypic profiles. With data collected from various imaging techniques, studies have integrated different types of information ranging from brain structure, function, or metabolism. More recently, an emerging way to categorize imaging traits is through a metric hierarchy, including localized node-level measurements and interactive network-level metrics. However, limited research has been conducted to integrate these different hierarchies and achieve a better understanding of the neurobiological mechanisms and communications. In this work, we address this literature gap by proposing a Bayesian regression model under both vector-variate and matrix-variate predictors. To characterize the interplay between different predicting components, we propose a set of biologically plausible prior models centered on an innovative joint thresholded prior. This captures the coupling and grouping effect of signal patterns, as well as their spatial contiguity across brain anatomy. By developing a posterior inference, we can identify and quantify the uncertainty of signaling node- and network-level neuromarkers, as well as their predictive mechanism for phenotypic outcomes. Through extensive simulations, we demonstrate that our proposed method outperforms the alternative approaches substantially in both out-of-sample prediction and feature selection. By implementing the model to study children's general mental abilities, we establish a powerful predictive mechanism based on the identified task contrast traits and resting-state sub-networks.
{"title":"Bayesian thresholded modeling for integrating brain node and network predictors.","authors":"Zhe Sun, Wanwan Xu, Tianxi Li, Jian Kang, Gregorio Alanis-Lobato, Yize Zhao","doi":"10.1093/biostatistics/kxae048","DOIUrl":"10.1093/biostatistics/kxae048","url":null,"abstract":"<p><p>Progress in neuroscience has provided unprecedented opportunities to advance our understanding of brain alterations and their correspondence to phenotypic profiles. With data collected from various imaging techniques, studies have integrated different types of information ranging from brain structure, function, or metabolism. More recently, an emerging way to categorize imaging traits is through a metric hierarchy, including localized node-level measurements and interactive network-level metrics. However, limited research has been conducted to integrate these different hierarchies and achieve a better understanding of the neurobiological mechanisms and communications. In this work, we address this literature gap by proposing a Bayesian regression model under both vector-variate and matrix-variate predictors. To characterize the interplay between different predicting components, we propose a set of biologically plausible prior models centered on an innovative joint thresholded prior. This captures the coupling and grouping effect of signal patterns, as well as their spatial contiguity across brain anatomy. By developing a posterior inference, we can identify and quantify the uncertainty of signaling node- and network-level neuromarkers, as well as their predictive mechanism for phenotypic outcomes. Through extensive simulations, we demonstrate that our proposed method outperforms the alternative approaches substantially in both out-of-sample prediction and feature selection. By implementing the model to study children's general mental abilities, we establish a powerful predictive mechanism based on the identified task contrast traits and resting-state sub-networks.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":"26 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823287/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142959273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-31DOI: 10.1093/biostatistics/kxae013
Qianhan Zeng, Jing Zhou, Ying Ji, Hansheng Wang
Computed tomography (CT) has been a powerful diagnostic tool since its emergence in the 1970s. Using CT data, 3D structures of human internal organs and tissues, such as blood vessels, can be reconstructed using professional software. This 3D reconstruction is crucial for surgical operations and can serve as a vivid medical teaching example. However, traditional 3D reconstruction heavily relies on manual operations, which are time-consuming, subjective, and require substantial experience. To address this problem, we develop a novel semiparametric Gaussian mixture model tailored for the 3D reconstruction of blood vessels. This model extends the classical Gaussian mixture model by enabling nonparametric variations in the component-wise parameters of interest according to voxel positions. We develop a kernel-based expectation-maximization algorithm for estimating the model parameters, accompanied by a supporting asymptotic theory. Furthermore, we propose a novel regression method for optimal bandwidth selection. Compared to the conventional cross-validation-based (CV) method, the regression method outperforms the CV method in terms of computational and statistical efficiency. In application, this methodology facilitates the fully automated reconstruction of 3D blood vessel structures with remarkable accuracy.
{"title":"A semiparametric Gaussian mixture model for chest CT-based 3D blood vessel reconstruction.","authors":"Qianhan Zeng, Jing Zhou, Ying Ji, Hansheng Wang","doi":"10.1093/biostatistics/kxae013","DOIUrl":"10.1093/biostatistics/kxae013","url":null,"abstract":"<p><p>Computed tomography (CT) has been a powerful diagnostic tool since its emergence in the 1970s. Using CT data, 3D structures of human internal organs and tissues, such as blood vessels, can be reconstructed using professional software. This 3D reconstruction is crucial for surgical operations and can serve as a vivid medical teaching example. However, traditional 3D reconstruction heavily relies on manual operations, which are time-consuming, subjective, and require substantial experience. To address this problem, we develop a novel semiparametric Gaussian mixture model tailored for the 3D reconstruction of blood vessels. This model extends the classical Gaussian mixture model by enabling nonparametric variations in the component-wise parameters of interest according to voxel positions. We develop a kernel-based expectation-maximization algorithm for estimating the model parameters, accompanied by a supporting asymptotic theory. Furthermore, we propose a novel regression method for optimal bandwidth selection. Compared to the conventional cross-validation-based (CV) method, the regression method outperforms the CV method in terms of computational and statistical efficiency. In application, this methodology facilitates the fully automated reconstruction of 3D blood vessel structures with remarkable accuracy.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140869271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-31DOI: 10.1093/biostatistics/kxae015
Gen Li, Miaoyan Wang
Gaussian graphical models are widely used to study the dependence structure among variables. When samples are obtained from multiple conditions or populations, joint analysis of multiple graphical models are desired due to their capacity to borrow strength across populations. Nonetheless, existing methods often overlook the varying levels of similarity between populations, leading to unsatisfactory results. Moreover, in many applications, learning the population-level clustering structure itself is of particular interest. In this article, we develop a novel method, called Simultaneous Clustering and Estimation of Networks via Tensor decomposition (SCENT), that simultaneously clusters and estimates graphical models from multiple populations. Precision matrices from different populations are uniquely organized as a three-way tensor array, and a low-rank sparse model is proposed for joint population clustering and network estimation. We develop a penalized likelihood method and an augmented Lagrangian algorithm for model fitting. We also establish the clustering accuracy and norm consistency of the estimated precision matrices. We demonstrate the efficacy of the proposed method with comprehensive simulation studies. The application to the Genotype-Tissue Expression multi-tissue gene expression data provides important insights into tissue clustering and gene coexpression patterns in multiple brain tissues.
{"title":"Simultaneous clustering and estimation of networks in multiple graphical models.","authors":"Gen Li, Miaoyan Wang","doi":"10.1093/biostatistics/kxae015","DOIUrl":"10.1093/biostatistics/kxae015","url":null,"abstract":"<p><p>Gaussian graphical models are widely used to study the dependence structure among variables. When samples are obtained from multiple conditions or populations, joint analysis of multiple graphical models are desired due to their capacity to borrow strength across populations. Nonetheless, existing methods often overlook the varying levels of similarity between populations, leading to unsatisfactory results. Moreover, in many applications, learning the population-level clustering structure itself is of particular interest. In this article, we develop a novel method, called Simultaneous Clustering and Estimation of Networks via Tensor decomposition (SCENT), that simultaneously clusters and estimates graphical models from multiple populations. Precision matrices from different populations are uniquely organized as a three-way tensor array, and a low-rank sparse model is proposed for joint population clustering and network estimation. We develop a penalized likelihood method and an augmented Lagrangian algorithm for model fitting. We also establish the clustering accuracy and norm consistency of the estimated precision matrices. We demonstrate the efficacy of the proposed method with comprehensive simulation studies. The application to the Genotype-Tissue Expression multi-tissue gene expression data provides important insights into tissue clustering and gene coexpression patterns in multiple brain tissues.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11826093/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141263584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-31DOI: 10.1093/biostatistics/kxae031
Ying Huang, Dean Follmann
Immune response decays over time, and vaccine-induced protection often wanes. Understanding how vaccine efficacy changes over time is critical to guiding the development and application of vaccines in preventing infectious diseases. The objective of this article is to develop statistical methods that assess the effect of decaying immune responses on the risk of disease and on vaccine efficacy, within the context of Cox regression with sparse sampling of immune responses, in a baseline-naive population. We aim to further disentangle the various aspects of the time-varying vaccine effect, whether direct on disease or mediated through immune responses. Based on time-to-event data from a vaccine efficacy trial and sparse sampling of longitudinal immune responses, we propose a weighted estimated induced likelihood approach that models the longitudinal immune response trajectory and the time to event separately. This approach assesses the effects of the decaying immune response, the peak immune response, and/or the waning vaccine effect on the risk of disease. The proposed method is applicable not only to standard randomized trial designs but also to augmented vaccine trial designs that re-vaccinate uninfected placebo recipients at the end of the standard trial period. We conducted simulation studies to evaluate the performance of our method and applied the method to analyze immune correlates from a phase III SARS-CoV-2 vaccine trial.
免疫反应会随着时间的推移而衰减,疫苗诱导的保护作用往往会减弱。了解疫苗效力如何随时间而变化,对于指导疫苗的开发和应用以预防传染病至关重要。本文旨在开发统计方法,在对基线免疫人群的免疫反应进行稀疏采样的考克斯回归背景下,评估衰减的免疫反应对疾病风险和疫苗效力的影响。我们的目标是进一步厘清疫苗时变效应的各个方面,无论是直接影响疾病还是通过免疫反应介导。基于疫苗疗效试验的事件发生时间数据和纵向免疫反应的稀疏采样,我们提出了一种加权估计诱导似然法,该方法对纵向免疫反应轨迹和事件发生时间分别建模。这种方法可评估免疫反应衰减、免疫反应高峰和/或疫苗效果减弱对疾病风险的影响。所提出的方法不仅适用于标准随机试验设计,也适用于在标准试验期结束时对未感染的安慰剂受试者进行再接种的增强疫苗试验设计。我们进行了模拟研究来评估我们的方法的性能,并将该方法应用于分析 SARS-CoV-2 疫苗 III 期试验的免疫相关性。
{"title":"Exposure proximal immune correlates analysis.","authors":"Ying Huang, Dean Follmann","doi":"10.1093/biostatistics/kxae031","DOIUrl":"10.1093/biostatistics/kxae031","url":null,"abstract":"<p><p>Immune response decays over time, and vaccine-induced protection often wanes. Understanding how vaccine efficacy changes over time is critical to guiding the development and application of vaccines in preventing infectious diseases. The objective of this article is to develop statistical methods that assess the effect of decaying immune responses on the risk of disease and on vaccine efficacy, within the context of Cox regression with sparse sampling of immune responses, in a baseline-naive population. We aim to further disentangle the various aspects of the time-varying vaccine effect, whether direct on disease or mediated through immune responses. Based on time-to-event data from a vaccine efficacy trial and sparse sampling of longitudinal immune responses, we propose a weighted estimated induced likelihood approach that models the longitudinal immune response trajectory and the time to event separately. This approach assesses the effects of the decaying immune response, the peak immune response, and/or the waning vaccine effect on the risk of disease. The proposed method is applicable not only to standard randomized trial designs but also to augmented vaccine trial designs that re-vaccinate uninfected placebo recipients at the end of the standard trial period. We conducted simulation studies to evaluate the performance of our method and applied the method to analyze immune correlates from a phase III SARS-CoV-2 vaccine trial.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823265/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141984050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}