
Biostatistics: Latest Articles

A semiparametric Gaussian mixture model for chest CT-based 3D blood vessel reconstruction.
IF 1.8 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-04-19 DOI: 10.1093/biostatistics/kxae013
Qianhan Zeng, Jing Zhou, Ying Ji, Hansheng Wang

Computed tomography (CT) has been a powerful diagnostic tool since its emergence in the 1970s. Using CT data, 3D structures of human internal organs and tissues, such as blood vessels, can be reconstructed with professional software. This 3D reconstruction is crucial for surgical operations and can serve as a vivid medical teaching example. However, traditional 3D reconstruction relies heavily on manual operations, which are time-consuming, subjective, and require substantial experience. To address this problem, we develop a novel semiparametric Gaussian mixture model tailored for the 3D reconstruction of blood vessels. This model extends the classical Gaussian mixture model by allowing the component-wise parameters of interest to vary nonparametrically with voxel position. We develop a kernel-based expectation-maximization algorithm for estimating the model parameters, accompanied by a supporting asymptotic theory. Furthermore, we propose a novel regression method for optimal bandwidth selection. The regression method outperforms the conventional cross-validation (CV) method in both computational and statistical efficiency. In application, this methodology facilitates the fully automated reconstruction of 3D blood vessel structures with remarkable accuracy.
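The kernel-based EM idea in the abstract above can be illustrated with a deliberately small sketch. It is not the authors' algorithm: it assumes a one-dimensional position, two Gaussian components, a Gaussian kernel, and a fixed bandwidth, and the names `local_em`, `normal_pdf`, `s0`, and `bandwidth` are hypothetical. The only point is that every M-step sum is weighted by a kernel centred at the target position, so the fitted parameters are local to that position.

```python
import math
import random


def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def local_em(positions, values, s0, bandwidth, n_iter=50):
    """Kernel-weighted EM for a two-component 1D Gaussian mixture at position s0.

    Returns (pi, means, sds): the local mixing weight of component 1 and the
    local means and standard deviations of components 0 and 1.
    """
    # Gaussian kernel weights: observations close to s0 count more.
    w = [math.exp(-0.5 * ((s - s0) / bandwidth) ** 2) for s in positions]
    # Crude initialisation from the data range (an illustrative choice).
    lo, hi = min(values), max(values)
    pi, means, sds = 0.5, [lo, hi], [(hi - lo) / 4.0, (hi - lo) / 4.0]
    for _ in range(n_iter):
        # E-step: posterior probability that each observation is from component 1.
        resp = []
        for y in values:
            p0 = (1.0 - pi) * normal_pdf(y, means[0], sds[0])
            p1 = pi * normal_pdf(y, means[1], sds[1])
            denom = p0 + p1
            resp.append(p1 / denom if denom > 0.0 else 0.5)
        # M-step: kernel-weighted updates of the local parameters.
        w1 = [wi * r for wi, r in zip(w, resp)]
        w0 = [wi * (1.0 - r) for wi, r in zip(w, resp)]
        pi = sum(w1) / sum(w)
        for k, wk in enumerate((w0, w1)):
            total = sum(wk)
            means[k] = sum(a * y for a, y in zip(wk, values)) / total
            var = sum(a * (y - means[k]) ** 2 for a, y in zip(wk, values)) / total
            sds[k] = max(math.sqrt(var), 1e-6)
    return pi, means, sds


# Simulated toy data: a background component and a brighter component whose
# mean drifts with position (all numbers are made up for illustration).
random.seed(0)
positions = [random.random() for _ in range(2000)]
values = []
for s in positions:
    if random.random() < 0.3:
        values.append(random.gauss(10.0 + 2.0 * s, 1.0))
    else:
        values.append(random.gauss(0.0, 1.0))

pi_hat, means_hat, sds_hat = local_em(positions, values, s0=0.5, bandwidth=0.1)
```

Repeating the call over a grid of `s0` values would trace out position-dependent parameter curves. Bandwidth choice, which the abstract handles with a regression method, is left as a fixed input here.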

Citations: 0
Estimating the overall fraction of phenotypic variance attributed to high-dimensional predictors measured with error.
IF 2.1 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-04-15 DOI: 10.1093/biostatistics/kxad001
Soutrik Mandal, Do Hyun Kim, Xing Hua, Shilan Li, Jianxin Shi

In prospective genomic studies (e.g., DNA methylation, metagenomics, and transcriptomics), it is crucial to estimate the overall fraction of phenotypic variance (OFPV) attributed to the high-dimensional genomic variables, a concept similar to heritability analyses in genome-wide association studies (GWAS). Unlike genetic variants in GWAS, these genomic variables are typically measured with error due to technical limitations and temporal instability. While the existing methods developed for GWAS can be used, ignoring measurement error may severely underestimate OFPV and mislead the design of future studies. Assuming that measurement error variances are distributed similarly between causal and noncausal variables, we show that the asymptotic attenuation factor equals the average intraclass correlation coefficient of all genomic variables, which can be estimated based on a pilot study with repeated measurements. We illustrate the method by estimating the contribution of microbiome taxa to body mass index and multiple allergy traits in the American Gut Project. Finally, we show that measurement error does not cause meaningful bias when estimating the correlation of effect sizes for two traits.
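The attenuation result can be turned into a small worked example. The sketch below is an assumption-laden illustration, not the authors' estimator: it uses the textbook one-way ANOVA intraclass correlation on a toy pilot data set, and it reads "attenuation factor" as "naive estimate is roughly the true OFPV times the average ICC", so the correction divides by the average ICC. The function names `one_way_icc` and `corrected_ofpv` and all numbers are hypothetical.

```python
def one_way_icc(groups):
    """One-way random-effects ANOVA intraclass correlation.

    `groups` holds repeated measurements of one variable, one equal-length
    inner list per subject.
    """
    n = len(groups)
    k = len(groups[0])
    grand = sum(sum(g) for g in groups) / (n * k)
    subject_means = [sum(g) / k for g in groups]
    # Between-subject and within-subject mean squares.
    msb = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    msw = sum(
        (x - m) ** 2 for g, m in zip(groups, subject_means) for x in g
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


def corrected_ofpv(naive_ofpv, iccs):
    """Divide the naive estimate by the average ICC (the attenuation factor)."""
    attenuation = sum(iccs) / len(iccs)
    return naive_ofpv / attenuation


# Toy pilot study: two variables, three subjects, two repeats each.
variable_a = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]  # perfectly repeatable
variable_b = [[1.0, 3.0], [3.0, 5.0], [5.0, 7.0]]  # noisy repeats

icc_a = one_way_icc(variable_a)
icc_b = one_way_icc(variable_b)
corrected = corrected_ofpv(0.2, [icc_a, icc_b])
```

In this toy case the two ICCs are 1.0 and 0.6, the average is 0.8, and a naive OFPV of 0.2 becomes 0.25 after correction, which shows how ignoring measurement error understates the variance explained.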

Pages: 486-503. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11017132/pdf/
Citations: 0
Tree-based subgroup discovery using electronic health record data: heterogeneity of treatment effects for DTG-containing therapies.
IF 1.8 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-04-15 DOI: 10.1093/biostatistics/kxad014
Jiabei Yang, Ann W Mwangi, Rami Kantor, Issa J Dahabreh, Monicah Nyambura, Allison Delong, Joseph W Hogan, Jon A Steingrimsson

The rich longitudinal individual level data available from electronic health records (EHRs) can be used to examine treatment effect heterogeneity. However, estimating treatment effects using EHR data poses several challenges, including time-varying confounding, repeated and temporally non-aligned measurements of covariates, treatment assignments and outcomes, and loss-to-follow-up due to dropout. Here, we develop the subgroup discovery for longitudinal data algorithm, a tree-based algorithm for discovering subgroups with heterogeneous treatment effects using longitudinal data by combining the generalized interaction tree algorithm, a general data-driven method for subgroup discovery, with longitudinal targeted maximum likelihood estimation. We apply the algorithm to EHR data to discover subgroups of people living with human immunodeficiency virus who are at higher risk of weight gain when receiving dolutegravir (DTG)-containing antiretroviral therapies (ARTs) versus when receiving non-DTG-containing ARTs.

Pages: 323-335. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11017113/pdf/
Citations: 0
A joint Bayesian hierarchical model for estimating SARS-CoV-2 genomic and subgenomic RNA viral dynamics and seroconversion.
IF 2.1 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-04-15 DOI: 10.1093/biostatistics/kxad016
Tracy Q Dong, Elizabeth R Brown

Understanding the viral dynamics of and natural immunity to the severe acute respiratory syndrome coronavirus 2 is crucial for devising better therapeutic and prevention strategies for coronavirus disease 2019 (COVID-19). Here, we present a Bayesian hierarchical model that jointly estimates the genomic RNA viral load, the subgenomic RNA (sgRNA) viral load (correlated to active viral replication), and the rate and timing of seroconversion (correlated to presence of antibodies). Our proposed method accounts for the dynamical relationship and correlation structure between the two types of viral load, allows for borrowing of information between viral load and antibody data, and identifies potential correlates of viral load characteristics and propensity for seroconversion. We demonstrate the features of the joint model through application to the COVID-19 post-exposure prophylaxis study and conduct a cross-validation exercise to illustrate the model's ability to impute the sgRNA viral trajectories for people who only had genomic RNA viral load data.

Pages: 336-353.
Citations: 0
Correction to: A transformation perspective on marginal and conditional models.
IF 2.1 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-04-15 DOI: 10.1093/biostatistics/kxad017
Pages: 597. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11017110/pdf/
Citations: 0
Multi-trait analysis of gene-by-environment interactions in large-scale genetic studies.
IF 2.1 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-04-15 DOI: 10.1093/biostatistics/kxad004
Lan Luo, Devan V Mehrotra, Judong Shen, Zheng-Zheng Tang

Identifying genotype-by-environment interaction (GEI) is challenging because the GEI analysis generally has low power. Large-scale consortium-based studies are ultimately needed to achieve adequate power for identifying GEI. We introduce Multi-Trait Analysis of Gene-Environment Interactions (MTAGEI), a powerful, robust, and computationally efficient framework to test gene-environment interactions on multiple traits in large data sets, such as the UK Biobank (UKB). To facilitate the meta-analysis of GEI studies in a consortium, MTAGEI efficiently generates summary statistics of genetic associations for multiple traits under different environmental conditions and integrates the summary statistics for GEI analysis. MTAGEI enhances the power of GEI analysis by aggregating GEI signals across multiple traits and variants that would otherwise be difficult to detect individually. MTAGEI achieves robustness by combining complementary tests under a wide spectrum of genetic architectures. We demonstrate the advantages of MTAGEI over existing single-trait-based GEI tests through extensive simulation studies and the analysis of the whole exome sequencing data from the UKB.

Pages: 504-520.
Citations: 0
Modeling biomarker variability in joint analysis of longitudinal and time-to-event data.
IF 2.1 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-04-15 DOI: 10.1093/biostatistics/kxad009
Chunyu Wang, Jiaming Shen, Christiana Charalambous, Jianxin Pan

The role of visit-to-visit variability of a biomarker in predicting related disease has been recognized in medical science. Existing measures of biological variability are criticized for being entangled with random variability resulting from measurement error or being unreliable due to limited measurements per individual. In this article, we propose a new measure to quantify the biological variability of a biomarker by evaluating the fluctuation of each individual-specific trajectory behind longitudinal measurements. Given a mixed-effects model for longitudinal data with the mean function over time specified by cubic splines, our proposed variability measure can be mathematically expressed as a quadratic form of random effects. A Cox model is assumed for time-to-event data by incorporating the defined variability as well as the current level of the underlying longitudinal trajectory as covariates, which, together with the longitudinal model, constitutes the joint modeling framework in this article. Asymptotic properties of maximum likelihood estimators are established for the present joint model. Estimation is implemented via an Expectation-Maximization (EM) algorithm with fully exponential Laplace approximation used in the E-step to reduce the computational burden due to the increase of the random effects dimension. Simulation studies are conducted to reveal the advantage of the proposed method over the two-stage method, as well as a simpler joint modeling approach which does not take into account biomarker variability. Finally, we apply our model to investigate the effect of systolic blood pressure variability on cardiovascular events in the Medical Research Council elderly trial, which is also the motivating example for this article.
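To make the quadratic-form claim concrete, here is a sketch under assumptions the abstract does not state. Write the subject-specific deviation from the mean curve as $g_i(t) = B(t)^\top b_i$, where $B(t)$ is a cubic spline basis and $b_i$ the random effects, and measure fluctuation by the integrated squared second derivative over a follow-up window $[0, \tau]$. The symbols $g_i$, $\Omega$, $V_i$, $m_i$, $w_i$, $\gamma$, $\alpha_1$, $\alpha_2$ and this roughness-type definition are illustrative choices and may differ from the authors' definitions.

```latex
V_i \;=\; \int_0^{\tau} \bigl\{ g_i''(t) \bigr\}^2 \, dt
    \;=\; b_i^\top \Bigl( \int_0^{\tau} B''(t)\, B''(t)^\top \, dt \Bigr) b_i
    \;=\; b_i^\top \Omega \, b_i ,
\qquad
h_i(t) \;=\; h_0(t) \exp\bigl\{ \gamma^\top w_i + \alpha_1 m_i(t) + \alpha_2 V_i \bigr\}.
```

Here $\Omega$ depends only on the spline basis, so $V_i$ is quadratic in $b_i$. The hazard on the right shows how both the current level $m_i(t)$ of the trajectory and the variability $V_i$ could enter a Cox model as covariates alongside baseline covariates $w_i$.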

Pages: 577-596. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11017116/pdf/
Citations: 0
Identifying covariate-related subnetworks for whole-brain connectome analysis.
IF 2.1 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-04-15 DOI: 10.1093/biostatistics/kxad007
Shuo Chen, Yuan Zhang, Qiong Wu, Chuan Bi, Peter Kochunov, L Elliot Hong

Whole-brain connectome data characterize the connections among distributed neural populations as a set of edges in a large network, and neuroscience research aims to systematically investigate associations between brain connectome and clinical or experimental conditions as covariates. A covariate is often related to a number of edges connecting multiple brain areas in an organized structure. However, in practice, neither the covariate-related edges nor the structure is known. Therefore, the understanding of underlying neural mechanisms relies on statistical methods that are capable of simultaneously identifying covariate-related connections and recognizing their network topological structures. The task can be challenging because of false-positive noise and almost infinite possibilities of edges combining into subnetworks. To address these challenges, we propose a new statistical approach to handle multivariate edge variables as outcomes and output covariate-related subnetworks. We first study the graph properties of covariate-related subnetworks from a graph and combinatorics perspective and accordingly bridge the inference for individual connectome edges and covariate-related subnetworks. Next, we develop efficient algorithms to extract covariate-related subnetworks from the whole-brain connectome data with an $\ell_0$ norm penalty. We validate the proposed methods based on an extensive simulation study, and we benchmark our performance against existing methods. Using our proposed method, we analyze two separate resting-state functional magnetic resonance imaging data sets for schizophrenia research and obtain highly replicable disease-related subnetworks.

Pages: 541-558. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11017127/pdf/
Citations: 0
A transformation perspective on marginal and conditional models.
IF 1.8 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-04-15 DOI: 10.1093/biostatistics/kxac048
Luisa Barbanti, Torsten Hothorn

Clustered observations are ubiquitous in controlled and observational studies and arise naturally in multicenter trials or longitudinal surveys. We present a novel model for the analysis of clustered observations where the marginal distributions are described by a linear transformation model and the correlations by a joint multivariate normal distribution. The joint model provides an analytic formula for the marginal distribution. Owing to the richness of transformation models, the techniques are applicable to any type of response variable, including bounded, skewed, binary, ordinal, or survival responses. We demonstrate how the common normal assumption for reaction times can be relaxed in the sleep deprivation benchmark data set and report marginal odds ratios for the notoriously difficult toenail data. We furthermore discuss the analysis of two clinical trials aiming at the estimation of marginal treatment effects. In the first trial, pain was repeatedly assessed on a bounded visual analog scale and marginal proportional-odds models are presented. The second trial reported disease-free survival in rectal cancer patients, where the marginal hazard ratio from Weibull and Cox models is of special interest. An empirical evaluation compares the performance of the novel approach to generalized estimating equations for binary responses and to conditional mixed-effects models for continuous responses. An implementation is available in the tram add-on package to the R system and was benchmarked against established models in the literature.

Biostatistics, pages 402-428. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11212492/pdf/
Citations: 0
Multiple imputation of more than one environmental exposure with nondifferential measurement error.
IF 2.1 Zone 3 Mathematics Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-04-15 DOI: 10.1093/biostatistics/kxad011
Yuanzhi Yu, Roderick J Little, Matthew Perzanowski, Qixuan Chen

Measurement error is common in environmental epidemiologic studies, but methods for correcting measurement error in regression models with multiple environmental exposures as covariates have not been well investigated. We consider a multiple imputation approach, combining external or internal calibration samples that contain information on both true and error-prone exposures with the main study data of multiple exposures measured with error. We propose a constrained chained equations multiple imputation (CEMI) algorithm that places constraints on the imputation model parameters in the chained equations imputation based on the assumptions of strong nondifferential measurement error. We also extend the constrained CEMI method to accommodate nondetects in the error-prone exposures in the main study data. We estimate the variance of the regression coefficients using the bootstrap with two imputations of each bootstrapped sample. The constrained CEMI method is shown by simulations to outperform existing methods, namely the method that ignores measurement error, classical calibration, and regression prediction, yielding estimated regression coefficients with smaller bias and confidence intervals with coverage close to the nominal level. We apply the proposed method to the Neighborhood Asthma and Allergy Study to investigate the associations between the concentrations of multiple indoor allergens and the fractional exhaled nitric oxide level among asthmatic children in New York City. The constrained CEMI method can be implemented by imposing constraints on the imputation matrix using the mice and bootImpute packages in R.
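
As a rough illustration of why nondifferential measurement error matters, the following Python sketch simulates a single error-prone exposure and compares the naive regression slope with a simple regression-calibration correction fitted on a calibration sample. It is not the constrained CEMI algorithm described in the abstract. The function names (`ols_slope_intercept`, `simulate`), the sample sizes, the unit error variance, and the true slope of 2.0 are all assumptions made for this toy example.

```python
import random


def ols_slope_intercept(x, y):
    """Least-squares slope and intercept for regressing y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def simulate(n_main=20000, n_cal=20000, beta=2.0, seed=0):
    """Return (naive slope, regression-calibrated slope) for a toy data set.

    All settings are hypothetical: one exposure, unit-variance true
    exposure, unit-variance additive error independent of the outcome.
    """
    rng = random.Random(seed)

    # Main study: only the error-prone exposure w and the outcome y are used.
    x_main = [rng.gauss(0, 1) for _ in range(n_main)]
    w_main = [x + rng.gauss(0, 1) for x in x_main]
    y_main = [beta * x + rng.gauss(0, 1) for x in x_main]

    # Calibration sample: both the true exposure x and the error-prone w.
    x_cal = [rng.gauss(0, 1) for _ in range(n_cal)]
    w_cal = [x + rng.gauss(0, 1) for x in x_cal]

    # Naive analysis: regress y directly on the error-prone exposure.
    naive, _ = ols_slope_intercept(w_main, y_main)

    # Regression calibration: learn E[x | w] on the calibration sample,
    # predict x in the main study, then regress y on the prediction.
    lam, intercept = ols_slope_intercept(w_cal, x_cal)
    x_hat = [intercept + lam * w for w in w_main]
    calibrated, _ = ols_slope_intercept(x_hat, y_main)
    return naive, calibrated


if __name__ == "__main__":
    naive, calibrated = simulate()
    print(f"naive slope: {naive:.3f}, calibrated slope: {calibrated:.3f}")
```

With equal variances for the true exposure and the error, the naive slope is attenuated toward roughly half the true value, while the calibrated slope should land close to the true slope. Handling several correlated exposures, nondetects, and valid variance estimates is where methods such as the one in the abstract come in.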

Biostatistics, pages 306-322. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11017114/pdf/
Citations: 0