
Statistics in Medicine: Latest Articles

Bayesian Clustering Factor Models.
IF 1.8 | Zone 4, Medicine | Q3, Mathematical & Computational Biology | Pub Date: 2026-01-01 | DOI: 10.1002/sim.70350
Hwasoo Shin, Marco A R Ferreira, Allison N Tegge

We present a novel framework for concomitant dimension reduction and clustering. This framework is based on a novel class of Bayesian clustering factor models. These models assume a factor model structure where the vectors of common factors follow a mixture of Gaussian distributions. We develop a Gibbs sampler to explore the posterior distribution and propose an information criterion to select the number of clusters and the number of factors. Simulation studies show that our inferential approach appropriately quantifies uncertainty. In addition, when compared to two previously published competitor methods, our information criterion has favorable performance in terms of correct selection of number of clusters and number of factors. Finally, we illustrate the capabilities of our framework with an application to data on recovery from opioid use disorder where clustering of individuals may facilitate personalized health care.
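
As a reading aid, the model class described in this abstract (a factor model whose common factors follow a Gaussian mixture) can be written in generic notation. The symbols below are assumptions chosen for illustration: $p$ observed variables, $q$ factors, $K$ clusters, a loading matrix $\Lambda$, and a diagonal idiosyncratic covariance $\Psi$. The abstract does not state the authors' parameterization, identification constraints, or priors, so none are shown.

```latex
\begin{aligned}
y_i &= \Lambda f_i + \varepsilon_i, & \varepsilon_i &\sim \mathcal{N}_p(0, \Psi), \\
f_i \mid z_i = k &\sim \mathcal{N}_q(\mu_k, \Sigma_k), & \Pr(z_i = k) &= \pi_k, \quad k = 1, \dots, K.
\end{aligned}
```

Under this sketch, clustering is carried by the latent labels $z_i$ and dimension reduction by the $q$-dimensional factors $f_i$, which is why the number of clusters $K$ and the number of factors $q$ both need to be selected.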

Statistics in Medicine, vol. 45, no. 1-2, e70350 | Journal Article | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12826354/pdf/ | Citations: 0
A Stability-Enhanced Lasso Approach for Covariate Selection in Non-Linear Mixed Effects Model.
IF 1.8 | Zone 4, Medicine | Q3, Mathematical & Computational Biology | Pub Date: 2026-01-01 | DOI: 10.1002/sim.70407
Auriane Gabaut, Rodolphe Thiébaut, Cécile Proust-Lima, Mélanie Prague

Non-linear mixed effects models (NLMEMs) defined by ordinary differential equations (ODEs) are central to modeling complex biological systems over time, particularly in pharmacometrics, viral dynamics, and immunology. However, selecting relevant covariates associated with dynamics in high-dimensional settings remains a major challenge. This study introduces a novel model-building approach called Lasso-SAMBA (SAMBA: Stochastic Approximation for Model Building Algorithm) that integrates Lasso regression with a stability selection algorithm for robust covariate selection within ODE-based NLMEMs. The method iteratively constructs models by coupling penalized regression with mechanistic model estimation using the SAEM algorithm. It extends a prior strategy named SAMBA, originally based on stepwise inclusion, by replacing this step with a penalized, stability-driven approach that reduces false discoveries and improves selection robustness. By maintaining the monotonic decrease of the information criterion through a calibrated exploration of penalization parameters, the proposed method outperforms conventional stepwise and Bayesian variable selection alternatives. Extensive simulation studies, spanning pharmacokinetic and immunological models, demonstrate the superiority of Lasso-SAMBA in variable selection fidelity, false discovery rate (FDR) control, and computational efficiency. The Lasso-SAMBA method is implemented in an R package. Applied to a Varicella-Zoster virus vaccination study, the method reveals robust, biologically plausible associations between parameters of the mechanistic model of the humoral immune response and early transcriptomic expressions. These results underscore the practical utility of our method for high-dimensional model building in systems vaccinology and beyond.
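
The stability-selection idea mentioned in this abstract can be illustrated with a minimal, generic sketch. It is not Lasso-SAMBA: there is no SAEM step and no ODE-based mixed model, and an ordinary linear lasso stands in for the penalized regression. The function names, the penalty `lam`, the number of subsamples, the 0.8 threshold, and the toy data are all hypothetical choices made for illustration.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=100):
    """Minimise (1/(2n))*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, with b = 0 initially
    col_ms = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_ms[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (column j added back).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ms[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_ms[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def stability_selection(X, y, lam, n_subsamples=50, threshold=0.8, seed=0):
    """Refit the lasso on random half-samples and keep covariates selected often."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)  # subsample without replacement
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    freqs = [c / n_subsamples for c in counts]
    selected = [j for j in range(p) if freqs[j] >= threshold]
    return freqs, selected


# Hypothetical toy data: only covariate 0 drives the response.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(60)]
y = [3.0 * row[0] + data_rng.gauss(0, 0.5) for row in X]
freqs, selected = stability_selection(X, y, lam=0.5)
print("selection frequencies:", freqs)
print("stable covariates:", selected)
```

Covariates are retained on the basis of how often they are selected across subsamples rather than on a single penalized fit, which is the mechanism the abstract credits with reducing false discoveries.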

Statistics in Medicine, vol. 45, no. 1-2, e70407 | Journal Article | Citations: 0
Adaptive Incorporation of External Summary Information in the Cox Regression Under Population Heterogeneity.
IF 1.8 | Zone 4, Medicine | Q3, Mathematical & Computational Biology | Pub Date: 2026-01-01 | DOI: 10.1002/sim.70371
Yiqi Li, Yuan Huang, Ying Sheng, Qingzhao Zhang

With the increasing availability of data from different sources, there is a growing interest in leveraging summary information from external studies to improve parameter estimation efficiency for the internal study that collects individual-level data. However, when analyzing right-censored survival data, covariate effects often vary across studies due to differences in study environments, research designs, and patients' inclusion criteria. Such heterogeneity, if not accounted for properly, can lead to biased estimates of covariate effects. In this article, we develop a Privacy-preserving and Heterogeneity-aware Integration (PHI) method to improve efficiency in estimating regression parameters of the internal Cox model under population heterogeneity. The PHI method characterizes parameter heterogeneity by assuming an unknown cluster structure across datasets, and constructs an augmented log partial likelihood function with a fusion penalty to simultaneously estimate the cluster structure and adaptively incorporate summary statistics from external datasets. Estimation consistency and asymptotic normality are established for the proposed estimator. We further prove that the proposed estimator is asymptotically more efficient than the traditional maximum partial likelihood estimator under mild conditions. The PHI method also achieves consistency in estimating the underlying cluster structure across datasets. Simulation studies and brain tumor data analysis are used to investigate the finite-sample performance of the proposed method.
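
For orientation, the quantity that the abstract's "augmented log partial likelihood" builds on is the ordinary Cox log partial likelihood. The sketch below computes only that baseline quantity for a single covariate, assuming no tied event times. The external-summary augmentation and the fusion penalty of the PHI method are not specified in the abstract and are not reproduced here. The function name and the toy data are hypothetical.

```python
import math


def cox_log_partial_likelihood(times, events, x, beta):
    """Cox log partial likelihood for one covariate, assuming no tied event times.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if right-censored
    x      : covariate values
    beta   : regression coefficient
    """
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        log_denominator = math.log(sum(math.exp(beta * x[j]) for j in risk_set))
        loglik += beta * x[i] - log_denominator
    return loglik


# Hypothetical toy data: three subjects, the second one censored.
times = [1.0, 2.0, 3.0]
events = [1, 0, 1]
x = [0.5, -1.0, 2.0]
print(cox_log_partial_likelihood(times, events, x, beta=0.0))
```

At `beta = 0` every subject in a risk set has equal weight, so each observed event contributes minus the log of its risk-set size, which gives a quick sanity check on the implementation.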

Statistics in Medicine, vol. 45, no. 1-2, e70371 | Journal Article | Citations: 0
Structured Nonlinear Cure Model With Deep Neural Networks for High-Dimensional Survival Analysis.
IF 1.8 | Zone 4, Medicine | Q3, Mathematical & Computational Biology | Pub Date: 2026-01-01 | DOI: 10.1002/sim.70368
Xingdong Feng, Qiaoling Li, Xing Qin, Mengyun Wu, Liang Yu

Accurate prognosis and effective variable selection are essential in high-dimensional survival analysis, particularly for understanding long-term survival outcomes. The mixture cure rate model has been commonly adopted for subjects with exceptionally long survival times. However, traditional models usually assume log-linear effects of covariates, which may not capture the complex and nonlinear relationships in real-world data. Additionally, clinical observations reveal structural similarity between covariates that influence both patient cure rates and survival times. Existing methods typically estimate the two components of the mixture cure model independently, neglecting their inherent connections. To address these limitations, in this study, we enhance the conventional cure rate model by incorporating deep neural networks with a selection layer, while preserving the similarity structure between the cured and susceptible fractions. By integrating regularization constraints on the selection parameters and weight matrices within the neural network, the proposed approach simultaneously achieves effective variable selection and handles a series of complex nonlinear relationships within the data. To further enhance consistency in variable selection across both components of the cure model, a novel penalty is introduced, enabling the proposed model to identify key variables and enhance overall performance and interpretability in high-dimensional datasets. Through extensive simulation studies and real-world data analysis, the superior performance and robustness of the proposed approach are evident.
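
The two components of a mixture cure model referred to in this abstract can be made concrete with a small sketch of the conventional (non-neural) form: an incidence part giving the probability of being susceptible, and a latency part giving survival among the susceptible. The logistic link, the exponential latency distribution, and all function and parameter names are assumptions made for illustration. The deep-network extension and the penalty described in the abstract are not reproduced.

```python
import math


def uncured_probability(x, intercept, slope):
    """Incidence part: probability of being susceptible, via a logistic link."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


def population_survival(t, uncured_prob, hazard):
    """Mixture cure survival: S_pop(t) = (1 - p) + p * S_u(t).

    The latency part S_u(t) is taken to be exponential with a constant hazard,
    purely as a stand-in for whatever latency model is actually used.
    """
    susceptible_survival = math.exp(-hazard * t)
    return (1.0 - uncured_prob) + uncured_prob * susceptible_survival


# Hypothetical subject with covariate value 0.4.
p = uncured_probability(0.4, intercept=0.5, slope=1.0)
for t in (0.0, 5.0, 50.0):
    print(t, population_survival(t, p, hazard=0.2))
```

The population survival curve levels off at the cure fraction `1 - p` instead of dropping to zero, which is the feature that distinguishes cure models from standard survival models.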

Statistics in Medicine, vol. 45, no. 1-2, e70368 | Journal Article | Citations: 0
A Functional Approach to Testing Overall Effect of Interaction Between DNA Methylation and SNPs.
IF 1.8 | Zone 4, Medicine | Q3, Mathematical & Computational Biology | Pub Date: 2026-01-01 | DOI: 10.1002/sim.70364
Yvelin Gansou, Karim Oualkacha, Marzia Angela Cremona, Lajmi Lakhal-Chaieb

We introduce a test for the overall effect of interaction between DNA methylation and a set of single nucleotide polymorphisms on a quantitative phenotype. The inference procedure is based on a functional approach that extends existing regression models in functional data analysis. Through extensive simulations, we show that the proposed test effectively controls type I error rates and achieves higher empirical power than existing methods, particularly when multiple interactions are present. The use of the proposed test is illustrated with an application to data from obesity patients and controls.

Statistics in Medicine, vol. 45, no. 1-2, e70364 | Journal Article | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12828112/pdf/ | Citations: 0
Mendelian Randomization Methods for Causal Inference: Estimands, Identification and Inference.
IF 1.8 | Zone 4, Medicine | Q3, Mathematical & Computational Biology | Pub Date: 2026-01-01 | DOI: 10.1002/sim.70394
Minhao Yao, Anqi Wang, Xihao Li, Zhonghua Liu

Mendelian randomization (MR) has become an essential tool for causal inference in biomedical and public health research. By using genetic variants as instrumental variables, MR helps address unmeasured confounding and reverse causation, offering a quasi-experimental framework to evaluate causal effects of modifiable exposures on health outcomes. Despite its promise, MR faces substantial methodological challenges, including invalid instruments, weak instrument bias, and design complexities across different data structures. In this tutorial review, we aim to provide a systematic overview of MR methods for causal inference, emphasizing clarity of causal interpretation, study design comparisons, availability of software tools, and practical guidance for applied scientists. We organize the review around causal estimands, ensuring that analyses are anchored to well-defined causal questions. We discuss the problems of invalid and weak instruments, comparing available strategies for their detection and correction. We integrate discussions of population-based versus family-based MR designs, analyses based on individual-level versus summary-level data, and one-sample versus two-sample MR designs, highlighting their relative advantages and limitations. We also summarize recent methodological advances and software developments that extend MR to settings with many weak or invalid instruments and to modern high-dimensional omics data. Real-data applications, including UK Biobank and Alzheimer's disease proteomics studies, illustrate the use of these methods in practice. This review aims to serve as a tutorial-style reference for both methodologists and applied scientists.
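
To make the summary-level, two-sample setting mentioned in this abstract concrete, the sketch below computes the Wald ratio for a single instrument and a fixed-effect inverse-variance weighted (IVW) estimate across several instruments. It assumes that all instruments are valid and mutually independent and that the variant-exposure associations are measured without error. These are exactly the assumptions that the invalid-instrument and weak-instrument methods surveyed in the review are designed to relax. The function names and the numbers are hypothetical.

```python
import math


def wald_ratio(beta_exposure, beta_outcome):
    """Single-instrument estimate: variant-outcome over variant-exposure association."""
    return beta_outcome / beta_exposure


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate and standard error from summary statistics.

    Equivalent to a weighted regression of outcome associations on exposure
    associations through the origin, with weights 1 / se_outcome**2.
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Hypothetical summary statistics for three genetic variants.
beta_x = [0.10, 0.20, 0.40]   # variant-exposure associations
beta_y = [0.05, 0.10, 0.20]   # variant-outcome associations
se_y = [0.01, 0.02, 0.01]     # standard errors of the outcome associations
estimate, std_error = ivw_estimate(beta_x, beta_y, se_y)
print(estimate, std_error)
```

In this toy input every variant has the same Wald ratio, so the IVW estimate coincides with it. With real data the per-variant ratios differ, and the spread among them is what heterogeneity-based diagnostics examine.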

Statistics in Medicine, vol. 45, no. 1-2, e70394 | Journal Article | Citations: 0
Extension of Bootstrap MARS With Group LASSO for Heterogeneous Treatment Effect Estimation.
IF 1.8 | Zone 4, Medicine | Q3, Mathematical & Computational Biology | Pub Date: 2026-01-01 | DOI: 10.1002/sim.70370
Guanwenqing He, Ke Wan, Toshio Shimokawa, Kazushi Maruo

In recent years, clinical data obtained from patient surveys and medical records have become increasingly pivotal in medical data science. These clinical data, collectively referred to as "real-world data (RWD)," are anticipated to play a key role in observational studies of specific diseases and in advancing personalized or precision medicine by identifying effective treatments for particular patient subgroups. Consequently, the estimation of heterogeneous treatment effects (HTEs) using RWD has garnered substantial attention. HTE estimation meaningfully contributes to precision medicine by enabling clinicians to make informed treatment decisions tailored to individual patient characteristics. Various treatment effect models for observational studies highlight the robust performance of bagging causal multivariate adaptive regression splines (MARS) (BCM). However, despite the notable efficacy of BCM, there remains potential for refinement. Here, we introduce a novel treatment effect model, the shrinkage causal bootstrap MARS method, which builds upon the following framework: initially, basis functions are estimated using transformed outcome bootstrap sampling MARS, followed by optimization of the model and parameter estimation via the group least absolute shrinkage and selection operator (LASSO) method. Our simulations demonstrate that the proposed method achieves improved mean square error and bias across most scenarios. Additionally, we validate the practical applicability of the method by implementing it on the ACTG 175 dataset.
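
The "transformed outcome" referred to in this abstract is commonly defined as Y* = Y (W - e(X)) / (e(X) (1 - e(X))), where W is the treatment indicator and e(X) is the propensity score. Its conditional mean given X equals the conditional average treatment effect, so a regression method such as MARS can be fit to Y* directly. Whether the paper uses exactly this form is an assumption. The sketch below shows only the transformation, with a known constant propensity score and hypothetical data. The bootstrap MARS fitting and the group LASSO step are not reproduced.

```python
def transformed_outcome(y, w, propensity):
    """Transformed outcome whose conditional mean is the treatment effect.

    y          : observed outcome
    w          : treatment indicator (1 = treated, 0 = control)
    propensity : probability of treatment given covariates, strictly in (0, 1)
    """
    return y * (w - propensity) / (propensity * (1.0 - propensity))


# Hypothetical balanced trial with a known propensity score of 0.5.
outcomes = [3.0, 5.0, 1.0, 2.0]
treatment = [1, 1, 0, 0]
y_star = [transformed_outcome(y, w, 0.5) for y, w in zip(outcomes, treatment)]

mean_y_star = sum(y_star) / len(y_star)
treated = [y for y, w in zip(outcomes, treatment) if w == 1]
control = [y for y, w in zip(outcomes, treatment) if w == 0]
difference_in_means = sum(treated) / len(treated) - sum(control) / len(control)
print(mean_y_star, difference_in_means)
```

With equal arm sizes and a propensity score of 0.5, the average of the transformed outcome reproduces the simple difference in means, which is a convenient check before the transformation is handed to a flexible regression.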

Statistics in Medicine, vol. 45, no. 1-2, e70370 | Journal Article | Citations: 0
A Biomarker-Based Dose-Schedule Optimization Design for Immunotherapy Trials.
IF 1.8 | Zone 4, Medicine | Q3, Mathematical & Computational Biology | Pub Date: 2026-01-01 | DOI: 10.1002/sim.70357
Yingjie Qiu, Yan Han, Beibei Guo

In immunotherapy, both the dose and the schedule of drug administration can significantly influence therapeutic effects by modulating immune system activation. Incorporating immune response measures into clinical trial designs offers an opportunity to enhance decision-making by leveraging their close association with therapeutic efficacy and toxicity. Motivated by settings where biomarker data indicate improved efficacy in biomarker-positive patients, we propose a dose-schedule optimization strategy tailored to each biomarker-defined subgroup, based on elicited utility functions that capture risk-benefit tradeoffs. We introduce a joint modeling framework that simultaneously evaluates immune response, toxicity, and efficacy, enabling information sharing across outcome types and patient subgroups. Our approach utilizes parsimonious yet flexible models designed specifically to address challenges due to small sample sizes commonly encountered in early-phase trials. Simulation studies demonstrate that the proposed design achieves desirable operating characteristics and effectively informs dose-schedule optimization.
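
Selection based on elicited utilities, as described in this abstract, can be sketched as follows: each joint (efficacy, toxicity) outcome is assigned a utility, and the dose-schedule regimen with the highest expected utility is chosen. In the proposed design this would be done separately within each biomarker-defined subgroup, using model-based estimates of the outcome probabilities. The utility table, the regimen names, and the probabilities below are hypothetical, and the joint model for immune response, toxicity, and efficacy is not reproduced.

```python
# Hypothetical elicited utilities for each (efficacy, toxicity) outcome, on a 0-100 scale.
UTILITY = {
    (1, 0): 100.0,  # response without toxicity: best case
    (1, 1): 60.0,   # response with toxicity
    (0, 0): 40.0,   # no response, no toxicity
    (0, 1): 0.0,    # toxicity without response: worst case
}


def expected_utility(joint_probs):
    """Expected utility of one regimen given joint outcome probabilities."""
    return sum(UTILITY[outcome] * prob for outcome, prob in joint_probs.items())


def best_regimen(candidates):
    """Name of the regimen with the highest expected utility."""
    return max(candidates, key=lambda name: expected_utility(candidates[name]))


# Hypothetical joint outcome probabilities for one biomarker subgroup.
candidates = {
    "low dose, weekly": {(1, 0): 0.30, (1, 1): 0.05, (0, 0): 0.60, (0, 1): 0.05},
    "high dose, weekly": {(1, 0): 0.40, (1, 1): 0.20, (0, 0): 0.25, (0, 1): 0.15},
    "high dose, biweekly": {(1, 0): 0.45, (1, 1): 0.10, (0, 0): 0.40, (0, 1): 0.05},
}
for name, probs in candidates.items():
    print(name, expected_utility(probs))
print("selected:", best_regimen(candidates))
```

Because the utility table encodes the risk-benefit tradeoff explicitly, a regimen with a higher response rate can still lose to one with fewer toxicities, as the two high-dose schedules in this toy input illustrate.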

在免疫治疗中,给药剂量和给药时间表都可以通过调节免疫系统的激活来显著影响治疗效果。将免疫反应措施纳入临床试验设计提供了一个机会,通过利用它们与治疗疗效和毒性的密切联系来加强决策。在生物标志物数据表明生物标志物阳性患者的疗效得到改善的情况下,我们提出了一种针对每个生物标志物定义的亚组量身定制的剂量计划优化策略,该策略基于捕获风险-收益权衡的诱导效用函数。我们引入了一个联合建模框架,可以同时评估免疫反应、毒性和疗效,使结果类型和患者亚组之间的信息共享成为可能。我们的方法利用简洁而灵活的模型,专门设计用于解决早期试验中常见的小样本量所带来的挑战。仿真研究表明,所提出的设计达到了理想的工作特性,并有效地指导了剂量计划的优化。
Statistics in Medicine, vol. 45, issues 1-2, e70357. Published 2026-01-01. Journal Article. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12828111/pdf/
Citations: 0
Leveraging Rank Information for Robust Regression Analysis: A Nomination Sampling Approach.
IF 1.8 Zone 4 (Medicine) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2026-01-01 DOI: 10.1002/sim.70362
Neve Loewen, Mohammad Jafari Jozani

This paper introduces a novel methodology for robust regression analysis when traditional mean regression falls short due to the presence of outliers. Unlike conventional approaches that rely on simple random sampling (SRS), our methodology leverages median nomination sampling (MedNS) by utilizing readily available ranking information to obtain training data that more accurately captures the central tendency of the underlying population, thereby enhancing the representativeness of the sample in the presence of extensive outliers in the population. We propose a new loss function that integrates the extra rank information of MedNS data during the training phase of model fitting, thus offering a form of robust regression. Further, we provide an alternative approach that translates the median regression estimation using MedNS to corresponding problems under SRS. Through simulation studies, including a high-dimensional and a nonlinear regression setting, we evaluate the efficacy of our proposed approach compared to its SRS counterpart by comparing the integrated mean squared error of regression estimates. We observe that our proposed method provides higher relative efficiency (RE) compared to its SRS counterparts. Lastly, the proposed methods are applied to a real data set collected for body fat analysis in adults.
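
The sampling idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a univariate population, perfect ranking within each set, and sampling with replacement, and every name in it (`median_nomination_sample`, `make_population`, the contamination rate, the threshold of 10) is hypothetical. It only shows why keeping the median-ranked unit of each small set tends to yield a sample with fewer gross outliers than simple random sampling.

```python
import random


def srs(population, n, rng):
    """Simple random sample of n units (with replacement, for simplicity)."""
    return [rng.choice(population) for _ in range(n)]


def median_nomination_sample(population, n, set_size, rng):
    """Draw n units; each one is the median-ranked unit of a random set.

    Assumption: ranking inside a set is perfect (done on the value itself).
    An odd set_size keeps the median unit unique.
    """
    if set_size % 2 == 0:
        raise ValueError("set_size must be odd")
    sample = []
    for _ in range(n):
        candidates = sorted(rng.choice(population) for _ in range(set_size))
        sample.append(candidates[set_size // 2])
    return sample


def make_population(size, outlier_rate, rng):
    """Hypothetical contaminated population: N(0, 1) bulk plus far outliers."""
    values = []
    for _ in range(size):
        if rng.random() < outlier_rate:
            values.append(rng.gauss(50.0, 5.0))  # gross outlier
        else:
            values.append(rng.gauss(0.0, 1.0))   # bulk of the population
    return values


rng = random.Random(0)
population = make_population(10_000, 0.10, rng)
srs_sample = srs(population, 200, rng)
medns_sample = median_nomination_sample(population, 200, 5, rng)

# Count sampled units far from the bulk (the threshold is illustrative).
srs_outliers = sum(x > 10 for x in srs_sample)
medns_outliers = sum(x > 10 for x in medns_sample)
print(srs_outliers, medns_outliers)
```

With 10% contamination and sets of five, the median unit is an outlier only when at least three of the five drawn units are outliers, which is why the MedNS sample is expected to contain far fewer of them.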

Statistics in Medicine, vol. 45, issues 1-2, e70362. Published 2026-01-01. Journal Article. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12826136/pdf/
Citations: 0
Mendelian Randomization With Longitudinal Exposure Data: Simulation Study and Real Data Application.
IF 1.8 Zone 4 (Medicine) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2026-01-01 DOI: 10.1002/sim.70378
Janne Pott, Marco Palma, Yi Liu, Jasmine A Mack, Ulla Sovio, Gordon C S Smith, Jessica Barrett, Stephen Burgess

Background and aim: Mendelian randomization (MR) is a widely used tool to estimate causal effects using genetic variants as instrumental variables. To analyze time-varying effects, MR has been limited to cross-sectional summary statistics from different samples and time points. We aimed to use longitudinal summary statistics for an exposure in a multivariable MR setting and to validate the effect estimates for the mean, slope, and within-individual variability.

Simulation study: We tested our approach in 12 scenarios for power and type I error; the scenarios varied in the instruments shared between the mean, slope, and variability and in the regression model specification. Power to detect causal effects of the mean and slope was high throughout the simulation, but power for the variability effect was low when SNPs were shared between the mean and variability. Mis-specified regression models lowered power and inflated the type I error.

Real data application: We applied our approach to two real data sets (POPS, UK Biobank). We detected significant causal estimates for both the mean and the slope in both cases, but no independent effect of the variability. However, we only had weak instruments in both data sets.

Conclusion: We used a new approach to test a time-varying exposure for causal effects of the exposure's mean, slope, and variability. The simulation with strong instruments seems promising but also highlights three crucial points: (1) the difficulty of defining the correct exposure regression model, (2) the dependency on the genetic correlation, and (3) the lack of strong instruments in real data. Taken together, these points demand a cautious evaluation of the results, accounting for known biology and the trajectory of the exposure.
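
For readers unfamiliar with the multivariable MR setting, the following is a minimal sketch of a generic fixed-effect multivariable inverse-variance weighted (IVW) estimator: a weighted regression through the origin of SNP-outcome associations on SNP-exposure associations. It is not taken from the paper and may differ from what the authors fitted. The three exposure columns (standing in for SNP effects on the mean, slope, and variability), the simulated effect sizes, and the function names are all assumptions made for illustration.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mvmr_ivw(beta_exposure, beta_outcome, se_outcome):
    """Weighted least squares through the origin, weights 1 / se_outcome**2.

    beta_exposure: one row per SNP, one column per exposure feature.
    """
    k = len(beta_exposure[0])
    xtwx = [[0.0] * k for _ in range(k)]
    xtwy = [0.0] * k
    for x, y, se in zip(beta_exposure, beta_outcome, se_outcome):
        w = 1.0 / se ** 2
        for i in range(k):
            xtwy[i] += w * x[i] * y
            for j in range(k):
                xtwx[i][j] += w * x[i] * x[j]
    return solve_linear(xtwx, xtwy)


# Hypothetical simulation: causal effects for mean and slope, none for variability.
rng = random.Random(1)
true_effects = [0.5, 0.2, 0.0]
n_snps = 100
beta_exposure = [[rng.gauss(0.0, 0.1) for _ in range(3)] for _ in range(n_snps)]
se_outcome = [0.01] * n_snps
beta_outcome = [
    sum(t * x for t, x in zip(true_effects, row)) + rng.gauss(0.0, se)
    for row, se in zip(beta_exposure, se_outcome)
]

estimates = mvmr_ivw(beta_exposure, beta_outcome, se_outcome)
print([round(e, 3) for e in estimates])
```

In this toy setup the instruments are strong and independent across the three columns, so the estimates land close to the simulated effects. The abstract's caveats about shared SNPs and weak instruments correspond to correlated or near-zero columns in `beta_exposure`, which make the normal equations ill-conditioned.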

Statistics in Medicine, vol. 45, issues 1-2, e70378. Published 2026-01-01. Journal Article. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12824831/pdf/
Citations: 0