In immunotherapy, both the dose and the schedule of drug administration can significantly influence therapeutic effects by modulating immune system activation. Incorporating immune response measures into clinical trial designs offers an opportunity to enhance decision-making by leveraging their close association with therapeutic efficacy and toxicity. Motivated by settings where biomarker data indicate improved efficacy in biomarker-positive patients, we propose a dose-schedule optimization strategy tailored to each biomarker-defined subgroup, based on elicited utility functions that capture risk-benefit tradeoffs. We introduce a joint modeling framework that simultaneously evaluates immune response, toxicity, and efficacy, enabling information sharing across outcome types and patient subgroups. Our approach utilizes parsimonious yet flexible models designed specifically to address challenges due to small sample sizes commonly encountered in early-phase trials. Simulation studies demonstrate that the proposed design achieves desirable operating characteristics and effectively informs dose-schedule optimization.
{"title":"A Biomarker-Based Dose-Schedule Optimization Design for Immunotherapy Trials.","authors":"Yingjie Qiu, Yan Han, Beibei Guo","doi":"10.1002/sim.70357","DOIUrl":"10.1002/sim.70357","url":null,"abstract":"<p><p>In immunotherapy, both the dose and the schedule of drug administration can significantly influence therapeutic effects by modulating immune system activation. Incorporating immune response measures into clinical trial designs offers an opportunity to enhance decision-making by leveraging their close association with therapeutic efficacy and toxicity. Motivated by settings where biomarker data indicate improved efficacy in biomarker-positive patients, we propose a dose-schedule optimization strategy tailored to each biomarker-defined subgroup, based on elicited utility functions that capture risk-benefit tradeoffs. We introduce a joint modeling framework that simultaneously evaluates immune response, toxicity, and efficacy, enabling information sharing across outcome types and patient subgroups. Our approach utilizes parsimonious yet flexible models designed specifically to address challenges due to small sample sizes commonly encountered in early-phase trials. Simulation studies demonstrate that the proposed design achieves desirable operating characteristics and effectively informs dose-schedule optimization.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70357"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12828111/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146030940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper introduces a novel methodology for robust regression analysis when traditional mean regression falls short due to the presence of outliers. Unlike conventional approaches that rely on simple random sampling (SRS), our methodology leverages median nomination sampling (MedNS), using readily available ranking information to obtain training data that more accurately capture the central tendency of the underlying population and thereby remain representative even when the population contains extensive outliers. We propose a new loss function that integrates the extra rank information of MedNS data during model fitting, thus offering a form of robust regression. Further, we provide an alternative approach that translates median regression estimation under MedNS into corresponding problems under SRS. Through simulation studies, including a high-dimensional and a nonlinear regression setting, we evaluate the efficacy of our proposed approach relative to its SRS counterpart by comparing the integrated mean squared error of the regression estimates. We observe that our proposed method provides higher relative efficiency (RE) than its SRS counterparts. Lastly, the proposed methods are applied to a real data set collected for body fat analysis in adults.
{"title":"Leveraging Rank Information for Robust Regression Analysis: A Nomination Sampling Approach.","authors":"Neve Loewen, Mohammad Jafari Jozani","doi":"10.1002/sim.70362","DOIUrl":"10.1002/sim.70362","url":null,"abstract":"<p><p>This paper introduces a novel methodology for robust regression analysis when traditional mean regression falls short due to the presence of outliers. Unlike conventional approaches that rely on simple random sampling (SRS), our methodology leverages median nomination sampling (MedNS) by utilizing readily available ranking information to obtain training data that more accurately captures the central tendency of the underlying population, thereby enhancing the representativeness of the sample in the presence of extensive outliers in the population. We propose a new loss function that integrates the extra rank information of MedNS data during the training phase of model fitting, thus offering a form of robust regression. Further, we provide an alternative approach that translates the median regression estimation using MedNS to corresponding problems under SRS. Through simulation studies, including a high-dimensional and a nonlinear regression setting, we evaluate the efficacy of our proposed approach compared to its SRS counterpart by comparing the integrated mean squared error of regression estimates. We observe that our proposed method provides higher relative efficiency (RE) compared to its SRS counterparts. Lastly, the proposed methods are applied to a real data set collected for body fat analysis in adults.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70362"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12826136/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146019728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Janne Pott, Marco Palma, Yi Liu, Jasmine A Mack, Ulla Sovio, Gordon C S Smith, Jessica Barrett, Stephen Burgess
Background and aim: Mendelian randomization (MR) is a widely used tool to estimate causal effects using genetic variants as instrumental variables. To analyze time-varying effects, MR has typically been limited to cross-sectional summary statistics from different samples and time points. We aimed to use longitudinal summary statistics for an exposure in a multivariable MR setting and to validate the effect estimates for the exposure's mean, slope, and within-individual variability.
Simulation study: We tested our approach for power and type I error in 12 scenarios that varied the degree of instrument sharing between the mean, slope, and variability, and the specification of the exposure regression model. Power to detect causal effects of the mean and slope was high throughout the simulation, but power for the variability effect was low when SNPs were shared between the mean and variability. Misspecified regression models reduced power and inflated the type I error.
Real data application: We applied our approach to two real data sets (POPS, UK Biobank). In both, we detected significant causal estimates for the mean and the slope, but no independent effect of the variability. However, only weak instruments were available in both data sets.
Conclusion: We used a new approach to test a time-varying exposure for causal effects of its mean, slope, and variability. The simulation with strong instruments is promising but also highlights three crucial points: (1) the difficulty of defining the correct exposure regression model, (2) the dependence on the genetic correlation, and (3) the lack of strong instruments in real data. Taken together, this demands a cautious evaluation of the results, accounting for known biology and the trajectory of the exposure.
{"title":"Mendelian Randomization With Longitudinal Exposure Data: Simulation Study and Real Data Application.","authors":"Janne Pott, Marco Palma, Yi Liu, Jasmine A Mack, Ulla Sovio, Gordon C S Smith, Jessica Barrett, Stephen Burgess","doi":"10.1002/sim.70378","DOIUrl":"10.1002/sim.70378","url":null,"abstract":"<p><strong>Background and aim: </strong>Mendelian randomization (MR) is a widely used tool to estimate causal effects using genetic variants as instrumental variables. MR is limited to cross-sectional summary statistics of different samples and time points to analyze time-varying effects. We aimed at using longitudinal summary statistics for an exposure in a multivariable MR setting and validating the effect estimates for the mean, slope, and within-individual variability.</p><p><strong>Simulation study: </strong>We tested our approach in 12 scenarios for power and type I error, depending on shared instruments between the mean, slope, and variability, and regression model specifications. We observed high power to detect causal effects of the mean and slope throughout the simulation, but the variability effect was low powered in the case of shared SNPs between the mean and variability. Mis-specified regression models led to lower power and increased the type I error.</p><p><strong>Real data application: </strong>We applied our approach to two real data sets (POPS, UK Biobank). We detected significant causal estimates for both the mean and the slope in both cases, but no independent effect of the variability. However, we only had weak instruments in both data sets.</p><p><strong>Conclusion: </strong>We used a new approach to test a time-varying exposure for causal effects of the exposure's mean, slope and variability. The simulation with strong instruments seems promising but also highlights three crucial points: (1) The difficulty to define the correct exposure regression model, (2) the dependency on the genetic correlation, and (3) the lack of strong instruments in real data. Taken together, this demands a cautious evaluation of the results, accounting for known biology and the trajectory of the exposure.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70378"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12824831/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146019756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Benmei Liu, Hyune-Ju Kim, Joe Zou, Eric J Feuer, Barry I Graubard
Joinpoint regression can model trends in time-specific aggregated estimates. These methods have been developed mainly for non-survey data, such as cancer registry data, and only recently have been extended to survey data from complex sample designs, which can induce non-zero correlation between the time-specific estimates. This correlation can occur when data from the same sampled units are used across time points, for example, in the annual National Health Interview Survey, whose multistage cluster samples reuse the same first-stage sampled clusters over consecutive time points. Another issue when modeling aggregated data is that the degrees of freedom for joinpoint analyses of multistage cluster samples are based on the number of time points rather than the number of first-stage sampled clusters, as used in survey methods. To address this, we propose individual-level models that incorporate the correlation between time points and correct the degrees of freedom for the sampling design, as needed for accurate inferences. We also propose a modified design-based Akaike Information Criterion (M-dAIC) for model selection that accounts for complex sample designs. These new methods are empirically compared to existing methods using simulation studies and health survey data examples. The simulation studies indicated that the new individual-level model identified the true number of joinpoints more accurately than the established aggregate-level models for data collected under complex survey designs with moderate to large intraclass correlation coefficients (ICCs).
{"title":"Extended Joinpoint Regression Methodology for Complex Survey Data.","authors":"Benmei Liu, Hyune-Ju Kim, Joe Zou, Eric J Feuer, Barry I Graubard","doi":"10.1002/sim.70374","DOIUrl":"10.1002/sim.70374","url":null,"abstract":"<p><p>Joinpoint regression can model trends in time-specific aggregated estimates. These methods have been developed mainly for non-survey data such as cancer registry data, and only recently have been extended to utilize survey data that accounts for complex sample designs resulting in non-zero correlation between the time-specific estimates. This correlation can occur for surveys with data from the same sampled units used across time points, for example, the annual National Health Interview Survey with multistage cluster samples using the same first-stage sampled clusters over consecutive time points. Another issue when modeling aggregated data is that the degrees of freedom for joinpoint analyses of multistage cluster samples are based on the number of time points, not the number of first-stage sampled clusters as used in survey methods. To address this, we propose models of individual-level data that incorporate both the correlation between time points and correct the degrees of freedom due to the sampling design that is needed for accurate inferences. Also, a modified design-based Akaike Information Criterion (M-dAIC) for model selection is proposed to account for complex sample designs. These new methods are empirically compared to existing methods using simulation studies and health survey data examples. The simulation studies indicated that this new individual-level model identified the true number of joinpoints more accurately than the established aggregate-level models for data collected using complex survey designs with moderate to large interclass correlation coefficients (ICC).</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70374"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12828251/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146030952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The objective of this study is to perform variable selection and parameter estimation for analyzing partly interval-censored data based on a proportional hazards model that incorporates spatial effects. To broaden the model's applicability across diverse scenarios, we consider two types of spatial structures: adjacency and distance information. Leveraging the differentiable properties of the l1-ball prior, developed through projection-based methods, we devise an efficient Bayesian algorithm by introducing latent variables and applying stochastic gradient Langevin dynamics. This algorithm can rapidly deliver results without resorting to complex sampling steps. Through simulations encompassing various scenarios, we validate the performance of this method in both variable selection and parameter estimation. In our real data application, the proposed approach selects important variables associated with the emergence time of permanent teeth. Additionally, it identifies the spatial structure that best fits these data. This selection and identification are based on two Bayesian model selection criteria: the log pseudo-marginal likelihood and the deviance information criterion.
{"title":"<ArticleTitle xmlns:ns0=\"http://www.w3.org/1998/Math/MathML\">Bayesian Variable Selection With <ns0:math> <ns0:semantics> <ns0:mrow> <ns0:msub><ns0:mrow><ns0:mi>l</ns0:mi></ns0:mrow> <ns0:mrow><ns0:mn>1</ns0:mn></ns0:mrow> </ns0:msub> </ns0:mrow> <ns0:annotation>$$ {l}_1 $$</ns0:annotation></ns0:semantics> </ns0:math> -Ball for Spatially Partly Interval-Censored Data.","authors":"Mingyue Qiu, Lianming Wang, Qingning Zhou, Tao Hu","doi":"10.1002/sim.70369","DOIUrl":"10.1002/sim.70369","url":null,"abstract":"<p><p>The objective of this study is to perform variable selection and parameter estimation for analyzing partly interval-censored data based on a proportional hazards model that incorporates spatial effects. To broaden the model's applicability across diverse scenarios, we consider two types of spatial structures: adjacency and distance information. Leveraging the differentiable properties of the <math> <semantics> <mrow> <msub><mrow><mi>l</mi></mrow> <mrow><mn>1</mn></mrow> </msub> </mrow> <annotation>$$ {l}_1 $$</annotation></semantics> </math> -ball prior developed through projection-based methods, we have devised an efficient Bayesian algorithm by introducing latent variables and applying stochastic gradient Langevin dynamics principles. This algorithm can rapidly deliver results without resorting to complex sampling steps. Through simulations encompassing various scenarios, we have validated the performance of this method in both variable selection and parameter estimation. In our real data application, the proposed approach selects important variables associated with the emergence time of permanent teeth. Additionally, it identifies the spatial structure that best fits these data characteristics. This selection and identification are based on two Bayesian model selection criteria: the log pseudo-marginal likelihood and the deviance information criterion.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70369"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146019634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cluster randomized controlled trials (CRCTs) are commonly used when interventions are delivered at the group level. Since data from CRCTs are inherently multilevel, methods that properly account for clustering are required. Joint modeling (JM) of longitudinal and survival data allows for simultaneous evaluation of intervention effects on repeated measures and time-to-event outcomes, offering a comprehensive view of intervention effects. However, existing JMs do not accommodate the clustered data structures typical of CRCTs. This study introduces a multilevel joint model (MJM) to simultaneously evaluate intervention effects on correlated longitudinal and survival outcomes. The model was applied to empirical data from a large CRCT evaluating the PAX Good Behavior Game, a classroom-based mental health intervention involving 4189 Grade 1 students across 313 classrooms during the 2011-2012 school year. Mental health was assessed at three time points: pre-PAX (January 2012), post-PAX (June 2012), and Grade 5 (June 2016). Time to first mental disorder diagnosis was tracked through March 2024. Simulation studies further evaluated the MJM's performance under varying conditions, including censoring rates, cluster sizes, group-level variances, and survival model specifications. Results indicated that the PAX program significantly improved mental health trajectories and reduced the risk of mental disorder diagnoses. The MJM outperformed traditional JMs by producing more accurate estimates and standard errors. Both empirical and simulation findings demonstrated that ignoring hierarchical structures leads to biased inferences and underestimation of intervention effects. The proposed MJM offers a robust and flexible analytic framework for analyzing data from CRCTs, emphasizing the importance of accounting for clustering in evaluating group-based interventions.
{"title":"A Bayesian Multilevel Joint Modeling of Longitudinal and Survival Outcomes in Cluster Randomized Controlled Trial Studies.","authors":"Yixiu Liu, Depeng Jiang, Mahmoud Torabi, Xuekui Zhang","doi":"10.1002/sim.70385","DOIUrl":"10.1002/sim.70385","url":null,"abstract":"<p><p>Cluster randomized controlled trials (CRCTs) are commonly used when interventions are delivered at the group level. Since data from CTCTs are inherently multilevel, methods that properly account for clustering are required. Joint modeling (JM) of longitudinal and survival data allows for simultaneous evaluation of intervention effects on repeated measures and time-to-event outcomes, offering a comprehensive view of intervention effects. However, existing JMs do not accommodate clustered data structures typically of CRCTs. This study introduces a multilevel joint model (MJM) to simultaneously evaluate intervention effects on correlated longitudinal and survival outcomes. The model was applied to empirical data from a large CRCT evaluating the PAX Good Behavior Game, a classroom-based mental health intervention involving 4189 Grade 1 students across 313 classrooms during the 2011-2012 school year. Mental health was assessed at three time points: pre-PAX (January 2012), post-PAX (June 2012), and Grade 5 (June 2016). Time-to-first mental disorder diagnosis was tracked through March 2024. Simulation studies further evaluated the MJM's performance under varying conditions, including censoring rates, cluster sizes, group-level variances, and survival model specifications. Results indicated the PAX program significantly improved mental health trajectories and reduced the risk of mental disorder diagnoses. The MJM outperformed traditional JMs by producing more accurate estimates and standard errors. Both empirical and simulation findings demonstrated that ignoring hierarchical structures leads to biased inferences and underestimation of intervention effects. The proposed MJM offers a robust and flexible analytic framework for analyzing data from CRCTs, emphasizing the importance of accounting for clustering in evaluating group-based interventions.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70385"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12824832/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146019645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Li Shandross, Emily Howerton, Lucie Contamin, Harry Hochheiser, Anna Krystalli, Nicholas G Reich, Evan L Ray
Combining predictions from multiple models into an ensemble is a widely used practice across many fields, with demonstrated performance benefits. Popularized through domains such as weather forecasting and climate modeling, multi-model ensembles are becoming increasingly common in public health and biological applications. For example, multi-model outbreak forecasting provides more accurate and reliable information about the timing and burden of infectious disease outbreaks to public health officials and medical practitioners. Yet, understanding and interpreting multi-model ensemble results can be difficult, as a diversity of methods has been proposed in the literature with no clear consensus on which is best. Moreover, a lack of standard, easy-to-use software implementations impedes the generation of multi-model ensembles in practice. To address these challenges, we provide an introduction to the statistical foundations of applied probabilistic forecasting, including the role of multi-model ensembles. We introduce the hubEnsembles package, a flexible framework for ensembling various types of predictions using a range of methods. Finally, we present a tutorial and case study of ensemble methods using the hubEnsembles package on a subset of real, publicly available data from the FluSight Forecast Hub.
{"title":"Multi-Model Ensembles in Infectious Disease and Public Health: Methods, Interpretation, and Implementation in R.","authors":"Li Shandross, Emily Howerton, Lucie Contamin, Harry Hochheiser, Anna Krystalli, Nicholas G Reich, Evan L Ray","doi":"10.1002/sim.70333","DOIUrl":"10.1002/sim.70333","url":null,"abstract":"<p><p>Combining predictions from multiple models into an ensemble is a widely used practice across many fields with demonstrated performance benefits. Popularized through domains such as weather forecasting and climate modeling, multi-model ensembles are becoming increasingly common in public health and biological applications. For example, multi-model outbreak forecasting provides more accurate and reliable information about the timing and burden of infectious disease outbreaks to public health officials and medical practitioners. Yet, understanding and interpreting multi-model ensemble results can be difficult, as there are a diversity of methods proposed in the literature with no clear consensus on which is best. Moreover, a lack of standard, easy-to-use software implementations impedes the generation of multi-model ensembles in practice. To address these challenges, we provide an introduction to the statistical foundations of applied probabilistic forecasting, including the role of multi-model ensembles. We introduce the hubEnsembles package, a flexible framework for ensembling various types of predictions using a range of methods. Finally, we present a tutorial and case-study of ensemble methods using the hubEnsembles package on a subset of real, publicly available data from the FluSight Forecast Hub.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70333"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12826350/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146019725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Integrated analysis of multi-omics datasets holds great promise for uncovering complex biological processes. However, the large dimensionality of omics data poses significant interpretability and multiple testing challenges. Simultaneous enrichment analysis (SEA) was introduced to address these issues in single-omics analysis, providing an in-built multiple testing correction and enabling simultaneous feature set testing. In this article, we introduce OCEAN, an extension of SEA to multi-omics data. OCEAN is a flexible approach to analyze potentially all possible two-way feature sets from any pair of genomics datasets. We also propose two new error rates which are in line with the two-way structure of the data and facilitate interpretation of the results. The power and utility of OCEAN are demonstrated by analyzing copy number and gene expression data for breast and colon cancer.
{"title":"Multiple Testing of Mix-and-Match Feature Sets in Multi-Omics.","authors":"Mitra Ebrahimpoor, Renée Menezes, Ningning Xu, Jelle J Goeman","doi":"10.1002/sim.70367","DOIUrl":"10.1002/sim.70367","url":null,"abstract":"<p><p>Integrated analysis of multi-omics datasets holds great promise for uncovering complex biological processes. However, the large dimensionality of omics data poses significant interpretability and multiple testing challenges. Simultaneous enrichment analysis (SEA) was introduced to address these issues in single-omics analysis, providing an in-built multiple testing correction and enabling simultaneous feature set testing. In this article, we introduce OCEAN, an extension of SEA to multi-omics data. OCEAN is a flexible approach to analyze potentially all possible two-way feature sets from any pair of genomics datasets. We also propose two new error rates which are in line with the two-way structure of the data and facilitate interpretation of the results. The power and utility of OCEAN are demonstrated by analyzing copy number and gene expression data for breast and colon cancer.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70367"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12825407/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146019706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Platform trials enable the evaluation of multiple investigational drugs for a single disease and offer flexibility in adding or dropping treatments during the trial. This design is advantageous for reducing sample size and drug development time, particularly in contexts such as pandemics. In platform trials, non-concurrent controls (NCCs) are often used for drug-control comparisons, but temporal shifts in subject characteristics, trial conduct, or standard of care can bias the estimation of treatment effects and inflate the type I error rate. In this study, we develop a new Bayesian power prior for incorporating NCC data in platform trials with binary outcomes. To address temporal shifts, our method adjusts the amount of information borrowed from NCCs using a data-driven similarity index between NCC and concurrent control (CC) data. This index serves as the power parameter in the power prior, enabling adaptive borrowing. We evaluated the proposed method through extensive simulation studies, comparing its operating characteristics with seven alternatives: analysis using only CC data, a naïve pooling method, a frequentist linear regression model, and four Bayesian methods designed to address temporal shifts. Across a range of temporal shift scenarios, the proposed method consistently achieved a favorable balance between type I error control and statistical power, maintaining type I error rates below 10% while avoiding the overborrowing seen in more aggressive methods. The practical utility of the proposed method was also examined by applying it to data from a platform trial involving patients with COVID-19.
{"title":"Bayesian Power Prior in Platform Trials With Non-Concurrent Control for Binary Outcomes: Development and Comparative Evaluation.","authors":"Junichi Asano, Hiroyuki Sato, Shin Watanabe, Akihiro Hirakawa","doi":"10.1002/sim.70387","DOIUrl":"https://doi.org/10.1002/sim.70387","url":null,"abstract":"<p><p>Platform trials enable the evaluation of multiple investigational drugs for a single disease and offer flexibility in adding or dropping treatments during the trial. This design would be advantageous for reducing the sample size and drug development time, particularly in contexts such as pandemics. In the platform trials, non-concurrent controls (NCCs) are often used for drug-control comparisons, but temporal shifts in subject characteristics, trial conduct, or standard of care can introduce bias in the estimation of treatment effects and increase the type I error rate. In this study, we develop a new Bayesian power prior to incorporate NCC data in platform trials with binary outcomes. To address temporal shifts, our method adjusts the amount of information borrowed from NCCs using a data-driven similarity index between NCC and concurrent control (CC) data. This index serves as the power parameter in the power prior, enabling adaptive borrowing. We evaluated the proposed method through extensive simulation studies, comparing its operating characteristics with seven alternatives: analysis using only CC data, naïve pooling method, a frequentist linear regression model, and four Bayesian methods designed to address temporal shifts. Across a range of temporal shift scenarios, the proposed method consistently achieved a favorable balance between type I error control and statistical power, maintaining type I error rates below 10% while avoiding the overborrowing seen in more aggressive methods. The practical utility of the proposed method was also examined by applying it to data from a platform trial involving patients with COVID-19.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70387"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146019655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sigrid Leithe, Bjørn Møller, Bjarte Aagnes, Yngvar Nilssen, Paul C Lambert, Tor Åge Myklebust
Synthetic patient data has the potential to advance research in the medical field by providing privacy-preserving access to data resembling sensitive personal data. Assessing the level of privacy offered is essential to ensure privacy compliance, but it is challenging in practice. Many common methods either fail to capture central aspects of privacy or result in excessive caution based on unrealistic worst-case scenarios. We present a new approach to evaluating the privacy of synthetic datasets from known probability distributions based on the maximal local privacy loss. The strategy is based on measuring individual contributions to the likelihood of generating a specific synthetic dataset, to detect possibilities of reconstructing records in the original data. To demonstrate the method, we generate synthetic time-to-event data based on pancreatic and colon cancer data from the Cancer Registry of Norway using sequential regressions including a flexible parametric survival model. This illustrates the method's ability to measure information leakage at an individual level, which can be used to ensure acceptable privacy risks for every patient in the data.
{"title":"Maximal Local Privacy Loss-A New Method for Privacy Evaluation of Synthetic Datasets.","authors":"Sigrid Leithe, Bjørn Møller, Bjarte Aagnes, Yngvar Nilssen, Paul C Lambert, Tor Åge Myklebust","doi":"10.1002/sim.70376","DOIUrl":"https://doi.org/10.1002/sim.70376","url":null,"abstract":"<p><p>Synthetic patient data has the potential to advance research in the medical field by providing privacy-preserving access to data resembling sensitive personal data. Assessing the level of privacy offered is essential to ensure privacy compliance, but it is challenging in practice. Many common methods either fail to capture central aspects of privacy or result in excessive caution based on unrealistic worst-case scenarios. We present a new approach to evaluating the privacy of synthetic datasets from known probability distributions based on the maximal local privacy loss. The strategy is based on measuring individual contributions to the likelihood of generating a specific synthetic dataset, to detect possibilities of reconstructing records in the original data. To demonstrate the method, we generate synthetic time-to-event data based on pancreatic and colon cancer data from the Cancer Registry of Norway using sequential regressions including a flexible parametric survival model. This illustrates the method's ability to measure information leakage at an individual level, which can be used to ensure acceptable privacy risks for every patient in the data.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70376"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146019769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}