Pub Date: 2026-02-11; DOI: 10.1177/09622802261417216
Shi-Fang Qiu, Dai-Min Li, Wai-Yin Poon
Various approaches have been developed to assess equivalence/non-inferiority with assay sensitivity in a three-arm trial with continuous or discrete endpoints. However, little work has been done on ordinal endpoints. Because ordinal data carry no metric information, methods designed for metric endpoints can lead to systematic errors when applied to ordinal observations. The win probability, that is, the probability that a subject receiving one treatment achieves a better outcome than (or "wins" against) a subject receiving the other treatment, is used to quantify the treatment effect. In this article, equivalence/non-inferiority with assay sensitivity in a three-arm trial is assessed via win probabilities from the perspective of simultaneous confidence intervals (SCIs). The proposed methods can be applied to studies with ordinal or continuous outcomes without making parametric assumptions. Empirical results show that the Fisher-z transformation-based SCI and the method of variance estimates recovery SCIs combined with logit-transformation or logit-with-arcsinh-transformation confidence limits perform well, in the sense that their empirical coverage probabilities are close to the nominal confidence level. Sample size determination for achieving a pre-specified power is also investigated, based on the duality of hypothesis testing and interval estimation. An example taken from a study of prophylaxis of postoperative nausea and vomiting is used to illustrate the proposed methods.
Title: Rank-based methods for assessing equivalence/non-inferiority with assay sensitivity in a three-arm trial with ordinal endpoints. Journal: Statistical Methods in Medical Research.
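As a rough illustration of the win probability described in this entry, the sketch below estimates it nonparametrically from two samples of ordinal scores by comparing every cross-arm pair. Counting ties as half a win is an assumed convention, the function name `win_probability` is hypothetical, and the scores are made-up values rather than data from the paper. The paper's simultaneous confidence intervals are not reproduced here.

```python
import numpy as np

def win_probability(x, y):
    """Estimate P(X > Y) + 0.5 * P(X = Y) from two samples of ordinal scores."""
    x = np.asarray(x)[:, None]   # column vector: one row per subject in arm X
    y = np.asarray(y)[None, :]   # row vector: one column per subject in arm Y
    wins = (x > y).mean()        # share of cross-arm pairs in which X beats Y
    ties = (x == y).mean()       # share of tied pairs (assumed to count as half a win)
    return wins + 0.5 * ties

# Hypothetical ordinal scores (higher = better); not data from the paper.
experimental = [3, 4, 4, 5, 2]
reference = [2, 3, 3, 4, 1]
print(win_probability(experimental, reference))
```

Under this convention a value of 0.5 means neither arm tends to do better, and only the ordering of the categories is used, not their numeric spacing.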
Pub Date: 2026-02-06; DOI: 10.1177/09622802251414586
Keer Chen, Zengyue Zheng, Pengfei Zhu, Shuping Jiang, Nan Li, Jumin Deng, Pingyan Chen, Zhenyu Wu, Ying Wu
Background: Hybrid clinical trial design integrates traditional randomized controlled trials (RCTs) with real-world data (RWD), aiming to enhance trial efficiency through dynamic incorporation of external data (external trial data and RWD). However, existing methods, such as the meta-analytic predictive (MAP) prior, exhibit serious limitations in controlling data heterogeneity, adjusting for baseline discrepancies, and optimizing dynamic borrowing proportions. These limitations often introduce external bias or compromise evidence reliability, hindering their application in complex analyses such as bridging trials and multi-regional clinical trials (MRCTs).
Objective: This study proposes a novel hybrid Bayesian framework, EQPS robust MAP (EQPS-rMAP), to address heterogeneity and bias in multi-source data integration. Its feasibility and robustness are validated through systematic simulations and retrospective case analyses, using two independent datasets to evaluate the effect of risankizumab in patients with moderate-to-severe plaque psoriasis.
Design and Methods: The EQPS-rMAP method operates in three stages: (1) eliminating baseline covariate discrepancies through propensity score stratification; (2) constructing stratum-specific MAP priors to dynamically adjust weights for external data; and (3) introducing equivalence probability weights to quantify data conflict risks. The study evaluates the method's performance across six simulated analyses (heterogeneity differences, baseline shifts, etc.), comparing it with traditional methods (MAP, PSMAP, Empirical Bayes MAP) in terms of estimation bias, type I error control, and sample size requirements. Real-world case analyses further validate its applicability.
Results: Simulations demonstrate that EQPS-rMAP maintains estimation robustness under considerable heterogeneity while reducing sample size demands and enhancing trial efficiency. Case analyses confirm its ability to control external bias while preserving high estimation accuracy compared to conventional approaches.
Conclusion: The EQPS-rMAP method provides empirical evidence for the feasibility of hybrid clinical designs. Its methodological advancements, which resolve baseline and heterogeneity conflicts through adaptive mechanisms, offer broader applicability for integrating external data and RWD across diverse analyses, including bridging trials, MRCTs, and post-marketing studies.
Title: A hybrid prior Bayesian method for combining domestic real-world data and overseas data in global drug development. Journal: Statistical Methods in Medical Research.
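To make the idea of dynamic borrowing more concrete, here is a minimal sketch of a generic robust mixture prior for a binary endpoint: an informative Beta component (standing in for a MAP prior built from external data) mixed with a vague Beta(1, 1) component. This is not the EQPS-rMAP method itself; the propensity score stratification and equivalence probability weights are not modelled, and the function name, the prior weight of 0.8, and the Beta(30, 70) prior are all illustrative assumptions.

```python
from math import exp, lgamma, log

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def robust_map_posterior(y, n, a_inf, b_inf, w_inf=0.8, a_vague=1.0, b_vague=1.0):
    """Update a two-component Beta mixture prior with y responders out of n.

    Returns the posterior mixture weights and the Beta parameters of each
    component, informative component first. Requires 0 < w_inf < 1.
    """
    comps = [(w_inf, a_inf, b_inf), (1.0 - w_inf, a_vague, b_vague)]
    # Log prior weight plus log marginal (beta-binomial) likelihood per component;
    # the binomial coefficient is common to both and cancels on normalisation.
    log_post = [log(w) + log_beta(a + y, b + n - y) - log_beta(a, b)
                for w, a, b in comps]
    shift = max(log_post)                       # stabilise the exponentials
    unnorm = [exp(lp - shift) for lp in log_post]
    total = sum(unnorm)
    weights = [u / total for u in unnorm]
    params = [(a + y, b + n - y) for _, a, b in comps]
    return weights, params

# Hypothetical external information summarised as Beta(30, 70), i.e. mean 0.30.
for y in (15, 40):  # 15/50 agrees with the prior, 40/50 conflicts with it
    weights, _ = robust_map_posterior(y, 50, 30, 70)
    print(y, round(weights[0], 4))
```

When the current data agree with the external information, the informative component keeps most of the weight; under prior-data conflict the weight shifts to the vague component, so less external information is borrowed.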
Pub Date: 2026-02-05; DOI: 10.1177/09622802251411540
Jimmy Huy Tran, Jan Terje Kvaløy, Hartwig Kørner
One aspect of interest in disease surveillance is whether the survival time distribution changes over time. By following data in health registries over time, this can be monitored either in real time or retrospectively. When relevant risk factors are registered, they can be taken into account in the monitoring as well. A challenge in monitoring survival times based on registry data is that information on cause of death may be missing or uncertain. To quantify the burden of disease in such cases, relative survival methods can be used, in which the total hazard is modelled as the population hazard plus the excess hazard due to the disease. We propose a cumulative sum (CUSUM) procedure for monitoring changes in the survival time distribution in settings where excess hazard models are relevant. The CUSUM chart is based on a survival log-likelihood ratio and extends previously suggested methods for monitoring time-to-event data to the excess hazard setting. The procedure takes into account changes in the population risk over time, as well as changes in the excess hazard that are explained by observed covariates. Properties, challenges, and an application to cancer registry data are presented.
Title: Monitoring time to event in registry data using CUSUMs based on relative survival models. Journal: Statistical Methods in Medical Research.
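The CUSUM recursion underlying charts of this kind can be sketched generically as follows. The sketch assumes the log-likelihood-ratio increments have already been computed from some survival model. How they are derived under an excess hazard model (total hazard = population hazard + excess hazard) is specific to the paper and not shown. The function name, the increments, and the threshold are hypothetical.

```python
def cusum_path(llr_increments, threshold):
    """Run a one-sided CUSUM over a sequence of log-likelihood-ratio increments.

    Returns the chart values and the index of the first value that exceeds
    the threshold (None if the chart never signals).
    """
    chart, value, signal_index = [], 0.0, None
    for i, increment in enumerate(llr_increments):
        # Accumulate evidence for a change, but never let the chart go below zero.
        value = max(0.0, value + increment)
        chart.append(value)
        if signal_index is None and value > threshold:
            signal_index = i
    return chart, signal_index

# Hypothetical increments: roughly in control at first, then drifting upward.
chart, first_signal = cusum_path([-0.2, 0.1, -0.3, 0.5, 0.6, 0.7], threshold=1.5)
print(chart, first_signal)
```

In practice the threshold is usually calibrated, for example by simulation, to give an acceptable false alarm rate while the process is in control.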
Pub Date: 2026-01-30; DOI: 10.1177/09622802251414429
N Balakrishnan, M Mar Fenoy, M Carmen Pardo
In many practical situations, some subjects may never experience the event of interest in their lifetime. These subjects are referred to as cured or non-susceptible subjects. In the context of chronic disease treatment, the proportion of such subjects is referred to as the cure fraction. In this work, we extend the generalized time-dependent logistic (GTDL) model proposed by MacKenzie (1996) to a flexible family of models that accommodates not only non-proportional hazards but also long-term survivors. Inferential methods are then developed for the proposed model, and a Monte Carlo simulation study is carried out to evaluate the performance of the model and of the inferential methods developed here. A real data example on gastric cancer is then used to illustrate the usefulness of the proposed model.
Title: A non-proportional hazards cure model with an application to gastric cancer data analysis. Journal: Statistical Methods in Medical Research.
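To illustrate how a time-dependent logistic hazard can produce long-term survivors, the sketch below assumes the GTDL hazard has the form commonly attributed to MacKenzie (1996), h(t) = lam * exp(alpha*t + eta) / (1 + exp(alpha*t + eta)), where eta stands for the linear predictor. The abstract does not state this form, and the extended family proposed in the paper is not reproduced. The function names and parameter values are hypothetical.

```python
import math

def gtdl_survival(t, lam, alpha, eta):
    """Survival function implied by the assumed GTDL hazard (alpha != 0).

    Integrating the hazard gives H(t) = (lam / alpha) * [log(1 + exp(alpha*t + eta))
    - log(1 + exp(eta))], hence S(t) = exp(-H(t)).
    """
    num = 1.0 + math.exp(alpha * t + eta)
    den = 1.0 + math.exp(eta)
    return (num / den) ** (-lam / alpha)

def gtdl_cure_fraction(lam, alpha, eta):
    """Limit of S(t) as t grows; it is positive only when alpha < 0."""
    if alpha >= 0:
        return 0.0          # the hazard does not vanish, so everyone eventually fails
    return (1.0 + math.exp(eta)) ** (lam / alpha)

# Hypothetical parameter values chosen only for illustration.
lam, alpha, eta = 1.0, -0.5, 0.0
print(gtdl_survival(5.0, lam, alpha, eta), gtdl_cure_fraction(lam, alpha, eta))
```

With a negative alpha the hazard decays to zero fast enough for the cumulative hazard to stay finite, so the survival curve levels off at a plateau that can be read as a cure fraction.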
Pub Date: 2026-01-29; DOI: 10.1177/09622802251415157
Yan Li, Yong-Fang Kuo, Liang Li
Estimating causal treatment effects by subgroup is important in observational studies when treatment effect heterogeneity is present. Existing propensity score methods rely on a correctly specified propensity score model. Model misspecification results in biased treatment effect estimation and covariate imbalance. We proposed a method for propensity score analysis with controlled subgroup balance (G-SBPS) to achieve covariate mean balance in all subgroups. We further incorporated nonparametric kernel regression for the propensity scores and developed a kernelized G-SBPS (kG-SBPS) to improve the subgroup mean balance of covariate transformations in a rich functional class. This extension increased robustness to propensity score model misspecification. Extensive numerical studies showed that G-SBPS and kG-SBPS improve both subgroup covariate balance and subgroup treatment effect estimation compared to existing approaches. For illustration, we applied G-SBPS and kG-SBPS to a dataset on right heart catheterization to estimate the subgroup average treatment effects on hospital length of stay, and to a dataset on diabetes self-management training to estimate the subgroup average treatment effects for the treated on the hospitalization rate.
Title: Parametric and nonparametric propensity score weighting analysis with subgroup covariate balance. Journal: Statistical Methods in Medical Research.
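Subgroup covariate balance of the kind discussed in this entry is often checked with a weighted standardized mean difference computed inside each subgroup. The sketch below does this for a single covariate with inverse probability weights. It is only a diagnostic under assumed inputs, not the G-SBPS or kG-SBPS estimator, and the function name and toy data are hypothetical.

```python
import numpy as np

def subgroup_weighted_smd(x, treat, subgroup, ps):
    """Weighted standardized mean difference of covariate x within each subgroup."""
    x, treat, subgroup, ps = map(np.asarray, (x, treat, subgroup, ps))
    # ATE-type inverse probability weights from the supplied propensity scores.
    w = np.where(treat == 1, 1.0 / ps, 1.0 / (1.0 - ps))
    result = {}
    for g in np.unique(subgroup).tolist():
        in_g = subgroup == g
        t, c = in_g & (treat == 1), in_g & (treat == 0)
        mean_t = np.average(x[t], weights=w[t])
        mean_c = np.average(x[c], weights=w[c])
        # Scale by the unweighted pooled standard deviation within the subgroup.
        pooled_sd = np.sqrt((x[t].var(ddof=1) + x[c].var(ddof=1)) / 2.0)
        result[g] = (mean_t - mean_c) / pooled_sd
    return result

# Toy data (hypothetical): two subgroups, constant propensity score of 0.5.
x = [1, 2, 3, 1, 2, 3, 2, 3, 4, 1, 2, 3]
treat = [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0]
subgroup = [0] * 6 + [1] * 6
print(subgroup_weighted_smd(x, treat, subgroup, [0.5] * 12))
```

Values near zero in every subgroup indicate that the weights balance the covariate mean there; an overall balance check can hide imbalance that only shows up within subgroups.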
Pub Date: 2026-01-29; DOI: 10.1177/09622802251414939
Mirajul Islam, Michael J Daniels, Zeynab Aghabazaz, Juned Siddique
Cardiovascular disease (CVD) cohorts collect data longitudinally to study the association between CVD risk factors and event times. An important area of scientific research is to better understand what features of CVD risk factor trajectories are associated with CVD. We develop methods for feature selection in joint models where feature selection is viewed as a bi-level variable selection problem with multiple features nested within multiple longitudinal risk factors. We modify a previously proposed Bayesian sparse group selection (BSGS) prior, which has not been implemented in joint models until now, to better represent prior beliefs when selecting features both at the group level (longitudinal risk factor) and within group (features of a longitudinal risk factor). One of the advantages of our method over the BSGS method is its ability to account for correlation among the features within a risk factor. As a result, it selects important features similarly, but excludes unimportant features within risk factors more efficiently than the BSGS prior. We evaluate our prior via simulations and apply our method to data from the Atherosclerosis Risk in Communities (ARIC) study, a population-based, prospective cohort study consisting of over 15,000 men and women aged 45-64 at baseline who were measured six additional times. We evaluate which CVD risk factors and which characteristics of their trajectories (features) are associated with death from CVD. We find that systolic and diastolic blood pressure, glucose, and total cholesterol are important risk factors with different important features associated with CVD death in both men and women.
Title: Bayesian feature selection in joint models with application to a cardiovascular disease cohort study. Journal: Statistical Methods in Medical Research.
Pub Date: 2026-01-29; DOI: 10.1177/09622802251412855
Marta Spreafico, Anja J Rueten-Budde, Hein Putter, Marta Fiocco
In clinical studies, the illness-death model is often used to describe disease progression. A subject starts disease-free and may either develop the disease and then die, or die directly. In clinical practice, disease can only be diagnosed at pre-specified follow-up visits, so the exact time of disease onset is often unknown, resulting in interval-censored data. This study examines the impact of ignoring the interval-censored nature of disease data on the discrimination performance of illness-death models, focusing on the time-specific area under the receiver operating characteristic curve in both incident/dynamic and cumulative/dynamic definitions. A simulation study is conducted with data simulated from Weibull transition hazards and disease state censored at regular intervals. Estimates are derived using different methods: the Cox model with a time-dependent binary disease marker, which ignores interval-censoring, and the illness-death model for interval-censored data estimated with three implementations: the piecewise-constant model from the msm package, and the Weibull and M-spline models from the SmoothHazard package. These methods are also applied to a dataset of 2232 patients with high-grade soft tissue sarcoma, where the interval-censored disease state is the post-operative development of distant metastases. The results suggest that, in the presence of interval-censored disease times, it is important to account for interval-censoring not only when estimating the parameters of the model but also when evaluating the discrimination performance of the disease.
Title: Discrimination performance in illness-death models with interval-censored disease data. Journal: Statistical Methods in Medical Research.
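The interval-censoring mechanism described in this entry, where disease can only be detected at scheduled visits, can be sketched as follows. The sketch assumes equally spaced visits starting at time zero and a strictly positive onset time, and it ignores death. The function name and the numbers are hypothetical and unrelated to the sarcoma dataset.

```python
import math

def interval_censor(onset_time, visit_gap, end_of_follow_up):
    """Return the (left, right) visit times that bracket a disease onset.

    If the onset falls after the last scheduled visit, the disease is never
    observed and the right end point is infinite.
    """
    last_visit = math.floor(end_of_follow_up / visit_gap) * visit_gap
    if onset_time > last_visit:
        return last_visit, math.inf
    right = math.ceil(onset_time / visit_gap) * visit_gap  # first visit at or after onset
    left = right - visit_gap                               # last disease-free visit
    return left, right

# Hypothetical subject: onset at month 7.3, visits every 6 months for 24 months.
left, right = interval_censor(7.3, 6, 24)
print(left, right)  # a naive analysis would treat `right` as the exact onset time
```

A method that ignores interval-censoring effectively replaces the unknown onset time with the visit at which the disease was first seen, which shifts every onset later by up to one visit gap.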
Pub Date: 2026-01-29; DOI: 10.1177/09622802251415022
Yujie Wu, Ce Yang, Molin Wang
We develop methods to analyze clustered competing risks data when the event types are only available in a training dataset and are missing in the main study. We propose to estimate the exposure effects through the cause-specific proportional hazards frailty model where random effects are introduced into the model to account for the within-cluster correlation. We propose a weighted penalized partial likelihood method where the weights represent the probabilities of the occurrence of events, and the weights can be obtained by fitting a classification model for the event types on the training dataset. Alternatively, we propose an imputation approach where the missing event types are imputed based on the predictions from the classification model. We derive the analytical variances, and evaluate the finite sample properties of our methods in an extensive simulation study. As an illustrative example, we apply our methods to estimate the associations between tinnitus and metabolic, sensory and metabolic+sensory hearing loss in the Conservation of Hearing Study Audiology Assessment Arm.
Title: Statistical methods for clustered competing risk data when the event types are only available in a training dataset. Journal: Statistical Methods in Medical Research.
Pub Date: 2026-01-23; DOI: 10.1177/09622802251412844
Lei Wang, Yang Ni, Irina Gaynanova
Increasing epidemiologic evidence suggests that the diversity and composition of the gut microbiome can predict infection risk in cancer patients. Infections remain a major cause of morbidity and mortality during chemotherapy. Analyzing microbiome data to identify associations with infection pathogenesis for proactive treatment has become a critical research focus. However, the high-dimensional nature of the data necessitates the use of dimension-reduction methods to facilitate inference and interpretation. Traditional dimension reduction methods, which assume Gaussianity, perform poorly with skewed and zero-inflated microbiome data. To address these challenges, we propose a semiparametric principal component analysis method based on a truncated latent Gaussian copula model that accommodates both skewness and zero inflation. Simulation studies demonstrate that the proposed method outperforms existing approaches by providing more accurate estimates of scores and loadings across various copula transformation settings. We apply our method, along with competing approaches, to gut microbiome data from pediatric patients with acute lymphoblastic leukemia. The principal scores derived from the proposed method reveal the strongest associations between pre-chemotherapy microbiome composition and adverse events during subsequent chemotherapy, offering valuable insights for improving patient outcomes.
Title: Truncated Gaussian copula principal component analysis with application to pediatric acute lymphoblastic leukemia patients' gut microbiome. Journal: Statistical Methods in Medical Research.
Pub Date: 2026-01-21; DOI: 10.1177/09622802251406584
Abdalkarim Alnajjar, Helen Bian, Zihang Lu
Cluster analysis has been widely used in biomedical studies for disaggregating heterogeneous diseases and identifying disease subtypes that may inform clinical decisions. In the era of advanced data science and engineering, cluster analysis faces new challenges due to high dimensionality, multimodality and computational complexity. In the present study, we propose a fast integrative clustering approach based on variational Bayesian inference, called iClusterVB. iClusterVB enables the integration of multiple datasets into the clustering process while performing feature selection in high-dimensional settings for mixed data types, including continuous, categorical, and count data. Simulation studies are performed to compare the performance of iClusterVB with six competing methods and highlight its advantages. Additionally, iClusterVB is applied to three real-life studies to demonstrate its utility in identifying important features and cancer subtypes that are associated with distinct survival probabilities. A user-friendly R package, iClusterVB, and a tutorial have been developed to implement the proposed approach.
Title: A fast integrative clustering and feature selection approach for high-dimensional multiview data. Journal: Statistical Methods in Medical Research.