The identification of latent profile trajectories in longitudinal studies is an important challenge for specialists, since such trajectories can provide insights that improve understanding of the problem of interest. Most statistical methodologies for cluster analysis of longitudinal data are based on growth curve or mixed-effects models, and often incorporate covariates for better adjustment. Among Bayesian nonparametric methods in particular, Dirichlet process mixture models are widely used. We propose a clustering methodology for longitudinal data based on mixture models generated by a discrete random probability measure whose weights are decreasingly ordered by construction. Additionally, the data are modeled without covariates, assuming independence across time for individual measurements. Our approach also provides a straightforward procedure for merging estimated groups, since there may be too many of them for experts to interpret easily. Our results suggest that, at least for a first analysis, this framework suffices to detect groups in the data effectively; further exploration of each group could incorporate extra information. We apply our methodology to detect adiposity trajectories in Mexican children in a secondary analysis of the "Prenatal Omega-3 fatty acid Supplementation and Child Growth and Development" study (POSGRAD) cohort.
Title: "Cluster analysis for longitudinal data and its application in the detection of adiposity trajectories."
Asael Fabian Martínez, Ivonne Ramírez-Silva, Ruth Fuentes-García
Pub Date : 2026-01-20 DOI: 10.1177/09622802251414594
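As an illustration of the kind of prior at work here, the sketch below builds a mixture whose weights are decreasing by construction, using geometric-type weights proportional to p**k; the weight form, component count, and Gaussian kernels are assumptions made for this example, not the construction used in the paper.

```python
import random

def decreasing_weights(K, p=0.5):
    """Mixture weights proportional to p**k: strictly decreasing by construction."""
    raw = [p ** k for k in range(K)]
    total = sum(raw)
    return [r / total for r in raw]

def sample_mixture(n, means, sds, p=0.5, seed=0):
    """Draw n points from a Gaussian mixture whose weights decrease in k."""
    rng = random.Random(seed)
    w = decreasing_weights(len(means), p)
    data, labels = [], []
    for _ in range(n):
        u, acc, k = rng.random(), 0.0, 0
        for k, wk in enumerate(w):
            acc += wk
            if u <= acc:
                break
        labels.append(k)
        data.append(rng.gauss(means[k], sds[k]))
    return data, labels

w = decreasing_weights(4, p=0.5)          # [8/15, 4/15, 2/15, 1/15]
data, labels = sample_mixture(50, [0.0, 4.0, 8.0], [1.0, 1.0, 1.0])
```

Because the weights are ordered, label 0 is always the largest group, which is what makes merging small trailing groups straightforward.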
Pub Date : 2026-01-19 DOI: 10.1177/09622802251412840
Jose A Christen, Francisco J Rubio
The hazard function is central to the formulation of commonly used survival regression models such as the proportional hazards and accelerated failure time models. However, these models rely on a shared baseline hazard, which, when specified parametrically, can only capture limited shapes. To overcome this limitation, we propose a general class of parametric survival regression models obtained by modelling the hazard function using autonomous systems of ordinary differential equations (ODEs). Covariate information is incorporated via transformed linear predictors on the parameters of the ODE system. Our framework capitalises on the interpretability of parameters in common ODE systems, enabling the identification of covariate values that produce qualitatively distinct hazard shapes associated with different attractors of the system of ODEs. This provides deeper insights into how covariates influence survival dynamics. We develop efficient Bayesian computational tools, including parallelised evaluation of the log-posterior, which facilitates integration with general-purpose Markov chain Monte Carlo samplers. We also derive conditions for posterior asymptotic normality, enabling fast approximations of the posterior. A central contribution of our work lies in the case studies. We demonstrate the methodology using clinical trial data with crossing survival curves, and a study of cancer recurrence times where our approach reveals how the efficacy of interventions (treatments) on hazard and survival is influenced by patient characteristics.
Title: "Hazard-based distributional regression via ordinary differential equations."
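To see how an autonomous ODE can generate a hazard, the sketch below Euler-integrates a logistic toy system dh/dt = a*h*(1 - h/b), whose attractor b pins down the long-run hazard level, while accumulating the cumulative hazard so that survival is S(t) = exp(-H(t)). The logistic form and all parameter values are assumptions for illustration only, not the systems studied in the paper.

```python
import math

def hazard_survival(h0, a, b, t_max, dt=0.001):
    """Euler-integrate the toy autonomous system dh/dt = a*h*(1 - h/b),
    accumulating the cumulative hazard H(t) so that S(t) = exp(-H(t))."""
    h, H, t = h0, 0.0, 0.0
    while t < t_max - 1e-12:
        H += h * dt                      # left-endpoint rule for the integral
        h += a * h * (1.0 - h / b) * dt  # one Euler step for the hazard
        t += dt
    return h, math.exp(-H)

h_end, S = hazard_survival(h0=0.1, a=1.0, b=0.5, t_max=5.0)
# The hazard is drawn toward the attractor b = 0.5 for any h0 > 0
```

Changing which attractor the initial condition and parameters lead to is exactly what produces qualitatively distinct hazard shapes.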
Pub Date : 2026-01-19 DOI: 10.1177/09622802251411538
Edwin Yn Tang, Stef Baas, Daniel Kaddaj, Lukas Pin, David S Robertson, Sofía S Villar
Response-adaptive randomization (RAR) can increase participant benefit in clinical trials, but it also complicates statistical analysis. The burn-in period, a non-adaptive initial stage, is commonly used to mitigate this disadvantage, yet guidance on its optimal duration is scarce. To address this critical gap, this paper introduces an exact evaluation approach to investigate how the burn-in length affects the statistical operating characteristics of two-arm binary Bayesian RAR (BRAR) designs. We show that (1) commonly used calibration and asymptotic tests show substantial type I error rate inflation for BRAR designs without a burn-in period, and increasing the total burn-in length to more than half the trial size reduces but does not fully mitigate type I error rate inflation, necessitating exact tests; (2) exact tests conditioning on total successes show the highest average and minimum power up to large burn-in lengths; (3) the burn-in length substantially influences power and participant benefit, which are often not maximized at the maximum or minimum possible burn-in length; (4) the test statistic influences the type I error rate and power; (5) estimation bias decreases more quickly in the burn-in length for larger treatment effects and increases for larger trial sizes under the same burn-in length. Our approach is illustrated by re-designing the ARREST trial.
Title: "A burn-in(g) question: How long should an initial equal randomization stage be before Bayesian response-adaptive randomization?"
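A minimal simulation of the design being evaluated: a two-arm binary BRAR trial with an alternating 1:1 burn-in followed by Thompson-sampling-style allocation from Beta posteriors. The burn-in length, uniform priors, and response rates below are hypothetical, and note the paper's contribution is an exact evaluation of such designs, not simulation.

```python
import random

def brar_trial(p_ctrl, p_trt, n_total, burn_in, seed=1):
    """Two-arm binary BRAR sketch: alternating 1:1 allocation during the
    burn-in, then Thompson-style allocation by comparing one draw from each
    arm's Beta(1 + successes, 1 + failures) posterior."""
    rng = random.Random(seed)
    succ, fail = [0, 0], [0, 0]
    p_true = [p_ctrl, p_trt]
    for i in range(n_total):
        if i < burn_in:
            arm = i % 2                  # equal randomization stage
        else:
            draws = [rng.betavariate(1 + succ[k], 1 + fail[k]) for k in (0, 1)]
            arm = 0 if draws[0] > draws[1] else 1
        if rng.random() < p_true[arm]:
            succ[arm] += 1
        else:
            fail[arm] += 1
    return succ, fail

succ, fail = brar_trial(p_ctrl=0.3, p_trt=0.6, n_total=200, burn_in=40)
```

The burn-in guarantees each arm at least burn_in/2 allocations; everything after it skews toward the apparently better arm, which is the source of both the participant benefit and the inferential complications discussed above.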
Pub Date : 2026-01-19 DOI: 10.1177/09622802251413737
Nikola Štefelová, Javier Palarea-Albaladejo, Josep Antoni Martín-Fernández
Testing group differences in compositional data, that is, multivariate data referring to parts of a whole, requires focussing on the relative information between components. This is commonly achieved by mapping the data into a sensible logratio coordinate system. Groupings are often defined by an externally given factor but can also emerge from internal features of the data, such as distinct zero patterns, which may reflect an underlying structure of subpopulations. This work introduces the PERLOG test, a novel non-parametric permutation test to identify significant groupings based on pairwise logratios, the fundamental units of compositional information. The method is suitable for both externally and internally defined groupings. In particular, the case of groups defined according to zero patterns is discussed as a prominent example of the latter. The performance of the proposal as a statistical test and its advantages over conventional multivariate tests are demonstrated through simulation. Real-world applications are illustrated using data from studies on movement behaviours and time-use epidemiology.
Title: "A permutation test of differences between externally or internally defined groupings in compositional data sets."
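The core ingredient, a permutation test on a single pairwise logratio, can be sketched as follows. This toy version tests one logratio with an absolute mean-difference statistic and is not the published PERLOG procedure, whose combination of evidence across all pairwise logratios is more involved; the compositions below are invented.

```python
import math
import random

def perm_test_logratio(group_a, group_b, i, j, n_perm=2000, seed=7):
    """Permutation test for a group difference in one pairwise logratio
    log(x_i / x_j), using an absolute mean-difference statistic."""
    lr_a = [math.log(x[i] / x[j]) for x in group_a]
    lr_b = [math.log(x[i] / x[j]) for x in group_b]
    obs = abs(sum(lr_a) / len(lr_a) - sum(lr_b) / len(lr_b))
    pooled, n_a = lr_a + lr_b, len(lr_a)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)              # relabel the pooled logratios
        stat = abs(sum(pooled[:n_a]) / n_a
                   - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if stat >= obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)     # add-one permutation p-value

# Compositions (parts of a whole); the groups differ in the part0/part1 balance
a = [(0.60, 0.20, 0.20), (0.50, 0.30, 0.20), (0.55, 0.25, 0.20), (0.65, 0.15, 0.20)]
b = [(0.20, 0.60, 0.20), (0.30, 0.50, 0.20), (0.25, 0.55, 0.20), (0.15, 0.65, 0.20)]
p_value = perm_test_logratio(a, b, 0, 1)
```

Working on logratios rather than raw parts is what makes the test respect the relative (compositional) nature of the data.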
Pub Date : 2026-01-19 DOI: 10.1177/09622802251412849
Yang-Jin Kim
A main interest in clinical practice is the prediction of patient prognosis conducive to decision-making. Therefore, a relevant prediction model should be able to reflect the patient's updated condition. A joint model of longitudinal markers and time-to-event data has been widely applied to estimate the association between the risk of the event and changes in the markers. The purpose of this work is to provide dynamic measures for evaluating the predictive accuracy of longitudinal markers in the context of interval-censored failure time data. We propose a dynamic area under the curve and a Brier score that reflect the incomplete data structure of interval-censored data. A simulation study compares the prediction performance of the joint model and the landmarking method. As a real data example, the suggested method is applied to predict the occurrence of dementia using repeatedly measured cognitive scores.
Title: "Dynamic prediction of interval-censored failure time data with longitudinal marker."
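As a reminder of the accuracy measure involved, a plain Brier score on fully observed outcomes is sketched below; the reweighting needed to handle the incomplete (interval-censored) data structure, which is the point of the paper, is deliberately omitted, and the predictions are invented.

```python
def brier_score(pred_risk, outcomes):
    """Mean squared difference between predicted event probabilities and
    observed 0/1 event indicators (no censoring adjustment in this sketch)."""
    return sum((p - y) ** 2 for p, y in zip(pred_risk, outcomes)) / len(outcomes)

bs = brier_score([0.9, 0.2, 0.7, 0.1], [1, 0, 1, 0])
# (0.01 + 0.04 + 0.09 + 0.01) / 4 = 0.0375; lower is better
```

A "dynamic" version recomputes this at successive landmark times using only subjects still at risk, with predictions updated from the marker history.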
Pub Date : 2026-01-19 DOI: 10.1177/09622802251406536
Stephanie Pan, Prasad Patil, Janice Weinberg, Sara Lodi, Michael P LaValley
Generalized pairwise comparison (GPC) methods are extensions of the Mann-Whitney approach that allow comparisons of outcomes through prioritized ranking, and they have been widely applied in randomized controlled trials (RCTs). Importantly, GPC methods can be adapted to handle censored time-to-event data. GPC methods are based on assigning scores to pairs of subjects where all pairs of treatment and control subjects are evaluated: the outcome of each treatment subject is compared with each control subject. The GPC test statistic can be expressed as a treatment effect for the therapeutic intervention through measures such as the net benefit, win odds, win ratio (WR), or probability index. The WR, the focus of this study, can alternatively be interpreted as the inverse of the hazard ratio under proportional hazards. However, its estimate can be biased in the presence of substantial censoring. Censoring increases the number of indeterminate treatment and control pairs, where the win or loss is undetermined due to the censored observation(s) and a definitive score cannot be assigned. We propose a novel method leveraging pseudo-observations to address the issue of uninformative pairs resulting from censoring for a time-to-event outcome. We compare the performance of our method with existing GPC methods in simulations under various censoring scenarios. For equal drop-out and administrative censoring, our method provides results that are comparable to existing GPC methods. However, for unequal drop-out, which is common in clinical trials, the performance of our approach relative to existing methods depends on the censoring proportion and distribution. The proposed approach reduced bias and root mean squared error relative to the Gehan and Latta methods under several censoring conditions, but these improvements did not extend to gains in statistical power. Lastly, we illustrate this new GPC approach using two reconstructed RCT datasets.
Title: "Generalized pairwise comparisons using pseudo-observations for time-to-event censored data in a randomized controlled trial setting."
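The pairwise scoring that produces indeterminate pairs under censoring can be made concrete with a naive, Gehan-style rule: a pair is decided only when one subject's observed failure precedes the other's follow-up. This sketch illustrates the problem the pseudo-observation method is designed to fix, not the authors' correction itself; the toy data are invented.

```python
def win_counts(treat, ctrl):
    """Naive pairwise scoring for right-censored times, illustrative only.
    Each subject is (time, event) with event = 1 for an observed failure."""
    wins = losses = indeterminate = 0
    for t_time, t_event in treat:
        for c_time, c_event in ctrl:
            if c_event and c_time < t_time:
                wins += 1          # control failed first: treatment wins
            elif t_event and t_time < c_time:
                losses += 1        # treatment failed first: treatment loses
            else:
                indeterminate += 1 # censoring hides the ordering
    return wins, losses, indeterminate

treat = [(5.0, 1), (8.0, 0), (10.0, 1)]
ctrl = [(2.0, 1), (6.0, 1), (7.0, 0)]
w, l, ind = win_counts(treat, ctrl)
# w = 5, l = 2, ind = 2; the WR estimate is w / l when l > 0
```

The more censoring there is, the larger the indeterminate count, and the less information the WR estimate uses — the bias mechanism described above.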
Pub Date : 2026-01-16 DOI: 10.1177/09622802251411550
Heewon Park, Seiya Imoto, Satoru Miyano
Heterogeneous gene networks capture coordinated gene activities and systemic disruptions in complex biological processes and diseases, but extracting biologically meaningful insights from these large-scale networks remains challenging due to the limited interpretability of existing methods. To address this gap, we have developed comprehensive information-based functional gene network analysis (CiFGNA), a novel computational methodology that systematically detects functional pathways enriched with phenotype-specific molecular interplay in both directed and undirected gene networks. CiFGNA characterizes the differential molecular interplay across phenotypes using probability density functions, quantifying network dissimilarities via Kullback-Leibler (KL) divergence. This approach incorporates both gene expression levels and network structures, enabling the accurate identification of phenotype-specific molecular interactions. We then rank edges by their divergence scores and compute an enrichment score to evaluate whether pathway-associated molecular interactions are statistically overrepresented among highly divergent edges. By incorporating comprehensive gene network information and employing probability density functions with KL divergence as a dissimilarity measure, CiFGNA achieves accurate characterization of phenotype-specific molecular interactions, improving the performance of functional pathway analyses of gene networks. Simulation and anticancer drug sensitivity analyses demonstrated that CiFGNA effectively identifies enriched cancer pathways and distinguishes molecular features associated with drug resistance and sensitivity. Key findings revealed gene networks centered on CD52, EPCAM, and TNFRSF12A as markers of drug-response phenotypes, suggesting that targeting resistance-related molecular interactions (e.g. CD52 and EPCAM) or enhancing sensitivity-associated markers such as TNFRSF12A may improve chemotherapy efficacy. Overall, CiFGNA offers a powerful, generalizable tool for interpreting complex gene networks and advancing systems-level understanding of disease mechanisms.
Title: "CiFGNA: Comprehensive information-based functional gene network analysis."
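The dissimilarity at the heart of the method, a KL divergence between per-phenotype densities, has a closed form when an edge's behaviour in each phenotype is summarized by a Gaussian; the Gaussian summary and the symmetrization below are simplifying assumptions for illustration, not the CiFGNA estimator.

```python
import math

def kl_gauss(mu0, sd0, mu1, sd1):
    """Closed-form KL(N(mu0, sd0^2) || N(mu1, sd1^2))."""
    return (math.log(sd1 / sd0)
            + (sd0 ** 2 + (mu0 - mu1) ** 2) / (2.0 * sd1 ** 2)
            - 0.5)

def edge_score(mu0, sd0, mu1, sd1):
    """Symmetrized divergence for one network edge across two phenotypes."""
    return kl_gauss(mu0, sd0, mu1, sd1) + kl_gauss(mu1, sd1, mu0, sd0)

same = edge_score(0.0, 1.0, 0.0, 1.0)    # identical distributions -> 0
diff = edge_score(0.0, 1.0, 2.0, 1.0)    # mean shift -> strictly positive
```

Ranking edges by such a score and testing whether pathway members concentrate near the top is the enrichment step described in the abstract.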
Pub Date : 2026-01-12 DOI: 10.1177/09622802251404054
Tsung-I Lin, Wan-Lun Wang
There has been growing interest across various research domains in the modeling and clustering of multivariate longitudinal trajectories obtained from internally near-homogeneous subgroups. One prominent motivation for such work arises from the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort study, which involves multiple clinical measurements, exhibiting complex features such as diverse progression patterns, multimodality, and the presence of atypical observations. To tackle the challenges associated with modeling and clustering such grouped longitudinal data, we propose a finite mixture of multivariate contaminated normal linear mixed model (FM-MCNLMM) and its extended version, referred to as the EFM-MCNLMM, which allows the mixing weights to potentially depend on concomitant covariates. We develop alternating expectation conditional maximization algorithms to carry out maximum likelihood estimation for the two models. The utility and effectiveness of the proposed methodology are demonstrated through simulations and analysis of the ADNI data.
Title: "Grouped multi-trajectory modeling using finite mixtures of multivariate contaminated normal linear mixed model."
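For intuition about the contaminated normal kernel used above: its density is a two-component scale mixture, a "good" normal component plus a small-probability component with inflated variance, which gives heavier tails and hence robustness to atypical observations. The univariate case is sketched below; alpha = 0.1 and eta = 4 are arbitrary illustrative values.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def contaminated_normal_pdf(x, mu, sigma, alpha=0.1, eta=4.0):
    """With probability 1 - alpha a 'good' N(mu, sigma^2) observation, with
    probability alpha a 'bad' one whose variance is inflated by eta."""
    return ((1.0 - alpha) * normal_pdf(x, mu, sigma)
            + alpha * normal_pdf(x, mu, math.sqrt(eta) * sigma))

tail_cn = contaminated_normal_pdf(4.0, 0.0, 1.0)
tail_n = normal_pdf(4.0, 0.0, 1.0)       # contaminated density dominates in the tail
```

In the mixture-model setting, the posterior weight of the inflated component also flags which observations are atypical within each cluster.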
Pub Date : 2026-01-12 DOI: 10.1177/09622802251406581
Floriane Le Vilain-Abraham, Solène Desmée, Jennifer A Thompson, Jean-Claude Lacherade, Elsa Tavernier, Etienne Dantan, Agnès Caille
In randomized clinical trials with a time-to-event outcome, the intervention effect can be quantified by a difference in restricted mean survival time (ΔRMST) between the intervention and control groups, defined as the expected gain in survival duration due to the intervention over a fixed follow-up period. In cluster randomized trials (CRTs), social units are randomized to intervention or control groups; the correlation between survival times of individuals within the same cluster must be taken into account in the statistical analysis. In previous work, we proposed the use of pseudo-values regression, based on generalized estimating equations (GEEs), for estimating the ΔRMST in CRTs. We showed that this method correctly estimated the ΔRMST and controlled the type I error rate in CRTs with at least 50 clusters. Here, we propose methods for CRTs with a small number of clusters (<50). We evaluated the performance of four bias-corrections of the GEE sandwich variance estimator of the intervention effect. We also considered the use of a Student t distribution as an alternative to the normal distribution for the GEE Wald test statistic when testing the intervention effect and constructing the confidence interval. With a simulation study, assuming proportional or non-proportional hazards, we showed that the Student t distribution outperformed the normal distribution in terms of type I error rate, and the Fay and Graubard bias-corrected variance led to an appropriate type I error rate regardless of the number of clusters. Therefore, we recommend the use of the Fay and Graubard variance estimator combined with a Student t distribution for the pseudo-values regression to correctly estimate the variance of the intervention effect. Finally, we provide an illustrative analysis of the DEMETER trial, which evaluated the use of a specific endotracheal tube for subglottic secretion drainage to prevent ventilator-associated pneumonia, comparing each of the methods considered.
Restricted mean survival time in cluster randomized trials with a small number of clusters: Improving variance estimation of the intervention effect from the pseudo-values regression. Floriane Le Vilain-Abraham, Solène Desmée, Jennifer A Thompson, Jean-Claude Lacherade, Elsa Tavernier, Etienne Dantan, Agnès Caille. Statistical Methods in Medical Research, published online 2026-01-12. doi:10.1177/09622802251406581
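For intuition about the GEE step: with an identity link and an independence working correlation, the pseudo-values regression reduces to OLS of the pseudo-values on an intercept and a treatment indicator, with a cluster-robust sandwich variance; the small-sample corrections studied in the paper (e.g. Fay and Graubard) rescale each cluster's contribution to the "meat" of this sandwich. The sketch below implements only the uncorrected sandwich with a Student t reference on (number of clusters − 2) degrees of freedom; the function name and the df choice are our assumptions, not the paper's code.

```python
import numpy as np

def delta_rmst_gee(pseudo, arm, cluster):
    """Estimate the intervention effect (ΔRMST) by linear GEE on pseudo-values
    (identity link, independence working correlation = OLS), with an
    uncorrected cluster-robust sandwich variance and a Student-t reference."""
    X = np.column_stack([np.ones_like(arm, dtype=float), arm.astype(float)])
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ pseudo               # OLS coefficients
    resid = pseudo - X @ beta
    meat = np.zeros((2, 2))
    for g in np.unique(cluster):
        idx = cluster == g
        s = X[idx].T @ resid[idx]             # cluster score contribution
        meat += np.outer(s, s)                # bias corrections would rescale s here
    V = bread @ meat @ bread                  # sandwich variance estimator
    se = np.sqrt(V[1, 1])
    df = len(np.unique(cluster)) - 2          # t reference for small numbers of clusters
    return beta[1], se, beta[1] / se, df
```

The returned Wald statistic would be compared to a t distribution with `df` degrees of freedom rather than a standard normal, in the spirit of the paper's recommendation.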
In longitudinal regression modeling, individuals often have two or more response indicators, and these indicators are typically correlated to some extent. Moreover, in clinical medicine, the responses in longitudinal data are often ordinal. For the joint modeling of multivariate ordinal longitudinal data, methods based on mean regression (MR) are commonly used to study latent variables; however, for data with non-normal errors, MR methods often perform poorly. As an alternative, composite quantile regression (CQR) overcomes these limitations and provides more robust estimates. This article proposes a joint relative composite quantile regression (joint relative CQR) method for multivariate ordinal longitudinal data and investigates its application to a longitudinal medical dataset on dementia. First, the joint relative CQR model for multivariate ordinal longitudinal data is constructed from the pseudo composite asymmetric Laplace distribution (PCALD) and latent variable models. Second, parameter estimation is carried out with MCMC algorithms. Finally, Monte Carlo simulations and the dementia dataset validate the effectiveness of the proposed model and method.
Joint modeling of composite quantile regression for multiple ordinal longitudinal data with its applications to a dementia dataset. Shuqing Liang, Lina Bian, Qi Yang, Yuzhu Tian, Maozai Tian. Statistical Methods in Medical Research, published online 2026-01-12. doi:10.1177/09622802251412838
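To make the CQR idea concrete outside the ordinal and latent-variable machinery of the paper: CQR minimizes a sum of quantile check losses over several levels with a shared slope and quantile-specific intercepts. The toy sketch below (our own illustration, not the authors' joint relative CQR) exploits the fact that, for a fixed slope, the optimal intercept at level tau_k is the empirical tau_k-quantile of the residuals, so the intercepts can be profiled out and the slope found by a grid search.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

def fit_cqr_slope(x, y, taus, grid):
    """Composite quantile regression for one shared slope: profile out the
    quantile-specific intercepts and minimize the composite objective on a grid."""
    best_beta, best_obj = None, np.inf
    for beta in grid:
        r = y - beta * x
        # optimal intercept at level t is the empirical t-quantile of r
        obj = sum(np.mean(check_loss(r - np.quantile(r, t), t)) for t in taus)
        if obj < best_obj:
            best_beta, best_obj = beta, obj
    return best_beta

# example: heavy-tailed errors, where mean regression is less stable
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 2.0 * x + rng.standard_t(3, size=500)
slope = fit_cqr_slope(x, y, [0.25, 0.5, 0.75], np.linspace(0.0, 4.0, 401))
```

Pooling several quantile levels is what gives CQR its robustness relative to a single-quantile or mean fit; the paper's Bayesian treatment replaces this optimization with a pseudo composite asymmetric Laplace likelihood sampled by MCMC.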