Pub Date: 2025-11-05 | DOI: 10.1186/s12874-025-02689-w
Sardar Jahani, Ghodratollah Roshanaei, Leili Tapak
Background: Mild cognitive impairment (MCI) represents a transitional stage to Alzheimer's disease (AD), making progression prediction crucial for timely intervention. Predictive models integrating clinical, laboratory, and survival data can enhance early diagnosis and treatment decisions. While machine learning approaches effectively handle censored data, their application in MCI-to-AD progression prediction remains limited, with unclear superiority over classical survival models.
Methods: We analyzed 902 individuals with MCI from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, each with 61 baseline features. Traditional survival models (Cox proportional hazards, Weibull, elastic net Cox) were compared with machine learning techniques (gradient boosting survival, random survival forests [RSF]) for progression prediction. Models were evaluated using the concordance index (C-index) and the integrated Brier score (IBS).
Results: Following feature selection, 14 key features were retained for model training. RSF achieved superior predictive performance with the highest C-index (0.878, 95% CI: 0.877-0.879) and lowest IBS (0.115, 95% CI: 0.114-0.116), demonstrating statistically significant superiority over all evaluated models (P-value < 0.001). RSF demonstrated effective risk stratification across individual biomarker categories (genetic, imaging, cognitive) and achieved optimal patient separation into three distinct prognostic groups when combining all features (p < 0.0001). SHAP-based feature importance analysis of RSF revealed cognitive assessments as the most influential predictors, with Functional Activities Questionnaire (FAQ) achieving the highest importance score (1.098), followed by Logical Memory Delayed Recall Total (LDELTOTAL) (0.906) and Alzheimer's Disease Assessment Scale (ADAS13) (0.770). Among neuroimaging biomarkers, Fluorodeoxyglucose (FDG) emerged as the leading predictor (0.634), ranking fifth overall. Feature importance ranking differed between classical and machine learning approaches, with FDG maintaining consistent importance across all models. RSF demonstrated excellent predictive calibration with positive net benefit across risk thresholds from 0.2 to 0.8.
Conclusions: The RSF model outperformed other methods, demonstrating superior potential for improving prognostic accuracy in medical diagnostics for MCI to AD progression.
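To make the model-comparison workflow above concrete, the following is a minimal sketch (not the authors' code) of fitting a Cox model and a random survival forest on simulated censored data and scoring both with the C-index and integrated Brier score using scikit-survival; the simulated features, hyperparameters, and evaluation grid are illustrative assumptions.

```python
# Sketch: Cox vs. random survival forest on simulated censored data,
# scored with C-index and integrated Brier score (scikit-survival).
import numpy as np
from sklearn.model_selection import train_test_split
from sksurv.util import Surv
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.ensemble import RandomSurvivalForest
from sksurv.metrics import concordance_index_censored, integrated_brier_score

rng = np.random.default_rng(0)
n, p = 400, 5
X = rng.normal(size=(n, p))
risk = X @ np.array([0.8, -0.5, 0.4, 0.0, 0.0])      # true linear predictor
t_event = rng.exponential(scale=np.exp(-risk))       # latent event times
t_cens = rng.exponential(scale=1.5, size=n)          # censoring times
time = np.minimum(t_event, t_cens)
event = t_event <= t_cens
y = Surv.from_arrays(event=event, time=time)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "Cox": CoxPHSurvivalAnalysis(),
    "RSF": RandomSurvivalForest(n_estimators=200, min_samples_leaf=10,
                                random_state=0),
}

# evaluation grid: quantiles of the training event times, safely inside the
# range of both the fitted survival curves and the test follow-up times
times = np.quantile(y_tr["time"][y_tr["event"]], np.linspace(0.25, 0.75, 11))

for name, model in models.items():
    model.fit(X_tr, y_tr)
    # discrimination: higher predicted risk should correspond to earlier events
    cindex = concordance_index_censored(y_te["event"], y_te["time"],
                                        model.predict(X_te))[0]
    # time-dependent prediction error: survival curves evaluated on the grid
    surv = np.asarray([[fn(t) for t in times]
                       for fn in model.predict_survival_function(X_te)])
    ibs = integrated_brier_score(y_tr, y_te, surv, times)
    print(f"{name}: C-index = {cindex:.3f}, IBS = {ibs:.3f}")
```

In practice the feature selection, hyperparameter tuning, and confidence intervals reported in the paper would sit around this core loop.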
{"title":"Assessing the accuracy of survival machine learning and traditional statistical models for Alzheimer's disease prediction over time: a study on the ADNI cohort.","authors":"Sardar Jahani, Ghodratollah Roshanaei, Leili Tapak","doi":"10.1186/s12874-025-02689-w","DOIUrl":"10.1186/s12874-025-02689-w","url":null,"abstract":"<p><strong>Background: </strong>Mild cognitive impairment (MCI) represents a transitional stage to Alzheimer's disease (AD), making progression prediction crucial for timely intervention. Predictive models integrating clinical, laboratory, and survival data can enhance early diagnosis and treatment decisions. While machine learning approaches effectively handle censored data, their application in MCI-to-AD progression prediction remains limited, with unclear superiority over classical survival models.</p><p><strong>Methods: </strong>We analyzed 902 MCI individuals from Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset with 61 baseline features. Traditional survival models (Cox proportional hazards, Weibull, elastic net Cox) were compared with machine learning techniques (gradient boosting survival, random survival forests [RSF]) for progression prediction. Models were evaluated using C-index and IBS.</p><p><strong>Results: </strong>Following feature selection, 14 key features were retained for model training. RSF achieved superior predictive performance with the highest C-index (0.878, 95% CI: 0.877-0.879) and lowest IBS (0.115, 95% CI: 0.114-0.116), demonstrating statistically significant superiority over all evaluated models (P-value < 0.001). RSF demonstrated effective risk stratification across individual biomarker categories (genetic, imaging, cognitive) and achieved optimal patient separation into three distinct prognostic groups when combining all features (p < 0.0001). SHAP-based feature importance analysis of RSF revealed cognitive assessments as the most influential predictors, with Functional Activities Questionnaire (FAQ) achieving the highest importance score (1.098), followed by Logical Memory Delayed Recall Total (LDELTOTAL) (0.906) and Alzheimer's Disease Assessment Scale (ADAS13) (0.770). Among neuroimaging biomarkers, Fluorodeoxyglucose (FDG) emerged as the leading predictor (0.634), ranking fifth overall. Feature importance ranking differed between classical and machine learning approaches, with FDG maintaining consistent importance across all models. RSF demonstrated excellent predictive calibration with positive net benefit across risk thresholds from 0.2 to 0.8.</p><p><strong>Conclusions: </strong>The RSF model outperformed other methods, demonstrating superior potential for improving prognostic accuracy in medical diagnostics for MCI to AD progression.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"250"},"PeriodicalIF":3.4,"publicationDate":"2025-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12587511/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145450859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-04 | DOI: 10.1186/s12874-025-02670-7
Jiawei Wei, Sarwar I Mozumder, Liming Li, Dong Xi, Jiajun Xu, Ray Lin, Oleksandr Sverdlov, Jonathan J Chipman
Background: The FDA's 2023 guidance on covariate adjustment encourages the judicious use of baseline covariates to enhance efficiency. However, when performing covariate adjustment in non-linear models, care must be taken to preserve estimation of the target estimand, as introduced by the ICH E9(R1) addendum. To understand current practices of covariate adjustment within the estimands framework across various sectors, and the associated challenges, the conditional and marginal effect task force within the ASA Oncology Estimand working group conducted a survey.
Methods: The target participants of the survey were biostatisticians who support study designs and analyses in clinical trials in the drug development industry or in academia. A total of 19 questions were included in an online survey distributed between June and July 2023. The survey was disseminated via a shared online link to contacts from more than 50 organisations. The survey responses, together with the working group's experience of the challenges of covariate adjustment and stratified analysis, are summarized and discussed in detail.
Results: A total of 122 responses were received from 12 countries. The survey results suggest that there remain gaps in the understanding that different statistical analysis models may target different estimands for non-collapsible measures, highlighting the need for further clarification and training on this topic. In terms of general practice, when performing the analysis under stratified randomization, additional covariates may be added to the analysis model beyond those used for stratifying randomization, and small strata may be pooled to avoid estimation challenges.
Conclusions: This paper summarises the results of the survey and, based on our findings, provides recommendations to establish consistency and to clarify widely misunderstood practices.
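As background to the non-collapsibility issue the survey highlights, here is a small illustrative calculation (not taken from the paper; all coefficients are hypothetical) showing that the conditional and marginal odds ratios differ even when the covariate is independent of treatment, so conditional and marginal analyses target different estimands.

```python
# Non-collapsibility of the odds ratio: a conditional OR that is identical in
# both covariate strata does not equal the marginal OR, even without confounding.
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

b0, b_trt, b_x = -1.0, 1.0, 2.0   # hypothetical conditional log-odds model
px = 0.5                          # P(X = 1), X independent of treatment

# conditional odds ratio: exp(b_trt) in every covariate stratum by construction
or_conditional = np.exp(b_trt)

def marginal_risk(trt):
    # average the event probability over the covariate distribution
    return (1 - px) * expit(b0 + b_trt * trt) + px * expit(b0 + b_trt * trt + b_x)

p1, p0 = marginal_risk(1), marginal_risk(0)
or_marginal = (p1 / (1 - p1)) / (p0 / (1 - p0))

print(f"conditional OR = {or_conditional:.2f}")   # 2.72
print(f"marginal OR    = {or_marginal:.2f}")      # ~2.23, attenuated toward 1
```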
{"title":"Current practice on covariate adjustment and stratified analysis -based on survey results by ASA oncology estimand working group conditional and marginal effect task force.","authors":"Jiawei Wei, Sarwar I Mozumder, Liming Li, Dong Xi, Jiajun Xu, Ray Lin, Oleksandr Sverdlov, Jonathan J Chipman","doi":"10.1186/s12874-025-02670-7","DOIUrl":"10.1186/s12874-025-02670-7","url":null,"abstract":"<p><strong>Background: </strong>The 2023 FDA's guidance on covariate adjustment encourages the judicious use of baseline covariates to enhance efficiency. However, when performing covariate adjustment in non-linear models, care must be taken on preserving estimation of the target estimand as introduced by the ICH E9(R1) addendum. To understand the current practices of covariate adjustment within the context of the estimands framework across various sectors and associated challenges, the conditional and marginal effect task force within the ASA Oncology Estimand working group conducted a survey.</p><p><strong>Methods: </strong>The target participants of the survey were biostatisticians who support study designs and analyses in clinical trials in the drug development industry or in academia. A total of 19 questions were included in an online survey that was distributed between June and July 2023. The survey was disseminated via a shared online link to contacts from more than 50 organisations. The survey response and experience from the working group on challenges of covariate adjustment and stratified analysis are summarized and discussed in detail.</p><p><strong>Results: </strong>A total of 122 responses were received from 12 countries. The survey results suggest that there remain gaps in the understanding of different statistical analysis models which may target different estimands for non-collapsable measures, highlighting the need for further clarification and training on this topic. In terms of general practice, when performing the analysis under stratified randomization, additional covariates may be added in the analysis model beyond those used for stratifying randomization, and small strata may be pooled to avoid the estimation challenges.</p><p><strong>Conclusions: </strong>This paper summarises the results from this survey and based on our findings, we provide some recommendations to establish consistency and clarifications on any widely misunderstood practices.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"249"},"PeriodicalIF":3.4,"publicationDate":"2025-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12584542/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145443916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-03 | DOI: 10.1186/s12874-025-02683-2
Alison Antoine, Katia Desroziers, Julien Dupin, David Pérol, Rémy Choquet
Randomised controlled trials (RCTs) are the gold standard for evaluating new therapies but have limitations, notably in terms of external validity. Real-world data (RWD) studies could complement RCT evidence. However, a consensus has not yet been reached on situations where RWD could offer rigorous complementary evidence to an RCT when evaluating the effectiveness of therapeutic innovations. This research aims to: (1) propose a categorisation of complex clinical situations; (2) classify the real-world evidence (RWE) approaches to be used in each situation to help reduce uncertainties or provide further evidence in drug benefit assessments; (3) summarise the best methodological considerations to adopt when using these RWE approaches; and (4) propose general recommendations to increase confidence in the use of RWE approaches during the assessment process. The main recommendations within the framework around the RWD-generation plan for complex evaluations relate to four main issues: quality (establishing criteria and standards for quality data), methodology (ensuring the use of the best methodological approaches), transparency (from industry and from health technology assessment agencies (HTAs)), and sharing/collaboration across countries and HTAs (promoting collaboration between HTAs and involving all parties). Our proposal and recommendations could help the scientific community better consider the therapeutic value of innovations through RWD, so that their potential can be fully realised to benefit the quality of care and the regulation of the healthcare system.
{"title":"Enhancing confidence in complex health technology assessments by using real-world evidence: highlighting existing strategies for effective drug evaluation.","authors":"Alison Antoine, Katia Desroziers, Julien Dupin, David Pérol, Rémy Choquet","doi":"10.1186/s12874-025-02683-2","DOIUrl":"10.1186/s12874-025-02683-2","url":null,"abstract":"<p><p>Randomised controlled trials (RCTs) are the gold standard for evaluating new therapies but have limitations, notably in terms of external validity. Real-world data (RWD) studies could complement RCT evidence. However, a consensus has not yet been reached on situations where RWD could offer rigorous complementary evidence to an RCT when evaluating the effectiveness of therapeutic innovations. This research aims to: (1) propose a categorisation of complex clinical situations; (2) classify the real-world evidence (RWE) approaches to be used in each situation to help reduce uncertainties or provide further evidence in drug benefit assessments; (3) summarise the best methodological considerations to adopt when using these RWE approaches; and (4) propose general recommendations to increase confidence in the use of RWE approaches during the assessment process. The main recommendations within the framework around the RWD-generation plan for complex evaluations are related to four main issues: quality (establishing criteria and standards for quality data), methodology (ensuring the use of the best methodological approaches), transparency (from the industry and from the health technology agencies (HTAs) and sharing/collaborating across countries and HTAs (promoting collaboration between HTAs and involving all parties). Our proposal and recommendations could help the scientific community better consider the therapeutic value of innovations through RWD, so that their potential can be fully realised to benefit the quality of care and the regulation of the healthcare system.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"248"},"PeriodicalIF":3.4,"publicationDate":"2025-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12581489/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145437211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-30 | DOI: 10.1186/s12874-025-02685-0
T M Oh, S Batool, C Musicha, L Greene, H Wheat, L Smith, S Griffiths, A Gude, L Weston, H Shafi, K Stevens, C Sutcliffe, W Taylor, W Ingram, B Hussain, P Clarkson, I Sherriff, O C Ukoumunne, S Creanor, R Byng
Background: Complex socio-cultural, psychological, geographical, and service-related challenges are faced when recruiting people with dementia for clinical trials. The aim of Phase 1 of the Dementia PersonAlised Care Team (D-PACT) project was to assess the feasibility of recruiting (identifying, approaching and consenting) people with dementia, including those without capacity to consent, to a cluster randomized controlled trial of a primary care-based personalized dementia support intervention in England. COVID-19 necessitated a shift to remote working, creating the opportunity to compare recruitment strategies before and under lockdown constraints. This paper shares the adaptations made to enable remote consent and capacity judgement with people with dementia, as well as lessons learned.
Methods: Consent was conducted in person from September 2019 to March 2020. Remote consent was implemented from September 2020 to March 2021 after an enforced pause. Both quantitative and qualitative data were collected. Recruitment rates (proportion consented from eligible patients approached), mean monthly consent rates, and time spent on consent-related activities (tasks before and after consent/capacity-judgment meetings, miscellaneous tasks, travel) were compared. Participant experiences with remote recruitment were examined through thematic analysis of qualitative interviews.
Results: Pre-COVID-19, 22 participants (9.9%) out of 228 approached were consented in person. During the pandemic, 19 participants (9.6%) out of 198 were consented remotely, excluding 15 participants initially approached pre-pandemic and later consented remotely. Mean monthly consent rates increased from 3.6 (in person) to 5.6 (remote). However, remote consent required more time (mean 14 researcher-hours per participant vs. 9 in person), with 75% of time spent on consent-related tasks compared to 20% in person. Travel accounted for 40% of in-person consent time. Interviews (n = 13) showed general acceptability of remote processes. However, pre-consent information was perceived as excessive and led some participants to skim materials, potentially reducing understanding.
Conclusions: While remote consent is time-intensive, it achieves comparable rates (proportion consented/total approached) to in-person methods and higher monthly consent rates. A flexible, hybrid approach can enhance participation, offer choice, and increase person-centredness. Realistic planning for time and resources is crucial for inclusive dementia research. Funders should support these needs to ensure effective recruitment.
{"title":"Comparing in-person and remote consent of people with dementia into a primary care-based cluster randomised controlled trial: lessons from the Dementia PersonAlised Care Team (D-PACT) feasibility study.","authors":"T M Oh, S Batool, C Musicha, L Greene, H Wheat, L Smith, S Griffiths, A Gude, L Weston, H Shafi, K Stevens, C Sutcliffe, W Taylor, W Ingram, B Hussain, P Clarkson, I Sherriff, O C Ukoumunne, S Creanor, R Byng","doi":"10.1186/s12874-025-02685-0","DOIUrl":"10.1186/s12874-025-02685-0","url":null,"abstract":"<p><strong>Background: </strong>Complex socio-cultural, psychological, geographical, and service-related challenges are faced when recruiting people with dementia for clinical trials. The aim of Phase 1 of the Dementia PersonAlised Care Team (D-PACT) project was to assess the feasibility of recruiting (identifying, approaching and consenting) people with dementia, including those without capacity to consent, to a cluster randomized controlled trial of a primary care-based personalized dementia support intervention in England. COVID-19 necessitated a shift to remote working, creating the opportunity to compare recruitment strategies before and under lockdown constraints. This paper shares the adaptations made to enable remote consent and capacity judgement with people with dementia, as well as lessons learned.</p><p><strong>Methods: </strong>Consent was conducted in person from September 2019 to March 2020. Remote consent was implemented from September 2020 to March 2021 after an enforced pause. Both quantitative and qualitative data were collected. Recruitment rates (proportion consented from eligible patients approached), mean monthly consent rates, and time spent on consent-related activities (tasks before and after consent/capacity-judgment meetings, miscellaneous tasks, travel) were compared. Participant experiences with remote recruitment were examined through thematic analysis of qualitative interviews.</p><p><strong>Results: </strong>Pre-COVID-19, 22 participants (9.9%) out of 228 approached were consented in person. During the pandemic, 19 participants (9.6%) out of 198 were consented remotely, excluding 15 participants initially approached pre-pandemic and later consented via remote means. Mean monthly consent rates increased from 3.6 (in person) to 5.6 (remote). However, remote consent required more time (mean 14 researcher-hours per participant vs. 9 in person), with 75% of time spent on consent-related tasks compared to 20% in person. Travel accounted for 40% of in-person consent time. Interviews (n = 13) showed general acceptability of remote processes. However pre-consent information was perceived as excessive and led some participants to skim materials, potentially reducing understanding.</p><p><strong>Conclusions: </strong>While remote consent is time-intensive, it achieves comparable rates (proportion consented/total approached) to in-person methods and higher monthly consent rates. A flexible, hybrid approach can enhance participation, offer choice, and increase person-centredness. Realistic planning for time and resources is crucial for inclusive dementia research. 
Funders should support these needs to ensure effective recruitment.</p><p><strong>Trial registration: </strong>ISRCTN80204146.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"247"},"PeriodicalIF":3.4,"publicationDate":"2025-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12573961/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145399831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-30 | DOI: 10.1186/s12874-025-02700-4
Leonid Kopylev, Michael Dzierlenga
Comparing and combining reports from different publications is of interest to many who conduct meta-analyses. However, challenges can arise with reports that use transformations of the exposure data. A recent publication, Linakis et al. (BMC Med Res Methodol 24:6, 2024), compared methods for re-expression and concluded that the re-expression methods examined are not reliable. In their analysis, they treated the estimated effects, which are random variables, as if they were constants with no inherent variability. This letter describes two places where this assumption was made and how it affected their conclusions. While the re-expression methods show potential room for refinement in terms of estimating the observed point estimate, with statistically appropriate consideration of variability, use of re-expression for small to moderate sample sizes (up to approximately 5000) seems appropriate. That contrasts with the authors' conclusion that re-expression methods are not suitable for meta-analyses.
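To illustrate the general point about variability (a generic example, not the specific re-expression methods compared by Linakis et al.), the sketch below carries the sampling variance of a reported effect estimate through a simple re-expression via the delta method and checks it by simulation; the numbers are hypothetical.

```python
# An effect estimate is a random variable: its sampling variance must be
# propagated through any re-expression, e.g. with the delta method.
import numpy as np

beta_hat, se = 0.40, 0.15          # hypothetical log odds ratio and its SE
g = np.exp                         # re-expression g(beta): log OR -> OR

# Treating beta_hat as a constant yields a re-expressed value with no
# uncertainty attached, which distorts downstream meta-analytic weighting.
or_point_only = g(beta_hat)

# Delta method: Var[g(beta_hat)] ~= g'(beta_hat)^2 * Var[beta_hat]
se_or_delta = np.exp(beta_hat) * se

# Simulation check of the same standard error
draws = np.random.default_rng(1).normal(beta_hat, se, size=200_000)
se_or_sim = g(draws).std()

print(f"OR = {or_point_only:.3f}, SE(delta) = {se_or_delta:.3f}, "
      f"SE(sim) = {se_or_sim:.3f}")
```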
{"title":"The importance of considering variability in re-expression of effect estimates for use in meta-analyses.","authors":"Leonid Kopylev, Michael Dzierlenga","doi":"10.1186/s12874-025-02700-4","DOIUrl":"10.1186/s12874-025-02700-4","url":null,"abstract":"<p><p>Comparing and combining reports from different publication is of interest to many when conducting meta-analyses. However, challenges can arise with reports using transformations of the exposure data. A recent publication, Linakis et al. (BMC Med Res Methodol 24:6, 2024), compared methods for re-expression with the conclusion that the re-expression methods examined are not reliable. In their analysis, they treated the estimated effect estimates, which are random variables, as if they were constants, which have no inherent variability. This letter describes two places where this assumption was made and how it affected their conclusions. While the re-expression methods demonstrate potential room for refinement in terms of estimating the observed point estimate, with the statistically appropriate consideration of variability, use of re-expression for small to moderate sample sizes (up to approximately 5000) seems appropriate. That contrasts with the author's conclusion that use of re-expression methods is not suitable for meta-analyses.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"246"},"PeriodicalIF":3.4,"publicationDate":"2025-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12573819/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145408029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-30 | DOI: 10.1186/s12874-025-02699-8
Matthew W Linakis, Matthew P Longnecker
{"title":"Response to \"The importance of considering variability in re-expression of effect estimates for use in meta-analysis.\" (Kopylev and Dzierlenga 2025).","authors":"Matthew W Linakis, Matthew P Longnecker","doi":"10.1186/s12874-025-02699-8","DOIUrl":"10.1186/s12874-025-02699-8","url":null,"abstract":"","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"245"},"PeriodicalIF":3.4,"publicationDate":"2025-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12574166/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145408060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-29 | DOI: 10.1186/s12874-025-02696-x
Gaofei Zhang, Ann Osi, Navid Ghaffarzadegan, Hazhir Rahmandad, Ran Xu
Background: Human behavioral responses to changes in risks are often delayed. Methods for estimating these delayed responses either rely on rigid assumptions about the delay distribution (e.g., Erlang distribution), producing a poor fit, or yield period-specific estimates (e.g., estimates from the Autoregressive Distributed Lag (ARDL) model) that are difficult to integrate into simulation models. We propose a hybrid ARDL-Erlang approach that yields an interpretable summary of behavioral responses suitable for incorporation into simulation models.
Method: We apply the ARDL-Erlang approach to estimate the effect of COVID-19 deaths on mobility across US counties from October 2020 to July 2021. A standard panel autoregressive distributed lag (ARDL) model first estimates the effect of past deaths and past mobility on current mobility. The ARDL model is then transformed into an Infinite Distributed Lag (IDL) model consisting of only past deaths. The coefficients of the past deaths are aggregated into an overall effect and fit to an Erlang distribution, summarized by average delay length and shape parameter.
Results: Our results show that on the national level, a one-standard-deviation permanent increase in weekly deaths per 100,000 population (log-transformed) is associated with a 0.46-standard-deviation decrease in human mobility in the long run, where the delay distribution follows a first-order Erlang distribution, and the average delay length is about 3.2 weeks. However, there is much heterogeneity across states, with first- to third-order Erlang delays and 2 to 18 weeks of average delay providing a theoretically cogent summary of how mobility followed changes in deaths during the first year and a half of the pandemic.
Conclusion: This study provides a novel approach to estimating delayed human responses to health risks using a hybrid ARDL-Erlang model. Our findings highlight significant variability in the impact and timing of responses across states, underscoring the need for tailored public health policies. This study can also serve as guidelines and an example for identifying delayed human behavior in other settings.
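A minimal single-series sketch of the ARDL-to-IDL-to-Erlang pipeline described in the Methods above (the paper uses a panel of US counties; the simulated model, lag order, and the crude Erlang-fitting shortcut below are illustrative assumptions, not the authors' implementation):

```python
# ARDL(1,0) -> infinite distributed lag (IDL) weights -> Erlang summary.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulate y_t = a*y_{t-1} + b*x_t + noise
T, a_true, b_true = 500, 0.7, -0.3
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = a_true * y[t - 1] + b_true * x[t] + 0.05 * rng.normal()

# Step 1: estimate the ARDL coefficients by OLS
Z = np.column_stack([y[:-1], x[1:]])
a_hat, b_hat = np.linalg.lstsq(Z, y[1:], rcond=None)[0]

# Step 2: invert to IDL weights c_k = b * a^k (effect of x at lag k)
K = 60
c = b_hat * a_hat ** np.arange(K)
long_run_effect = c.sum()                 # approximately b_hat / (1 - a_hat)
w = c / c.sum()                           # normalised delay distribution

# Step 3: summarise the delay profile with an Erlang (integer-shape gamma),
# here via a crude least-squares match of the discrete weights to candidate
# Erlang densities evaluated at the integer lags.
mean_delay = (np.arange(K) * w).sum()
best_order = min(range(1, 6), key=lambda k: np.sum(
    (w - stats.gamma.pdf(np.arange(K), a=k, scale=mean_delay / k)) ** 2))

print(f"long-run effect ~ {long_run_effect:.2f}, "
      f"mean delay ~ {mean_delay:.1f} periods, Erlang order {best_order}")
```

The long-run effect and the (order, mean-delay) pair are the kind of interpretable summary that can be dropped directly into a simulation model.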
{"title":"Identifying delayed human response to external risks: an econometric analysis of mobility change during a pandemic.","authors":"Gaofei Zhang, Ann Osi, Navid Ghaffarzadegan, Hazhir Rahmandad, Ran Xu","doi":"10.1186/s12874-025-02696-x","DOIUrl":"10.1186/s12874-025-02696-x","url":null,"abstract":"<p><strong>Background: </strong>Human behavioral responses to changes in risks are often delayed. Methods for estimating these delayed responses either rely on rigid assumptions about the delay distribution (e.g., Erlang distribution), producing a poor fit, or yield period-specific estimates (e.g., estimates from the Autoregressive Distributed Lag (ARDL) model) that are difficult to integrate into simulation models. We propose a hybrid ARDL-Erlang approach that yields an interpretable summary of behavioral responses suitable for incorporation into simulation models.</p><p><strong>Method: </strong>We apply the ARDL-Erlang approach to estimate the effect of COVID-19 deaths on mobility across US counties from October 2020 to July 2021. A standard panel autoregressive distributed lag (ARDL) model first estimates the effect of past deaths and past mobility on current mobility. The ARDL model is then transformed into an Infinite Distributed Lag (IDL) model consisting of only past deaths. The coefficients of the past deaths are aggregated into an overall effect and fit to an Erlang distribution, summarized by average delay length and shape parameter.</p><p><strong>Results: </strong>Our results show that on the national level, a one-standard-deviation permanent increase in weekly deaths per 100,000 population (log-transformed) is associated with a 0.46-standard-deviation decrease in human mobility in the long run, where the delay distribution follows a first-order Erlang distribution, and the average delay length is about 3.2 weeks. However, there is much heterogeneity across states, with first- to third-order Erlang delays and 2 to 18 weeks of average delay providing a theoretically cogent summary of how mobility followed changes in deaths during the first year and a half of the pandemic.</p><p><strong>Conclusion: </strong>This study provides a novel approach to estimating delayed human responses to health risks using a hybrid ARDL-Erlang model. Our findings highlight significant variability in the impact and timing of responses across states, underscoring the need for tailored public health policies. This study can also serve as guidelines and an example for identifying delayed human behavior in other settings.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"244"},"PeriodicalIF":3.4,"publicationDate":"2025-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12574068/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145399992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-28 | DOI: 10.1186/s12874-025-02655-6
Yunwei Zhang, Samuel Muller
Background: With the emergence of high-dimensional censored survival data in health and medicine, the use of survival models for risk prediction is increasing. To date, practical techniques exist for splitting data for model training and performance evaluation. While different sampling methods have been compared in terms of performance, the effects of the data splitting ratio and of survival-specific characteristics have not yet been examined for high-dimensional censored survival data.
Methods: We first conduct an empirical study of how the simple random sampling technique and the stratified sampling technique affect Lasso Cox model performance on real high-dimensional gene expression datasets. For the simple random sampling technique, various data splitting ratios are investigated. For the stratified sampling technique, different survival-specific variables are investigated. We consider the C-index and Brier score as evaluation metrics. We further develop and validate a two-stage purposive sampling approach motivated by our empirical study findings.
Results: Our findings reveal that survival-specific characteristics contribute to model performance across training, testing, and validation data. The proposed two-stage purposive sampling approach performs well in mitigating excessive diversity within the training data in both the simulation study and the real data analysis, leading to better survival model performance.
Conclusions: We recommend careful consideration of key factors in different sampling techniques when developing and validating survival models. Methods such as the proposed two-stage approach, which mitigate excessive diversity, provide a practical solution.
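As an illustration of splitting on survival-specific characteristics (plain stratified splitting, not the paper's two-stage purposive sampling; the variable names and rates below are assumptions), the sketch balances the event rate and follow-up distribution across a train/test split:

```python
# Stratify a survival-data split on event status and a coarse follow-up bin,
# so censoring rate and follow-up time stay comparable across partitions.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1000
df = pd.DataFrame({
    "time": rng.exponential(scale=24, size=n),   # months of follow-up
    "event": rng.random(n) < 0.35,               # ~35% observed events
})

# Strata combine event status with a tertile of follow-up time
df["strata"] = (df["event"].astype(int).astype(str) + "_" +
                pd.qcut(df["time"], 3, labels=False).astype(str))

train, test = train_test_split(df, test_size=0.3, random_state=0,
                               stratify=df["strata"])

for name, part in [("train", train), ("test", test)]:
    print(f"{name}: event rate = {part['event'].mean():.3f}, "
          f"median follow-up = {part['time'].median():.1f}")
```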
{"title":"Two-stage sampling for better survival model performance.","authors":"Yunwei Zhang, Samuel Muller","doi":"10.1186/s12874-025-02655-6","DOIUrl":"10.1186/s12874-025-02655-6","url":null,"abstract":"<p><strong>Background: </strong>With the emergence of high-dimensional censored survival data in health and medicine, the use of survival models for risk prediction is increasing. To date, practical techniques exist for splitting data for model training and performance evaluation. While different sampling methods have been compared for their performances, the effect of data splitting ratio and survival specific characteristics have not yet been examined for high dimensional censored survival data.</p><p><strong>Methods: </strong>We first conduct an empirical study of using the simple random sampling technique and stratified sampling technique on real high-dimensional gene expression datasets Lasso Cox model performance. For the simple random sampling technique, various data splitting ratios are investigated. For the stratified sampling, different survival specific variables are investigated. We consider C-index and Brier Score as evaluation metrics. We further develop and validate a two-stage purposive sampling approach motivated by our empirical study findings.</p><p><strong>Results: </strong>Our findings reveal that survival specific characteristics contribute to model performance across training, testing and validation data. The proposed two-stage purposive sampling approach performs well in mitigating excessive diversity within the training data for both simulation study and real data analysis, leading to better survival model performances.</p><p><strong>Conclusions: </strong>We recommend careful consideration of key factors in different sampling techniques when developing and validating survival models. Using methods such as the proposed method to mitigate excessive diversity provides a solution.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"242"},"PeriodicalIF":3.4,"publicationDate":"2025-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12570514/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145387107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-28 | DOI: 10.1186/s12874-025-02694-z
Yinan Huang, Shadi Bazzazzadehgan, Jieni Li, Arman Arabshomali, Mai Li, Kaustuv Bhattacharya, John P Bentley
Background: Accurate prediction of survival in oncology can guide targeted interventions. The traditional regression-based Cox proportional hazards (CPH) model relies on statistical assumptions and may have limited predictive accuracy. With the capability to model large datasets, machine learning (ML) holds the potential to improve the prediction of time-to-event outcomes, such as cancer survival. The present study aimed to systematically summarize the use of ML models for cancer survival outcomes in observational studies and to compare the performance of ML models with CPH models.
Methods: We systematically searched PubMed, MEDLINE (via EBSCO), and Embase for studies that evaluated ML models vs. CPH models for cancer survival outcomes. The use of ML algorithms was summarized, and either the area under the curve (AUC) or the concordance index (C-index) for the ML and CPH models was presented descriptively. Only studies that provided a measure of discrimination (AUC or C-index) with a 95% confidence interval (CI) were included in the final meta-analysis. A random-effects model, implemented in R, was used to compare the pooled AUC or C-index estimates between ML and CPH models. The quality of the studies was evaluated using available checklists. Multiple sensitivity analyses were performed.
Results: A total of 21 studies were included for systematic review and 7 for meta-analysis. Across the 21 articles, diverse ML models were used, including random survival forest (N=16, 76.19%), gradient boosting (N=5, 23.81%), and deep learning (N=8, 38.09%). In predicting cancer survival outcomes, ML models showed no superior performance over CPH regression. The standardized mean difference in AUC or C-index was 0.01 (95% CI: -0.01 to 0.03). Results from the sensitivity analyses confirmed the robustness of the main findings.
Conclusions: ML models had similar performance compared with CPH models in predicting cancer survival outcomes. Although this systematic review highlights the promising use of ML to improve the quality of care in oncology, findings from this review also suggest opportunities to improve ML reporting transparency. Future systematic reviews should focus on the comparative performance between specific ML models and CPH regression in time-to-event outcomes in specific type of cancer or other disease areas.
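A minimal sketch of the random-effects pooling step described in the Methods above (DerSimonian-Laird), applied to hypothetical per-study differences in C-index (ML minus CPH) rather than the studies actually included in the review:

```python
# DerSimonian-Laird random-effects pooling of per-study differences.
import numpy as np

d  = np.array([0.02, -0.01, 0.03, 0.00, 0.01, 0.02, -0.02])   # hypothetical differences
se = np.array([0.015, 0.020, 0.010, 0.012, 0.018, 0.025, 0.022])  # hypothetical SEs

w_fixed = 1 / se**2
d_fixed = np.sum(w_fixed * d) / np.sum(w_fixed)

# Between-study variance (tau^2) via the DerSimonian-Laird estimator
Q = np.sum(w_fixed * (d - d_fixed) ** 2)
c = np.sum(w_fixed) - np.sum(w_fixed**2) / np.sum(w_fixed)
tau2 = max(0.0, (Q - (len(d) - 1)) / c)

# Random-effects pooled difference and 95% CI
w_re = 1 / (se**2 + tau2)
d_re = np.sum(w_re * d) / np.sum(w_re)
se_re = np.sqrt(1 / np.sum(w_re))
print(f"pooled difference = {d_re:.3f} "
      f"(95% CI {d_re - 1.96*se_re:.3f} to {d_re + 1.96*se_re:.3f}), tau^2 = {tau2:.5f}")
```

In a real review the per-study standard errors would be back-calculated from the reported 95% CIs before pooling.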
{"title":"Comparison of machine learning methods versus traditional Cox regression for survival prediction in cancer using real-world data: a systematic literature review and meta-analysis.","authors":"Yinan Huang, Shadi Bazzazzadehgan, Jieni Li, Arman Arabshomali, Mai Li, Kaustuv Bhattacharya, John P Bentley","doi":"10.1186/s12874-025-02694-z","DOIUrl":"10.1186/s12874-025-02694-z","url":null,"abstract":"<p><strong>Background: </strong>Accurate prediction of survival in oncology can guide targeted interventions. The traditional regression-based Cox proportional hazards (CPH) model has statistical assumptions and may have limited predictive accuracy. With the capability to model large datasets, machine learning (ML) holds the potential to improve the prediction of time-to-event outcomes, such as cancer survival outcomes. The present study aimed to systematically summarize the use of ML models for cancer survival outcomes in observational studies and to compare the performance of ML models with CPH models.</p><p><strong>Methods: </strong>We systematically searched PubMed, MEDLINE (via EBSCO), and Embase for studies that evaluated ML models vs. CPH models for cancer survival outcomes. The use of ML algorithms was summarized, and either the area under the curve (AUC) or the concordance index (C-index) for the ML and CPH models were presented descriptively. Only studies that provided a measure of discrimination, i.e., AUC or C-index, and 95% confidence interval (CI) were included in the final meta-analysis. A random-effects model was used to compare the predictive performance in the pooled AUC or C-index estimates between ML and CPH models using R. The quality of the studies was evaluated using available checklists. Multiple sensitivity analyses were performed.</p><p><strong>Results: </strong>A total of 21 studies were included for systematic review and 7 for meta-analysis. Across the 21 articles, diverse ML models were used, including random survival forest (N=16, 76.19%), gradient boosting (N=5, 23.81%), and deep learning (N=8, 38.09%). In predicting cancer survival outcomes, ML models showed no superior performance over CPH regression. The standardized mean difference in AUC or C-index was 0.01 (95% CI: -0.01 to 0.03). Results from the sensitivity analyses confirmed the robustness of the main findings.</p><p><strong>Conclusions: </strong>ML models had similar performance compared with CPH models in predicting cancer survival outcomes. Although this systematic review highlights the promising use of ML to improve the quality of care in oncology, findings from this review also suggest opportunities to improve ML reporting transparency. Future systematic reviews should focus on the comparative performance between specific ML models and CPH regression in time-to-event outcomes in specific type of cancer or other disease areas.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"243"},"PeriodicalIF":3.4,"publicationDate":"2025-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12570641/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145386946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}