Effects of missing data imputation methods on univariate blood pressure time series data analysis and forecasting with ARIMA and LSTM
Pub Date: 2024-12-26 | DOI: 10.1186/s12874-024-02448-3
Nicholas Niako, Jesus D Melgarejo, Gladys E Maestre, Kristina P Vatcheva
Background: Missing observations in univariate time series are common in real-life data and cause analytical problems in the flow of the analysis. Imputation of missing values is an inevitable step in the analysis of every incomplete univariate time series. Most existing studies focus on comparing the distributions of imputed data; there is a gap in knowledge about how different imputation methods for univariate time series affect the forecasting performance of time series models. We evaluated the prediction performance of autoregressive integrated moving average (ARIMA) and long short-term memory (LSTM) network models on time series data imputed with ten different imputation techniques.
Methods: Missing values were generated under the missing completely at random (MCAR) mechanism at 10%, 15%, 25%, and 35% rates of missingness using complete data of 24-h ambulatory diastolic blood pressure readings. The effects of mean imputation, Kalman filtering, linear, spline, and Stineman interpolation, exponentially weighted moving average (EWMA), simple moving average (SMA), k-nearest neighbors (KNN), and last-observation-carried-forward (LOCF) imputation on the time series structure were assessed, and the prediction performance of the LSTM and ARIMA models was compared on imputed and original data.
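As a rough illustration of this kind of set-up (not the authors' code), the sketch below generates MCAR gaps in a synthetic diastolic blood pressure series and applies a few of the listed techniques with pandas; the series, missingness rate, and window/span parameters are all illustrative assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic "complete" 24-h ambulatory diastolic blood pressure series (one reading per 15 min)
t = pd.date_range("2024-01-01", periods=96, freq="15min")
dbp = pd.Series(75 + 5 * np.sin(np.linspace(0, 4 * np.pi, 96)) + rng.normal(0, 2, 96), index=t)

# MCAR: delete ~25% of readings completely at random (endpoints kept so interpolation is defined)
mask = rng.random(len(dbp)) < 0.25
mask[[0, -1]] = False
incomplete = dbp.mask(mask)

# A few of the imputation techniques compared in the paper (Kalman, Stineman, and KNN
# imputation typically come from specialised packages rather than plain pandas)
imputed = {
    "mean":   incomplete.fillna(incomplete.mean()),
    "LOCF":   incomplete.ffill(),
    "linear": incomplete.interpolate(method="linear"),
    "spline": incomplete.interpolate(method="spline", order=3),   # requires SciPy
    "SMA":    incomplete.fillna(incomplete.rolling(5, min_periods=1, center=True).mean()),
    "EWMA":   incomplete.fillna(incomplete.ewm(span=5).mean()),
}

for name, series in imputed.items():
    err = (series - dbp)[mask].dropna()      # error at the imputed positions only
    print(f"{name:>6}: imputation RMSE = {np.sqrt((err ** 2).mean()):.2f}")
```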
Results: All imputation techniques either increased or decreased the data autocorrelation and thereby affected the forecasting performance of the ARIMA and LSTM algorithms. The best-performing imputation technique did not guarantee better predictions on the imputed data. Mean imputation, LOCF, KNN, Stineman, and cubic spline interpolation performed better at small rates of missingness. Imputation with EWMA and Kalman filtering yielded consistent performance across all missingness scenarios. Regardless of the imputation method, LSTM achieved slightly better predictive accuracy among the best-performing ARIMA and LSTM models; otherwise, the results varied. In our small sample, ARIMA tended to perform better on data with higher autocorrelation.
Conclusions: We recommend that researchers consider Kalman smoothing, interpolation (linear, spline, and Stineman), and moving average (SMA and EWMA) techniques for imputing univariate time series data, as they perform well with respect to both the data distribution and forecasting with ARIMA and LSTM models. LSTM slightly outperforms ARIMA; however, for small samples, ARIMA is simpler and faster to execute.
{"title":"Effects of missing data imputation methods on univariate blood pressure time series data analysis and forecasting with ARIMA and LSTM.","authors":"Nicholas Niako, Jesus D Melgarejo, Gladys E Maestre, Kristina P Vatcheva","doi":"10.1186/s12874-024-02448-3","DOIUrl":"10.1186/s12874-024-02448-3","url":null,"abstract":"<p><strong>Background: </strong>Missing observations within the univariate time series are common in real-life and cause analytical problems in the flow of the analysis. Imputation of missing values is an inevitable step in every incomplete univariate time series. Most of the existing studies focus on comparing the distributions of imputed data. There is a gap of knowledge on how different imputation methods for univariate time series affect the forecasting performance of time series models. We evaluated the prediction performance of autoregressive integrated moving average (ARIMA) and long short-term memory (LSTM) network models on imputed time series data using ten different imputation techniques.</p><p><strong>Methods: </strong>Missing values were generated under missing completely at random (MCAR) mechanism at 10%, 15%, 25%, and 35% rates of missingness using complete data of 24-h ambulatory diastolic blood pressure readings. The performance of the mean, Kalman filtering, linear, spline, and Stineman interpolations, exponentially weighted moving average (EWMA), simple moving average (SMA), k-nearest neighborhood (KNN), and last-observation-carried-forward (LOCF) imputation techniques on the time series structure and the prediction performance of the LSTM and ARIMA models were compared on imputed and original data.</p><p><strong>Results: </strong>All imputation techniques either increased or decreased the data autocorrelation and with this affected the forecasting performance of the ARIMA and LSTM algorithms. The best imputation technique did not guarantee better predictions obtained on the imputed data. The mean imputation, LOCF, KNN, Stineman, and cubic spline interpolations methods performed better for a small rate of missingness. Interpolation with EWMA and Kalman filtering yielded consistent performances across all scenarios of missingness. Disregarding the imputation methods, the LSTM resulted with a slightly better predictive accuracy among the best performing ARIMA and LSTM models; otherwise, the results varied. In our small sample, ARIMA tended to perform better on data with higher autocorrelation.</p><p><strong>Conclusions: </strong>We recommend to the researchers that they consider Kalman smoothing techniques, interpolation techniques (linear, spline, and Stineman), moving average techniques (SMA and EWMA) for imputing univariate time series data as they perform well on both data distribution and forecasting with ARIMA and LSTM models. 
The LSTM slightly outperforms ARIMA models, however, for small samples, ARIMA is simpler and faster to execute.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"24 1","pages":"320"},"PeriodicalIF":3.9,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11670515/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142892064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Identifying the presence of atrial fibrillation during sinus rhythm using a dual-input mixed neural network with ECG coloring technology
Pub Date: 2024-12-23 | DOI: 10.1186/s12874-024-02421-0
Wei-Wen Chen, Chih-Min Liu, Chien-Chao Tseng, Ching-Chun Huang, I-Chien Wu, Pei-Fen Chen, Shih-Lin Chang, Yenn-Jiang Lin, Li-Wei Lo, Fa-Po Chung, Tze-Fan Chao, Ta-Chuan Tuan, Jo-Nan Liao, Chin-Yu Lin, Ting-Yung Chang, Ling Kuo, Cheng-I Wu, Shin-Huei Liu, Jacky Chung-Hao Wu, Yu-Feng Hu, Shih-Ann Chen, Henry Horng-Shing Lu
Background: Undetected atrial fibrillation (AF) poses a significant risk of stroke and cardiovascular mortality. However, diagnosing AF in real time can be challenging, as the arrhythmia is often not captured instantly. To address this issue, a deep-learning model was developed to diagnose AF even during arrhythmia-free windows.
Methods: The proposed method introduces a novel approach that integrates clinical data and electrocardiograms (ECGs) using a colorization technique. This technique recolors ECG images based on patients' demographic information while preserving their original characteristics and incorporating color correlations from statistical data features. Our primary objective is to enhance AF detection by fusing ECG images with demographic data for colorization. To ensure the reliability of our dataset for training, validation, and testing, we rigorously maintained separation to prevent cross-contamination among these sets. We designed a Dual-input Mixed Neural Network (DMNN) that effectively handles different types of inputs, including demographic and image data, leveraging their mixed characteristics to optimize prediction performance. Unlike previous approaches, this method introduces demographic data through color transformation within ECG images, enriching the diversity of features for improved learning outcomes.
Results: The proposed approach yielded promising results on the independent test set, achieving an AUC of 83.4%. This outperformed the AUC of 75.8% obtained when using only the original signal values as input for the CNN. The evaluation of performance improvement revealed significant enhancements, including a 7.6% increase in AUC, an 11.3% boost in accuracy, a 9.4% improvement in sensitivity, an 11.6% enhancement in specificity, and a substantial 25.1% increase in the F1 score. Notably, AI diagnosis of AF was associated with future cardiovascular mortality. For clinical application, over a median follow-up of 71.6 ± 29.1 months, high-risk AI-predicted AF patients exhibited significantly higher cardiovascular mortality (AF vs. non-AF; 47 [18.7%] vs. 34 [4.8%]) and all-cause mortality (176 [52.9%] vs. 216 [26.3%]) compared to non-AF patients. In the low-risk group, AI-predicted AF patients showed slightly higher cardiovascular mortality (7 [0.7%] vs. 1 [0.3%]) and all-cause mortality (103 [9.0%] vs. 26 [6.4%]) than AI-predicted non-AF patients during six-year follow-up. These findings underscore the potential clinical utility of the AI model in predicting AF-related outcomes.
Conclusions: This study introduces an ECG colorization approach to enhance AF detection using deep learning and demographic data, improving performance compared to ECG-only methods. This method is effective in identifying high-risk and low-risk populations, providing valuable features for future AF research.
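To make the dual-input idea concrete, here is a minimal PyTorch sketch of a network with an image branch for (colorized) ECGs and a dense branch for demographic features, fused before a binary AF head; the layer sizes, feature count, and names are placeholders rather than the published DMNN architecture.

```python
import torch
import torch.nn as nn

class DualInputMixedNet(nn.Module):
    """Sketch of a dual-input network: a CNN branch for ECG images and an MLP branch
    for demographic features, fused before the AF classifier. Illustrative only."""

    def __init__(self, n_demographic: int = 4):
        super().__init__()
        self.cnn = nn.Sequential(                 # image branch (3-channel "colorized" ECG)
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.mlp = nn.Sequential(                 # demographic branch
            nn.Linear(n_demographic, 16), nn.ReLU(),
        )
        self.head = nn.Sequential(                # classifier on the fused representation
            nn.Linear(32 + 16, 32), nn.ReLU(), nn.Linear(32, 1),
        )

    def forward(self, ecg_img: torch.Tensor, demo: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.cnn(ecg_img), self.mlp(demo)], dim=1)
        return self.head(fused)                   # logit for AF vs. non-AF

# Toy forward pass: batch of 2 colorized ECG images (3x128x128) plus 4 demographic features
model = DualInputMixedNet()
logits = model(torch.randn(2, 3, 128, 128), torch.randn(2, 4))
print(logits.shape)  # torch.Size([2, 1])
```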
{"title":"Identifying the presence of atrial fibrillation during sinus rhythm using a dual-input mixed neural network with ECG coloring technology.","authors":"Wei-Wen Chen, Chih-Min Liu, Chien-Chao Tseng, Ching-Chun Huang, I-Chien Wu, Pei-Fen Chen, Shih-Lin Chang, Yenn-Jiang Lin, Li-Wei Lo, Fa-Po Chung, Tze-Fan Chao, Ta-Chuan Tuan, Jo-Nan Liao, Chin-Yu Lin, Ting-Yung Chang, Ling Kuo, Cheng-I Wu, Shin-Huei Liu, Jacky Chung-Hao Wu, Yu-Feng Hu, Shih-Ann Chen, Henry Horng-Shing Lu","doi":"10.1186/s12874-024-02421-0","DOIUrl":"10.1186/s12874-024-02421-0","url":null,"abstract":"<p><strong>Background: </strong>Undetected atrial fibrillation (AF) poses a significant risk of stroke and cardiovascular mortality. However, diagnosing AF in real-time can be challenging as the arrhythmia is often not captured instantly. To address this issue, a deep-learning model was developed to diagnose AF even during periods of arrhythmia-free windows.</p><p><strong>Methods: </strong>The proposed method introduces a novel approach that integrates clinical data and electrocardiograms (ECGs) using a colorization technique. This technique recolors ECG images based on patients' demographic information while preserving their original characteristics and incorporating color correlations from statistical data features. Our primary objective is to enhance atrial fibrillation (AF) detection by fusing ECG images with demographic data for colorization. To ensure the reliability of our dataset for training, validation, and testing, we rigorously maintained separation to prevent cross-contamination among these sets. We designed a Dual-input Mixed Neural Network (DMNN) that effectively handles different types of inputs, including demographic and image data, leveraging their mixed characteristics to optimize prediction performance. Unlike previous approaches, this method introduces demographic data through color transformation within ECG images, enriching the diversity of features for improved learning outcomes.</p><p><strong>Results: </strong>The proposed approach yielded promising results on the independent test set, achieving an impressive AUC of 83.4%. This outperformed the AUC of 75.8% obtained when using only the original signal values as input for the CNN. The evaluation of performance improvement revealed significant enhancements, including a 7.6% increase in AUC, an 11.3% boost in accuracy, a 9.4% improvement in sensitivity, an 11.6% enhancement in specificity, and a substantial 25.1% increase in the F1 score. Notably, AI diagnosis of AF was associated with future cardiovascular mortality. For clinical application, over a median follow-up of 71.6 ± 29.1 months, high-risk AI-predicted AF patients exhibited significantly higher cardiovascular mortality (AF vs. non-AF; 47 [18.7%] vs. 34 [4.8%]) and all-cause mortality (176 [52.9%] vs. 216 [26.3%]) compared to non-AF patients. In the low-risk group, AI-predicted AF patients showed slightly elevated cardiovascular (7 [0.7%] vs. 1 [0.3%]) and all-cause mortality (103 [9.0%] vs. 26 [6.4%]) than AI-predicted non-AF patients during six-year follow-up. These findings underscore the potential clinical utility of the AI model in predicting AF-related outcomes.</p><p><strong>Conclusions: </strong>This study introduces an ECG colorization approach to enhance atrial fibrillation (AF) detection using deep learning and demographic data, improving performance compared to ECG-only methods. 
This method is effective in identifying high-risk and low-risk populations, providing valuable features for future AF research ","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"24 1","pages":"318"},"PeriodicalIF":3.9,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11665121/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142881113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The analysis and reporting of multiple outcomes in mental health trials: a methodological systematic review
Pub Date: 2024-12-21 | DOI: 10.1186/s12874-024-02451-8
Dominic Stringer, Mollie Payne, Ben Carter, Richard Emsley
Background: The choice of a single primary outcome in randomised trials can be difficult, especially in mental health where interventions may be complex and target several outcomes simultaneously. We carried out a systematic review to assess the quality of the analysis and reporting of multiple outcomes in mental health RCTs, comparing approaches with current CONSORT and other regulatory guidance.
Methods: The review included all late-stage mental health trials published between 1st January 2019 and 31st December 2020 in 9 leading medical and mental health journals. Pilot and feasibility trials, non-randomised trials, and early-phase trials were excluded. The total number of primary, secondary, and other outcomes was recorded, as was any strategy used to incorporate multiple primary outcomes in the primary analysis.
Results: There were 147 included mental health trials. Most trials (101/147) followed CONSORT guidance by specifying a single primary outcome with other outcomes defined as secondary and analysed in separate statistical analyses, although a minority (10/147) did not specify any outcomes as primary. Where multiple primary outcomes were specified (33/147), most (26/33) did not correct for multiplicity, contradicting regulatory guidance. The median number of clinical outcomes reported across studies was 8 (IQR 5-11).
Conclusions: Most trials correctly follow CONSORT guidance. However, little consideration was given to multiplicity or correlation between outcomes, even where multiple primary outcomes were stated. Trials should correct for multiplicity when multiple primary outcomes are specified or describe another strategy to address the multiplicity. Overall, very few mental health trials take advantage of multiple-outcome strategies in the primary analysis, especially more complex strategies such as multivariate modelling. More work is required to show that such strategies exist, aid interpretation, increase efficiency, and are easy to implement.
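For instance, a Bonferroni or Holm adjustment of co-primary outcome p-values is a single call with statsmodels; the p-values below are purely illustrative.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Illustrative p-values for three co-primary outcomes of a hypothetical trial
p_values = np.array([0.012, 0.034, 0.210])

for method in ("bonferroni", "holm"):
    reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method=method)
    print(f"{method:>10}: adjusted p = {p_adjusted.round(3)}, reject null = {reject}")
```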
Registration: Our systematic review protocol was registered with the International Prospective Register of Systematic Reviews (PROSPERO) on 11th January 2023 (CRD42023382274).
{"title":"The analysis and reporting of multiple outcomes in mental health trials: a methodological systematic review.","authors":"Dominic Stringer, Mollie Payne, Ben Carter, Richard Emsley","doi":"10.1186/s12874-024-02451-8","DOIUrl":"10.1186/s12874-024-02451-8","url":null,"abstract":"<p><strong>Background: </strong>The choice of a single primary outcome in randomised trials can be difficult, especially in mental health where interventions may be complex and target several outcomes simultaneously. We carried out a systematic review to assess the quality of the analysis and reporting of multiple outcomes in mental health RCTs, comparing approaches with current CONSORT and other regulatory guidance.</p><p><strong>Methods: </strong>The review included all late-stage mental health trials published between 1st January 2019 to 31st December 2020 in 9 leading medical and mental health journals. Pilot and feasibility trials, non-randomised trials, and early phase trials were excluded. The total number of primary, secondary and other outcomes was recorded, as was any strategy used to incorporate multiple primary outcomes in the primary analysis.</p><p><strong>Results: </strong>There were 147 included mental health trials. Most trials (101/147) followed CONSORT guidance by specifying a single primary outcome with other outcomes defined as secondary and analysed in separate statistical analyses, although a minority (10/147) did not specify any outcomes as primary. Where multiple primary outcomes were specified (33/147), most (26/33) did not correct for multiplicity, contradicting regulatory guidance. The median number of clinical outcomes reported across studies was 8 (IQR 5-11 ).</p><p><strong>Conclusions: </strong>Most trials are correctly following CONSORT guidance. However, there was little consideration given to multiplicity or correlation between outcomes even where multiple primary outcomes were stated. Trials should correct for multiplicity when multiple primary outcomes are specified or describe some other strategy to address the multiplicity. Overall, very few mental health trials are taking advantage of multiple outcome strategies in the primary analysis, especially more complex strategies such as multivariate modelling. More work is required to show these exist, aid interpretation, increase efficiency and are easily implemented.</p><p><strong>Registration: </strong>Our systematic review protocol was registered with the International Prospective Register of Systematic Reviews (PROSPERO) on 11th January 2023 (CRD42023382274).</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"24 1","pages":"317"},"PeriodicalIF":3.9,"publicationDate":"2024-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11662570/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142871332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Built-in selection or confounder bias? Dynamic Landmarking in matched propensity score analyses
Pub Date: 2024-12-21 | DOI: 10.1186/s12874-024-02444-7
Alexandra Strobel, Andreas Wienke, Jan Gummert, Sabine Bleiziffer, Oliver Kuss
Background: Propensity score matching has become a popular method for estimating causal treatment effects in non-randomized studies. However, for time-to-event outcomes, the estimation of hazard ratios based on propensity scores can be challenging if omitted or unobserved covariates are present. Not accounting for such covariates can lead to treatment effect estimates that differ from the estimate of interest. However, researchers often do not know whether (and, if so, which) covariates will cause this divergence.
Methods: To address this issue, we extended a previously described method, Dynamic Landmarking, which was originally developed for randomized trials. The method is based on the successive deletion of sorted observations and the gradual fitting of univariable Cox models. In addition, the balance of observed but omitted covariates can be measured by the sum of squared z-differences.
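A simplified sketch of how such a deletion-and-refit loop might look, using lifelines on synthetic matched data; the data, the sorting by follow-up time, and the step size are assumptions made here for illustration, and the full procedure would also track the sum of squared z-differences of omitted covariates.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)
n = 400

# Synthetic matched cohort: treatment indicator, follow-up time, event indicator
df = pd.DataFrame({"treatment": rng.integers(0, 2, n)})
df["time"] = rng.exponential(scale=np.where(df["treatment"] == 1, 12.0, 10.0))
df["event"] = rng.integers(0, 2, n)

# Dynamic-Landmarking-style loop (one reading of the procedure): sort by follow-up time,
# successively drop the earliest observations, and refit the univariable Cox model for
# treatment to see how the hazard ratio estimate evolves.
df = df.sort_values("time").reset_index(drop=True)
for n_dropped in range(0, 301, 100):
    subset = df.iloc[n_dropped:]
    cph = CoxPHFitter().fit(subset, duration_col="time", event_col="event",
                            formula="treatment")
    hr = np.exp(cph.params_["treatment"])
    print(f"dropped {n_dropped:3d} earliest observations: HR = {hr:.2f}")
```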
Results: By simulation, we show that Dynamic Landmarking provides a good visual tool for detecting and distinguishing treatment effect estimates affected by built-in selection or confounding bias. We illustrate the approach with a data set from cardiac surgery and provide some recommendations on how to use and interpret Dynamic Landmarking in propensity score matched studies.
Conclusion: Dynamic Landmarking is a useful post-hoc diagnostic tool for visualizing whether an estimated hazard ratio could be distorted by confounding or built-in selection bias.
{"title":"Built-in selection or confounder bias? Dynamic Landmarking in matched propensity score analyses.","authors":"Alexandra Strobel, Andreas Wienke, Jan Gummert, Sabine Bleiziffer, Oliver Kuss","doi":"10.1186/s12874-024-02444-7","DOIUrl":"10.1186/s12874-024-02444-7","url":null,"abstract":"<p><strong>Background: </strong>Propensity score matching has become a popular method for estimating causal treatment effects in non-randomized studies. However, for time-to-event outcomes, the estimation of hazard ratios based on propensity scores can be challenging if omitted or unobserved covariates are present. Not accounting for such covariates could lead to treatment estimates, differing from the estimate of interest. However, researchers often do not know whether (and, if so, which) covariates will cause this divergence.</p><p><strong>Methods: </strong>To address this issue, we extended a previously described method, Dynamic Landmarking, which was originally developed for randomized trials. The method is based on successively deletion of sorted observations and gradually fitting univariable Cox models. In addition, the balance of observed, but omitted covariates can be measured by the sum of squared z-differences.</p><p><strong>Results: </strong>By simulation we show, that Dynamic Landmarking provides a good visual tool for detecting and distinguishing treatment effect estimates underlying built-in selection or confounding bias. We illustrate the approach with a data set from cardiac surgery and provide some recommendations on how to use and interpret Dynamic Landmarking in propensity score matched studies.</p><p><strong>Conclusion: </strong>Dynamic Landmarking is a useful post-hoc diagnosis tool for visualizing whether an estimated hazard ratio could be distorted by confounding or built-in selection bias.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"24 1","pages":"316"},"PeriodicalIF":3.9,"publicationDate":"2024-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11662801/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142871317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Addressing treatment switching in the ALTA-1L trial with g-methods: exploring the impact of model specification
Pub Date: 2024-12-20 | DOI: 10.1186/s12874-024-02437-6
Amani Al Tawil, Sean McGrath, Robin Ristl, Ulrich Mansmann
Background: Treatment switching in randomized clinical trials introduces challenges in performing causal inference. Intention To Treat (ITT) analyses often fail to fully capture the causal effect of treatment in the presence of treatment switching. Consequently, decision makers may instead be interested in causal effects of hypothetical treatment strategies that do not allow for treatment switching. For example, the phase 3 ALTA-1L trial showed that brigatinib may have improved Overall Survival (OS) compared to crizotinib if treatment switching had not occurred. Their sensitivity analysis using Inverse Probability of Censoring Weights (IPCW) reported a Hazard Ratio (HR) of 0.50 (95% CI, 0.28-0.87), while their initial ITT analysis estimated an HR of 0.81 (0.53-1.22).
Methods: We used a directed acyclic graph to depict the clinical setting of the ALTA-1L trial in the presence of treatment switching, illustrating the concept of treatment-confounder feedback and highlighting the need for g-methods. In a re-analysis of the ALTA-1L trial data, we used IPCW and the parametric g-formula to adjust for baseline and time-varying covariates to estimate the effect of two hypothetical treatment strategies on OS: "always treat with brigatinib" versus "always treat with crizotinib". We conducted various sensitivity analyses using different model specifications and weight truncation approaches.
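A deliberately simplified, single-time-point sketch of the IPCW idea (censor follow-up at switching, then reweight the remaining person-time by the inverse probability of staying unswitched); the actual re-analysis uses time-varying covariates, stabilized weights, and truncation, and every variable name and value below is invented.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from lifelines import CoxPHFitter

rng = np.random.default_rng(7)
n = 500

# Synthetic two-arm trial in which part of the control arm switches treatment
df = pd.DataFrame({"arm": rng.integers(0, 2, n), "baseline_risk": rng.normal(0, 1, n)})
df["switched"] = (df["arm"] == 0) & (rng.random(n) < 0.3 + 0.2 * (df["baseline_risk"] > 0))
df["time"] = rng.exponential(scale=np.where(df["arm"] == 1, 14.0, 10.0))
df["event"] = rng.integers(0, 2, n)

# Artificially censor switchers (in a real analysis, at their switch time) and weight the
# remaining patients by the inverse probability of remaining unswitched given covariates.
df.loc[df["switched"], "event"] = 0
stay = (~df["switched"]).astype(int)
model = LogisticRegression().fit(df[["baseline_risk", "arm"]], stay)
p_stay = model.predict_proba(df[["baseline_risk", "arm"]])[:, 1]
df["ipcw"] = np.where(df["switched"], 0.0, 1.0 / p_stay)

cph = CoxPHFitter().fit(df[df["ipcw"] > 0], duration_col="time", event_col="event",
                        weights_col="ipcw", formula="arm", robust=True)
print(cph.summary[["coef", "exp(coef)"]])
```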
Results: Applying the IPCW approach in a series of sensitivity analyses yielded Cumulative HRs (cHRs) ranging between 0.38 (0.12, 0.98) and 0.73 (0.45, 1.22) and Risk Ratios (RRs) ranging between 0.52 (0.32, 0.98) and 0.79 (0.54, 1.17). Applying the parametric g-formula resulted in cHRs ranging between 0.61 (0.38, 0.91) and 0.72 (0.43, 1.07) and RRs ranging between 0.71 (0.48, 0.94) and 0.79 (0.54, 1.05).
Conclusion: Our results consistently indicated that the ITT effect estimate (cHR 0.82; 0.51, 1.22) may have underestimated brigatinib's benefit by around 10-45 percentage points (using IPCW) and 10-20 percentage points (using the parametric g-formula) across a wide range of model choices. Our analyses underscore the importance of performing sensitivity analyses, as the result from a single analysis could potentially stand as an outlier in a whole range of sensitivity analyses.
Trial registration: ClinicalTrials.gov identifier NCT02737501, registered on April 14, 2016.
{"title":"Addressing treatment switching in the ALTA-1L trial with g-methods: exploring the impact of model specification.","authors":"Amani Al Tawil, Sean McGrath, Robin Ristl, Ulrich Mansmann","doi":"10.1186/s12874-024-02437-6","DOIUrl":"10.1186/s12874-024-02437-6","url":null,"abstract":"<p><strong>Background: </strong>Treatment switching in randomized clinical trials introduces challenges in performing causal inference. Intention To Treat (ITT) analyses often fail to fully capture the causal effect of treatment in the presence of treatment switching. Consequently, decision makers may instead be interested in causal effects of hypothetical treatment strategies that do not allow for treatment switching. For example, the phase 3 ALTA-1L trial showed that brigatinib may have improved Overall Survival (OS) compared to crizotinib if treatment switching had not occurred. Their sensitivity analysis using Inverse Probability of Censoring Weights (IPCW), reported a Hazard Ratio (HR) of 0.50 (95% CI, 0.28-0.87), while their initial ITT analysis estimated an HR of 0.81 (0.53-1.22).</p><p><strong>Methods: </strong>We used a directed acyclic graph to depict the clinical setting of the ALTA-1L trial in the presence of treatment switching, illustrating the concept of treatment-confounder feedback and highlighting the need for g-methods. In a re-analysis of the ALTA-1L trial data, we used IPCW and the parametric g-formula to adjust for baseline and time-varying covariates to estimate the effect of two hypothetical treatment strategies on OS: \"always treat with brigatinib\" versus \"always treat with crizotinib\". We conducted various sensitivity analyses using different model specifications and weight truncation approaches.</p><p><strong>Results: </strong>Applying the IPCW approach in a series of sensitivity analyses yielded Cumulative HRs (cHRs) ranging between 0.38 (0.12, 0.98) and 0.73 (0.45,1.22) and Risk Ratios (RRs) ranging between 0.52 (0.32, 0.98) and 0.79 (0.54,1.17). Applying the parametric g-formula resulted in cHRs ranging between 0.61 (0.38,0.91) and 0.72 (0.43,1.07) and RRs ranging between 0.71 (0.48,0.94) and 0.79 (0.54,1.05).</p><p><strong>Conclusion: </strong>Our results consistently indicated that our estimated ITT effect estimate (cHR: 0.82 (0.51,1.22) may have underestimated brigatinib's benefit by around 10-45 percentage points (using IPCW) and 10-20 percentage points (using the parametric g-formula) across a wide range of model choices. Our analyses underscore the importance of performing sensitivity analyses, as the result from a single analysis could potentially stand as an outlier in a whole range of sensitivity analyses.</p><p><strong>Trial registration: </strong>Clinicaltrials.gov Identifier: NCT02737501 on April 14, 2016.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"24 1","pages":"314"},"PeriodicalIF":3.9,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11660711/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142871316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimising research investment by simulating and evaluating monitoring strategies to inform a trial: a simulation of liver fibrosis monitoring
Pub Date: 2024-12-20 | DOI: 10.1186/s12874-024-02425-w
Alice J Sitch, Jacqueline Dinnes, Jenny Hewison, Walter Gregory, Julie Parkes, Jonathan J Deeks
Background: The aim of the study was to investigate the development of evidence-based monitoring strategies in a population with progressive or recurrent disease. A simulation study of monitoring strategies using a new biomarker (ELF) for the detection of liver cirrhosis in people with known liver fibrosis was undertaken alongside a randomised controlled trial (ELUCIDATE).
Methods: Existing data and expert opinion were used to estimate the progression of disease and the performance of repeat testing with ELF. Knowledge of the true disease status in addition to the observed test results for a cohort of simulated patients allowed various monitoring strategies to be implemented, evaluated and validated against trial data.
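The general recipe (simulate true progression, overlay an imperfect repeated test, and score each schedule on detection delay and predictive value) can be sketched in a few lines; the progression model, ELF-like threshold, and noise level below are invented for illustration and are not the ELUCIDATE parameters.

```python
import numpy as np

rng = np.random.default_rng(3)
n_patients = 2000

# Synthetic progression model: months until cirrhosis develops; many patients never progress
progresses = rng.random(n_patients) < 0.4
t_cirrhosis = np.where(progresses, rng.gamma(shape=2.0, scale=24.0, size=n_patients), np.inf)

def simulate_strategy(interval_months: float, threshold: float = 10.5,
                      horizon: float = 60.0, noise_sd: float = 0.6):
    """Monitor with a noisy ELF-like biomarker every `interval_months`; flag when the
    measured value crosses `threshold`. Returns mean detection delay and PPV."""
    visits = np.arange(interval_months, horizon + 1e-9, interval_months)
    delays, true_pos, false_pos = [], 0, 0
    for t_true in t_cirrhosis:
        for v in visits:
            elf = 9.5 + (1.5 if v >= t_true else 0.0) + rng.normal(0.0, noise_sd)
            if elf >= threshold:                   # positive monitoring test ends follow-up
                if v >= t_true:
                    true_pos += 1
                    delays.append(v - t_true)
                else:
                    false_pos += 1
                break
    return np.mean(delays), true_pos / (true_pos + false_pos)

for interval in (6, 12):
    delay, ppv = simulate_strategy(interval)
    print(f"test every {interval:>2} months: mean detection delay {delay:5.1f} months, PPV {ppv:.2f}")
```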
Results: Several monitoring strategies ranging in complexity were successfully modelled and compared regarding the timing of detection of disease, the duration of monitoring, and the predictive value of a positive test result. The results of sensitivity analysis showed the importance of accurate data to inform the simulation. Results of the simulation were similar to those from the trial.
Conclusion: Monitoring data can be simulated and strategies compared given adequate knowledge of disease progression, test performance, and test variability. Such exercises should be carried out to ensure that optimal strategies are evaluated in trials, thus reducing research waste. This work highlights the data necessary and a general method for evaluating the performance of monitoring strategies, allowing appropriate strategies to be selected for evaluation. Such modelling should be conducted prior to full-scale investigation of monitoring strategies, allowing optimal monitoring strategies to be assessed.
{"title":"Optimising research investment by simulating and evaluating monitoring strategies to inform a trial: a simulation of liver fibrosis monitoring.","authors":"Alice J Sitch, Jacqueline Dinnes, Jenny Hewison, Walter Gregory, Julie Parkes, Jonathan J Deeks","doi":"10.1186/s12874-024-02425-w","DOIUrl":"10.1186/s12874-024-02425-w","url":null,"abstract":"<p><strong>Background: </strong>The aim of the study was to investigate the development of evidence-based monitoring strategies in a population with progressive or recurrent disease. A simulation study of monitoring strategies using a new biomarker (ELF) for the detection of liver cirrhosis in people with known liver fibrosis was undertaken alongside a randomised controlled trial (ELUCIDATE).</p><p><strong>Methods: </strong>Existing data and expert opinion were used to estimate the progression of disease and the performance of repeat testing with ELF. Knowledge of the true disease status in addition to the observed test results for a cohort of simulated patients allowed various monitoring strategies to be implemented, evaluated and validated against trial data.</p><p><strong>Results: </strong>Several monitoring strategies ranging in complexity were successfully modelled and compared regarding the timing of detection of disease, the duration of monitoring, and the predictive value of a positive test result. The results of sensitivity analysis showed the importance of accurate data to inform the simulation. Results of the simulation were similar to those from the trial.</p><p><strong>Conclusion: </strong>Monitoring data can be simulated and strategies compared given adequate knowledge of disease progression and test performance. Such exercises should be carried out to ensure optimal strategies are evaluated in trials thus reducing research waste. Monitoring data can be generated and monitoring strategies can be assessed if data is available on the monitoring test performance and the test variability. This work highlights the data necessary and the general method for evaluating the performance of monitoring strategies, allowing appropriate strategies to be selected for evaluation. Modelling work should be conducted prior to full scale investigation of monitoring strategies, allowing optimal monitoring strategies to be assessed.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"24 1","pages":"315"},"PeriodicalIF":3.9,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11660973/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142871270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Partial-linear single-index Cox regression models with multiple time-dependent covariates
Pub Date: 2024-12-20 | DOI: 10.1186/s12874-024-02434-9
Myeonggyun Lee, Andrea B Troxel, Sophia Kwon, George Crowley, Theresa Schwartz, Rachel Zeig-Owens, David J Prezant, Anna Nolan, Mengling Liu
Background: In cohort studies with time-to-event outcomes, covariates of interest often have values that change over time. The classical Cox regression model can handle time-dependent covariates but assumes linear effects on the log hazard function, which can be limiting in practice. Furthermore, when multiple correlated covariates are studied, it is of great interest to model their joint effects by allowing a flexible functional form and to delineate their relative contributions to survival risk.
Methods: Motivated by the World Trade Center (WTC)-exposed Fire Department of New York cohort study, we proposed a partial-linear single-index Cox (PLSI-Cox) model to investigate the effects of repeatedly measured metabolic syndrome indicators on the risk of developing WTC lung injury associated with particulate matter exposure. The PLSI-Cox model reduces the dimensionality of covariates while providing interpretable estimates of their effects. The model's flexible link function accommodates nonlinear effects on the log hazard function. We developed an iterative estimation algorithm using spline techniques to model the nonparametric single-index component for potential nonlinear effects, followed by maximum partial likelihood estimation of the parameters.
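For intuition, the fragment below shows one step of an alternating scheme of the kind described: with the index direction held fixed, the single index is expanded in a spline basis and the Cox partial likelihood is maximised over the basis coefficients; the full algorithm would then update the index coefficients, iterate, and handle time-dependent covariates. The data, basis size, and starting values are illustrative, not the authors' implementation.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from patsy import dmatrix

rng = np.random.default_rng(5)
n = 600

# Synthetic covariates (loosely named after metabolic syndrome indicators) and survival
# times with a nonlinear effect of a single index on the log hazard
X = pd.DataFrame(rng.normal(size=(n, 3)), columns=["bmi", "triglycerides", "glucose"])
beta_true = np.array([0.8, 0.5, 0.3]) / np.linalg.norm([0.8, 0.5, 0.3])
hazard = np.exp(np.sin(1.5 * (X.values @ beta_true)))        # nonlinear link function
time = rng.exponential(1.0 / hazard)
event = (rng.random(n) < 0.8).astype(int)

# One step of a PLSI-type fit with the index direction held fixed: expand the current
# single index in a B-spline basis and maximise the Cox partial likelihood over it.
beta_current = np.array([1.0, 0.0, 0.0])                     # crude starting direction
u = X.values @ (beta_current / np.linalg.norm(beta_current))
basis = dmatrix("bs(u, df=4, degree=3) - 1", {"u": u}, return_type="dataframe")
basis.columns = [f"b{j}" for j in range(basis.shape[1])]
df = basis.assign(time=time, event=event)

cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
print(cph.summary["coef"])                                    # spline coefficients of the link
```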
Results: Extensive simulations showed that the proposed PLSI-Cox model outperformed the classical time-dependent Cox regression model when the true relationship was nonlinear. When the relationship was linear, both the PLSI-Cox model and classical time-dependent Cox regression model performed similarly. In the data application, we found a possible nonlinear joint effect of metabolic syndrome indicators on survival risk. Among the different indicators, BMI had the largest positive effect on the risk of developing lung injury, followed by triglycerides.
Conclusion: The PLSI-Cox models allow for the evaluation of nonlinear effects of covariates and offer insights into their relative importance and direction. These methods provide a powerful set of tools for analyzing data with multiple time-dependent covariates and survival outcomes, potentially offering valuable insights for both current and future studies.
{"title":"Partial-linear single-index Cox regression models with multiple time-dependent covariates.","authors":"Myeonggyun Lee, Andrea B Troxel, Sophia Kwon, George Crowley, Theresa Schwartz, Rachel Zeig-Owens, David J Prezant, Anna Nolan, Mengling Liu","doi":"10.1186/s12874-024-02434-9","DOIUrl":"10.1186/s12874-024-02434-9","url":null,"abstract":"<p><strong>Background: </strong>In cohort studies with time-to-event outcomes, covariates of interest often have values that change over time. The classical Cox regression model can handle time-dependent covariates but assumes linear effects on the log hazard function, which can be limiting in practice. Furthermore, when multiple correlated covariates are studied, it is of great interest to model their joint effects by allowing a flexible functional form and to delineate their relative contributions to survival risk.</p><p><strong>Methods: </strong>Motivated by the World Trade Center (WTC)-exposed Fire Department of New York cohort study, we proposed a partial-linear single-index Cox (PLSI-Cox) model to investigate the effects of repeatedly measured metabolic syndrome indicators on the risk of developing WTC lung injury associated with particulate matter exposure. The PLSI-Cox model reduces the dimensionality of covariates while providing interpretable estimates of their effects. The model's flexible link function accommodates nonlinear effects on the log hazard function. We developed an iterative estimation algorithm using spline techniques to model the nonparametric single-index component for potential nonlinear effects, followed by maximum partial likelihood estimation of the parameters.</p><p><strong>Results: </strong>Extensive simulations showed that the proposed PLSI-Cox model outperformed the classical time-dependent Cox regression model when the true relationship was nonlinear. When the relationship was linear, both the PLSI-Cox model and classical time-dependent Cox regression model performed similarly. In the data application, we found a possible nonlinear joint effect of metabolic syndrome indicators on survival risk. Among the different indicators, BMI had the largest positive effect on the risk of developing lung injury, followed by triglycerides.</p><p><strong>Conclusion: </strong>The PLSI-Cox models allow for the evaluation of nonlinear effects of covariates and offer insights into their relative importance and direction. These methods provide a powerful set of tools for analyzing data with multiple time-dependent covariates and survival outcomes, potentially offering valuable insights for both current and future studies.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"24 1","pages":"311"},"PeriodicalIF":3.9,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11661057/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142871273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mixed-effects neural network modelling to predict longitudinal trends in fasting plasma glucose
Pub Date: 2024-12-20 | DOI: 10.1186/s12874-024-02442-9
Qiong Zou, Borui Chen, Yang Zhang, Xi Wu, Yi Wan, Changsheng Chen
Background: Accurate fasting plasma glucose (FPG) trend prediction is important for the management and treatment of patients with type 2 diabetes mellitus (T2DM), a globally prevalent chronic disease. (Generalised) linear mixed-effects (LME) models and machine learning (ML) are commonly used to analyse longitudinal data; however, the former is insufficient for dealing with complex, nonlinear data, whereas the latter ignores random effects. The aim of this study was to develop LME, back-propagation neural network (BPNN), and mixed-effects NN models that combine the two to predict FPG levels.
Methods: Monitoring data from 779 patients with T2DM from a multicentre, prospective study, obtained from the shared Figshare repository, were divided 80/20 into training/test sets. The 10 most important features were selected via random forest (RF) screening. First, an LME model was built to model interindividual differences, analyse the factors affecting FPG levels, compare AIC and BIC values to screen for the optimal model, and predict FPG levels. Second, multiple BPNN models were constructed from different variable sets to screen for the optimal BPNN. Finally, an LME/BPNN combined model, named LMENN, was constructed via stacking integration. A 10-fold cross-validation cycle was performed on the training set to build the model and evaluate its performance, and the final model was then evaluated on the test set.
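One plausible form of such LME-plus-BPNN stacking, sketched with statsmodels and scikit-learn on synthetic longitudinal data: a random-intercept LME is fitted first and its predictions are fed, together with the covariates, into a small neural network. Variable names, network size, and the exact stacking recipe are assumptions for illustration, not the authors' LMENN specification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(11)
n_patients, n_visits = 100, 6

# Synthetic longitudinal FPG data with a patient-specific random intercept
df = pd.DataFrame({
    "patient": np.repeat(np.arange(n_patients), n_visits),
    "visit": np.tile(np.arange(n_visits), n_patients),
    "hba1c": rng.normal(7.0, 1.0, n_patients * n_visits),
    "bmi": rng.normal(26.0, 3.0, n_patients * n_visits),
})
rand_intercept = rng.normal(0.0, 0.8, n_patients)[df["patient"]]
df["fpg"] = 4.0 + 0.5 * df["hba1c"] + 0.05 * df["bmi"] + rand_intercept + rng.normal(0, 0.4, len(df))

# Stage 1: random-intercept LME captures between-patient variation
lme = smf.mixedlm("fpg ~ hba1c + bmi", df, groups=df["patient"]).fit()
df["lme_pred"] = lme.fittedvalues

# Stage 2: a small back-propagation network is stacked on top, taking the LME prediction
# and the covariates as inputs (in-sample fit only; no train/test split in this sketch)
nn = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
nn.fit(df[["lme_pred", "hba1c", "bmi", "visit"]], df["fpg"])
df["lmenn_pred"] = nn.predict(df[["lme_pred", "hba1c", "bmi", "visit"]])

def rmse(y, yhat):
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(yhat)) ** 2)))

print("LME RMSE:  ", round(rmse(df["fpg"], df["lme_pred"]), 3))
print("LMENN RMSE:", round(rmse(df["fpg"], df["lmenn_pred"]), 3))
```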
Results: The top 10 variables screened by RF were HOMA-β, HbA1c, HOMA-IR, urinary sugar, insulin, BMI, waist circumference, weight, age, and group. The best-fitting random-intercept mixed-effects (lm22) model showed that each patient's baseline glucose level influenced subsequent glucose measurements, but the trend over time was consistent. The LMENN model combines the strengths of LME and BPNN and accounts for random effects. The RMSE ranges of the LMENN model were 0.447-0.471 (training set), 0.525-0.552 (validation set), and 0.511-0.565 (test set). It improves the prediction performance of the single LME and BPNN models and shows some advantages in predicting FPG levels.
Conclusions: The LMENN model built by integrating LME and BPNN has several potential applications in analysing longitudinal FPG monitoring data. This study provides new ideas and methods for further research in the field of blood glucose prediction.
{"title":"Mixed-effects neural network modelling to predict longitudinal trends in fasting plasma glucose.","authors":"Qiong Zou, Borui Chen, Yang Zhang, Xi Wu, Yi Wan, Changsheng Chen","doi":"10.1186/s12874-024-02442-9","DOIUrl":"10.1186/s12874-024-02442-9","url":null,"abstract":"<p><strong>Background: </strong>Accurate fasting plasma glucose (FPG) trend prediction is important for management and treatment of patients with type 2 diabetes mellitus (T2DM), a globally prevalent chronic disease. (Generalised) linear mixed-effects (LME) models and machine learning (ML) are commonly used to analyse longitudinal data; however, the former is insufficient for dealing with complex, nonlinear data, whereas with the latter, random effects are ignored. The aim of this study was to develop LME, back propagation neural network (BPNN), and mixed-effects NN models that combine the 2 to predict FPG levels.</p><p><strong>Methods: </strong>Monitoring data from 779 patients with T2DM from a multicentre, prospective study from the shared platform Figshare repository were divided 80/20 into training/test sets. The first 10 important features were modelled via random forest (RF) screening. First, an LME model was built to model interindividual differences, analyse the factors affecting FPG levels, compare the AIC and BIC values to screen the optimal model, and predict FPG levels. Second, multiple BPNN models were constructed via different variable sets to screen the optimal BPNN. Finally, an LME/BPNN combined model, named LMENN, was constructed via stacking integration. A 10-fold cross-validation cycle was performed using the training set to build the model and evaluate its performance, and then the final model was evaluated on the test set.</p><p><strong>Results: </strong>The top 10 variables screened by RF were HOMA-β, HbA1c, HOMA-IR, urinary sugar, insulin, BMI, waist circumference, weight, age, and group. The best-fitting random-intercept mixed-effects (lm22) model showed that each patient's baseline glucose levels influenced subsequent glucose measurements, but the trend over time was consistent. The LMENN model combines the strengths of LME and BPNN and accounts for random effects. The RMSE of the LMENN model ranges were 0.447-0.471 (training set), 0.525-0.552 (validation set), and 0.511-0.565 (test set). It improves the prediction performance of the single LME and BPNN models and shows some advantages in predicting FPG levels.</p><p><strong>Conclusions: </strong>The LMENN model built by integrating LME and BPNN has several potential applications in analysing longitudinal FPG monitoring data. This study provides new ideas and methods for further research in the field of blood glucose prediction.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"24 1","pages":"313"},"PeriodicalIF":3.9,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11660730/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142871267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient analysis of drug interactions in liver injury: a retrospective study leveraging natural language processing and machine learning
Pub Date: 2024-12-20 | DOI: 10.1186/s12874-024-02443-8
Junlong Ma, Heng Chen, Ji Sun, Juanjuan Huang, Gefei He, Guoping Yang
Background: Liver injury from drug-drug interactions (DDIs), notably with anti-tuberculosis drugs such as isoniazid, poses a significant safety concern. Electronic medical records contain comprehensive clinical information and have gained increasing attention as a potential resource for DDI detection. However, a substantial portion of adverse drug reaction (ADR) information is hidden in unstructured narrative text, which has yet to be efficiently harnessed, thereby introducing bias into research. There is a significant need for an efficient framework for DDI assessment.
Methods: Using a Chinese natural language processing (NLP) model, we extracted 25,130 ADR records, dividing them into sets for training an automated normalization model. The trained models, in conjunction with liver function laboratory tests, were used to thoroughly and efficiently identify liver injury cases. Finally, we applied a case-control study design to detect DDI signals that increase isoniazid's risk of liver injury.
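The case-control screening step reduces, for each candidate co-medication, to a 2x2 table of liver-injury cases versus controls by exposure. A minimal sketch with a Wald confidence interval for the odds ratio is shown below on synthetic data; the exposure and case frequencies are invented, not the study's figures.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(21)
n = 3209

# Synthetic isoniazid cohort: liver-injury status and exposure to one candidate co-medication
df = pd.DataFrame({
    "liver_injury": rng.random(n) < 0.04,
    "co_drug": rng.random(n) < 0.15,
})

# 2x2 case-control table for the candidate drug
a = (df["co_drug"] & df["liver_injury"]).sum()       # exposed cases
b = (df["co_drug"] & ~df["liver_injury"]).sum()      # exposed controls
c = (~df["co_drug"] & df["liver_injury"]).sum()      # unexposed cases
d = (~df["co_drug"] & ~df["liver_injury"]).sum()     # unexposed controls

odds_ratio = (a * d) / (b * c)
se_log_or = np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # Wald standard error on the log scale
ci = np.exp(np.log(odds_ratio) + np.array([-1.96, 1.96]) * se_log_or)
print(f"OR = {odds_ratio:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```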
Results: The logistic regression model demonstrated stable and superior performance in the classification task. Based on laboratory criteria and NLP, we identified 128 liver injury cases in a cohort of 3,209 patients treated with isoniazid. Preliminary screening of 113 drug combinations with isoniazid highlighted 20 potential signal drugs, with antibacterials constituting 25%. Sensitivity analysis confirmed the robustness of the signal drugs, especially in cardiac therapy and antibacterials.
Conclusion: Our NLP and machine learning approach effectively identifies isoniazid-related DDIs that increase the risk of liver injury, identifying 20 signal drugs, mainly antibacterials. Further research is required to validate these DDI signals.
{"title":"Efficient analysis of drug interactions in liver injury: a retrospective study leveraging natural language processing and machine learning.","authors":"Junlong Ma, Heng Chen, Ji Sun, Juanjuan Huang, Gefei He, Guoping Yang","doi":"10.1186/s12874-024-02443-8","DOIUrl":"10.1186/s12874-024-02443-8","url":null,"abstract":"<p><strong>Background: </strong>Liver injury from drug-drug interactions (DDIs), notably with anti-tuberculosis drugs such as isoniazid, poses a significant safety concern. Electronic medical records contain comprehensive clinical information and have gained increasing attention as a potential resource for DDI detection. However, a substantial portion of adverse drug reaction (ADR) information is hidden in unstructured narrative text, which has yet to be efficiently harnessed, thereby introducing bias into the research. There is a significant need for an efficient framework for the DDI assessment.</p><p><strong>Methods: </strong>Using a Chinese natural language processing (NLP) model, we extracted 25,130 adverse drug reaction (ADR) records, dividing them into sets for training an automated normalization model. The trained models, in conjunction with liver function laboratory tests, were used to thoroughly and efficiently identify liver injury cases. Ultimately, we applied a case-control study design to detect DDI signals increasing isoniazid's liver injury risk.</p><p><strong>Results: </strong>The Logistic Regression model demonstrated stable and superior performance in classification task. Based on laboratory criteria and NLP, we identified 128 liver injury cases among a cohort of 3,209 patients treated with isoniazid. Preliminary screening of 113 drug combinations with isoniazid highlighted 20 potential signal drugs, with antibacterials constituting 25%. Sensitivity analysis confirmed the robustness of signal drugs, especially in cardiac therapy and antibacterials.</p><p><strong>Conclusion: </strong>Our NLP and machine learning approach effectively identifies isoniazid-related DDIs that increase the risk of liver injury, identifying 20 signal drugs, mainly antibacterials. Further research is required to validate these DDI signals.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"24 1","pages":"312"},"PeriodicalIF":3.9,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11660714/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142871322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A comprehensive guide to study the agreement and reliability of multi-observer ordinal data
Pub Date: 2024-12-20 | DOI: 10.1186/s12874-024-02431-y
Sophie Vanbelle, Christina Hernandez Engelhart, Ellen Blix
Background: A recent systematic review revealed issues in the conduct and reporting of agreement and reliability studies for ordinal scales, especially in the presence of more than two observers. This paper therefore aims to provide the information needed to choose among the most meaningful and most widely used measures and to plan agreement and reliability studies for ordinal outcomes.
Methods: This paper considers the generalisation of the proportion of (dis)agreement, the mean absolute deviation, the mean squared deviation and weighted kappa coefficients to more than two observers in the presence of an ordinal outcome.
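As a simple illustration of these quantities (not the paper's multi-observer estimators, for which the authors provide an R package), the sketch below computes pooled pairwise agreement summaries and an average pairwise linearly weighted kappa for four observers rating a 5-point ordinal scale on synthetic data; pairwise averaging is used here only as a stand-in for the generalisations the paper derives.

```python
import numpy as np
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(8)
n_patients, n_raters, k_categories = 60, 4, 5

# Synthetic ordinal ratings (e.g., a 5-point clinical scale) from 4 observers
truth = rng.integers(0, k_categories, n_patients)
ratings = np.clip(truth[:, None] + rng.integers(-1, 2, (n_patients, n_raters)),
                  0, k_categories - 1)

# Agreement-type summaries pooled over all observer pairs
pairs = list(combinations(range(n_raters), 2))
diffs = np.concatenate([np.abs(ratings[:, i] - ratings[:, j]) for i, j in pairs])
print("proportion of exact agreement:", round(float(np.mean(diffs == 0)), 3))
print("mean absolute deviation:      ", round(float(np.mean(diffs)), 3))
print("mean squared deviation:       ", round(float(np.mean(diffs.astype(float) ** 2)), 3))

# Reliability-type summary: linearly weighted kappa averaged over observer pairs
kappas = [cohen_kappa_score(ratings[:, i], ratings[:, j], weights="linear") for i, j in pairs]
print("mean pairwise weighted kappa: ", round(float(np.mean(kappas)), 3))
```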
Results: After highlighting the difference between the concepts of agreement and reliability, a clear and simple interpretation of the agreement and reliability coefficients is provided. The large-sample variance of the various coefficients under the delta method is presented, or derived where not available in the literature, to construct Wald confidence intervals. Finally, a procedure is provided to determine the minimum number of raters and patients needed to limit the uncertainty associated with the sampling process. All the methods are available in an R package and a Shiny application to circumvent the limitations of current software.
Conclusions: The present paper completes existing guidelines, such as the Guidelines for Reporting Reliability and Agreement Studies (GRRAS), to improve the quality of reliability and agreement studies of clinical tests. Furthermore, we provide open source software to researchers with minimum programming skills.
{"title":"A comprehensive guide to study the agreement and reliability of multi-observer ordinal data.","authors":"Sophie Vanbelle, Christina Hernandez Engelhart, Ellen Blix","doi":"10.1186/s12874-024-02431-y","DOIUrl":"10.1186/s12874-024-02431-y","url":null,"abstract":"<p><strong>Background: </strong>A recent systematic review revealed issues in regard to performing and reporting agreement and reliability studies for ordinal scales, especially in the presence of more than two observers. This paper therefore aims to provide all necessary information in regard to the choice among the most meaningful and most used measures and the planning of agreement and reliability studies for ordinal outcomes.</p><p><strong>Methods: </strong>This paper considers the generalisation of the proportion of (dis)agreement, the mean absolute deviation, the mean squared deviation and weighted kappa coefficients to more than two observers in the presence of an ordinal outcome.</p><p><strong>Results: </strong>After highlighting the difference between the concepts of agreement and reliability, a clear and simple interpretation of the agreement and reliability coefficients is provided. The large sample variance of the various coefficients with the delta method is presented or derived if not available in the literature to construct Wald confidence intervals. Finally, a procedure to determine the minimum number of raters and patients needed to limit the uncertainty associated with the sampling process is provided. All the methods are available in an R package and a Shiny application to circumvent the limitations of current software.</p><p><strong>Conclusions: </strong>The present paper completes existing guidelines, such as the Guidelines for Reporting Reliability and Agreement Studies (GRRAS), to improve the quality of reliability and agreement studies of clinical tests. Furthermore, we provide open source software to researchers with minimum programming skills.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"24 1","pages":"310"},"PeriodicalIF":3.9,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11660713/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142871329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}