Pub Date: 2024-08-11 DOI: 10.1101/2024.08.10.24311788
L. Alketbi, Yousef Boobes, N. Nagelkerke, H. Aleissaee, N. AlShamsi, M. Almansoori, Ahmed Hemaid, M. AlDobaee, Noura AlAlawi, R. AlKetbi, T. Fahmawee, B. AlHashaikeh, A. AlAzeezi, F. Shuaib, J. Alnuaimi, E. Mahmoud, N. AlAhbabi, Bachar Afandi
The impact of an abnormal estimated glomerular filtration rate (eGFR) on various adverse outcomes has been well studied; however, the United Arab Emirates (UAE), like many other regions of the world, remains understudied in this area. Method: This retrospective cohort study estimates the age- and sex-specific eGFR in the Abu Dhabi population and its association with mortality and atherosclerotic cardiovascular disease (ASCVD) outcomes. The cohort comprised 8699 participants in a national cardiovascular disease screening conducted from 2011 to 2013 and was reevaluated in 2023 for mortality and cardiovascular outcomes. Reference eGFR percentiles were estimated from subjects without comorbidities using the LMS method. Results: The reference percentiles of normal eGFR values showed a marked decrease with age, with small sex differences in the reference percentile distribution. A prognostic definition of renal hyperfiltration (RH) is suggested by the observation that subjects in the 97th percentile had a significantly higher incidence of ASCVD, although the difference in mortality rate was not statistically significant. Older age, female sex, history of ASCVD, history of hypertension, being treated for hypertension, lower diastolic blood pressure, higher systolic blood pressure, lower HDL, higher HbA1c, and higher vitamin D were significantly associated with lower eGFR percentiles. Subjects in the two categories within the RH range, the 95th and 97th percentiles, had a significantly higher prevalence of diabetes; they were older smokers with higher BMI, higher HbA1c, higher HDL, and lower vitamin D, and they were more likely to be male, to have higher physical activity, and to have a lower prevalence of CHD. Conclusion: The distribution of eGFR by age and sex is valuable for clinical decision-making in Abu Dhabi and likely for the Arab population in general. Although the 95th percentile of eGFR in this cohort showed a higher but nonsignificant risk, the 97th percentile was significantly associated with ASCVD, even more strongly than eGFR below the 10th percentile. This study provides important insights into the prevalence and risk factors associated with different eGFR percentiles in the Abu Dhabi population. The findings underscore the need for targeted interventions to address modifiable risk factors and prevent the progression of renal damage in this high-risk population.
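The abstract names the LMS method for the reference percentiles but gives no fitted parameters. As a minimal sketch of how LMS parameters map between a measurement and a percentile, the Python below uses the standard LMS transformation; the parameter values (`lam`, `mu`, `sigma`) are hypothetical placeholders for one age and sex stratum, not values from the study.

```python
import math
from statistics import NormalDist


def lms_zscore(y, lam, mu, sigma):
    """Convert a measurement y to a z-score under the LMS model.

    lam is the Box-Cox power (L), mu the median (M), sigma the
    coefficient of variation (S) for one age/sex stratum.
    """
    if lam == 0:
        return math.log(y / mu) / sigma
    return ((y / mu) ** lam - 1.0) / (lam * sigma)


def centile_value(percentile, lam, mu, sigma):
    """Return the measurement that sits at the given percentile (0-100)."""
    z = NormalDist().inv_cdf(percentile / 100.0)
    if lam == 0:
        return mu * math.exp(sigma * z)
    return mu * (1.0 + lam * sigma * z) ** (1.0 / lam)


# Hypothetical LMS parameters for a single stratum (illustration only).
lam, mu, sigma = 0.8, 105.0, 0.15

p97_cutoff = centile_value(97, lam, mu, sigma)   # candidate hyperfiltration cutoff
z_of_patient = lms_zscore(120.0, lam, mu, sigma)  # z-score of an eGFR of 120
percentile_of_patient = 100.0 * NormalDist().cdf(z_of_patient)
print(round(p97_cutoff, 1), round(percentile_of_patient, 1))
```

In practice the three parameters vary smoothly with age and are fitted separately by sex, so a lookup by age and sex would precede these calls.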
Title: Estimation of age and sex specific Glomerular Filtration Rate in the Abu Dhabi population and its association with mortality and Atherosclerotic cardiovascular outcome. A Retrospective Cohort Study
Journal: medRxiv
Pub Date: 2024-08-11 DOI: 10.1101/2024.08.10.24311796
Masab A. Mansoor, DBA
Abstract Background Somatic mutations play a crucial role in cancer initiation, progression, and treatment response. While high-throughput sequencing has vastly expanded our understanding of cancer genomics, interpreting the functional impact of novel somatic mutations remains challenging. Machine learning approaches show promise in predicting mutation impacts, but robust models for accurate prognosis across different cancer types are still needed. Objective This study aimed to develop and validate a machine learning model using the Catalogue of Somatic Mutations in Cancer (COSMIC) database to predict the functional impact of novel somatic mutations on cancer prognosis across various cancer types. Methods We extracted data on 6,573,214 coding point mutations across 1,391 cancer types from COSMIC v95. We engineered 47 features for each mutation, including sequence context, protein domain information, evolutionary conservation scores, and frequency data. We developed and compared Random Forest, XGBoost, and Deep Neural Network models, selecting XGBoost based on performance. The model was evaluated using standard metrics and externally validated using data from The Cancer Genome Atlas (TCGA). Results The XGBoost model achieved an area under the Receiver Operating Characteristic curve (AUC-ROC) of 0.89 on the test set and 0.86 on the TCGA validation set. The model demonstrated consistent performance across major cancer types (AUC-ROC range: 0.85-0.92). Key predictive features included evolutionary conservation score, protein domain disruption, and mutation frequency. The model correctly identified 87% of known driver mutations and predicted 3,241 potentially high-impact novel mutations. Model predictions significantly correlated with patient survival in the TCGA dataset (HR = 1.8, 95% CI: 1.6-2.0, p < 0.001). 
Conclusions Our machine learning model shows strong predictive power in assessing the functional impact of somatic mutations on cancer prognosis across various cancer types. This approach has potential applications in research prioritization and clinical decision support, contributing to the advancement of precision oncology. Keywords cancer genomics; somatic mutations; machine learning; prognosis prediction; COSMIC database; precision oncology
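The headline metric in this abstract is AUC-ROC. As a minimal sketch of what that number measures, the function below computes it directly from its pairwise definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The labels and scores are made-up toy values, not data from the study, and the quadratic loop is for clarity rather than speed.

```python
def auc_roc(labels, scores):
    """Pairwise AUC-ROC: P(score of a positive > score of a negative).

    labels are 1 (e.g. high-impact mutation) or 0; scores are model outputs.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy example (hypothetical values).
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
print(auc_roc(toy_labels, toy_scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported 0.89 and 0.86 should be read.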
Title: Predictive Modeling of Novel Somatic Mutation Impacts on Cancer Prognosis: A Machine Learning Approach Using the COSMIC Database
Journal: medRxiv
Pub Date: 2024-08-11 DOI: 10.1101/2024.08.10.24305340
M. Pelegrini-Issac, A. Hezghia, E. Caron, S. Delphine, V. Battisti, D. Cassereau, C. Debarle, M. Lefort, B. Lesimple, G. Torkomian, V. Degos, R. Bernard, D. Galanaud, P. Pradat-Diehl, V. Perlbarg, É. Bayen, L. Puybasset
Moderate-to-severe traumatic brain injury (TBI) should be considered a chronic health condition. The corpus callosum is the brain region that suffers most from diffuse axonal injury, leading to long-term functional deficits. Few studies have considered the relationships between inter- and intrahemispheric functional connectivity and structural damage to the corpus callosum in chronic TBI patients. We examined how callosal functional connectivity and white matter alterations relate to clinical outcome using multimodal magnetic resonance imaging (MRI): structural MRI estimates callosal volume, diffusion-weighted MRI quantifies white matter integrity, and resting-state functional MRI assesses neural dysfunction. Seventy-four patients underwent a multimodal MRI session on average 5 years after a moderate-to-severe TBI. Multiple factorial analysis was used to examine the relationships between clinical outcome (from severe disability to good recovery, assessed by the Glasgow Outcome Scale Extended, GOSE), callosal volume, diffusion metrics (fractional anisotropy and mean, axial, and radial diffusivity), and inter- and intrahemispheric functional connectivity. The analysis confirmed that patients with severe disability (GOSE 3-4) had more structural alterations in the corpus callosum than patients with a good recovery (GOSE 7-8). Most importantly, patients able to live independently but unable to work or study in a standard environment (GOSE 5-6) could not be described solely by structural features. They exhibited lower interhemispheric connectivity between cortical regions mediated by the corpus callosum than patients with a good recovery, and a tendency towards lower intrahemispheric connectivity than severely disabled patients. These findings suggest a complex long-term functional impact of moderate-to-severe TBI.
Title: Multimodal magnetic resonance imaging characterizes clinical outcome in chronic traumatic brain injury
Journal: medRxiv
Pub Date: 2024-08-11 DOI: 10.1101/2024.08.10.24311370
Johnathan R. Lex MBChB MASc, Jacob Mosseri BASc MASc, Jay Toor MD MBA FRCSC, Aazad Abbas HBSc, Michael Simone BASc, Bheeshma Ravi, Cari M. Whyne, Elias B. Khalil
Objective: To determine the potential for improving elective surgery scheduling for total knee and hip arthroplasty (TKA and THA, respectively) by utilizing a two-stage approach that combines machine learning (ML) prediction of the duration of surgery (DOS) with scheduling optimization. Materials and Methods: Two ML models (for TKA and THA) were trained to predict DOS using patient factors based on 302,490 and 196,942 examples, respectively, from a large international database. Three optimization formulations based on varying surgeon flexibility were compared: Any (surgeons could operate in any operating room at any time), Split (a limit of two surgeons per operating room per day), and MSSP (a limit of one surgeon per operating room per day). Two years of daily scheduling simulations were performed for each optimization problem using ML-predicted or mean DOS over a range of schedule parameters. Constraints and resources were based on a high-volume arthroplasty hospital in Canada. Results: The Any scheduling formulation performed significantly worse than the Split and MSSP formulations with respect to overtime and underutilization (p<0.001). The latter two formulations performed similarly (p>0.05) over most schedule parameters. The ML-prediction schedules outperformed those generated using a mean DOS over all schedule parameters, with overtime reduced on average by 300 to 500 minutes per week. Using a 15-minute schedule granularity with a wait-list pool of at least 1 month generated the best schedules. Conclusion: Assuming a full waiting list, optimizing an individual surgeon's elective operating room time using an ML-assisted predict-then-optimize scheduling system improves overall operating room efficiency, significantly decreasing overtime.
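The predict-then-optimize idea can be illustrated with a toy example: choose cases for one operating-room day from predicted durations, then score the resulting day against the durations that actually occurred. The sketch below is not the authors' formulation; it swaps their optimization models for a simple greedy fill, and the function names, the 480-minute capacity, and all durations are hypothetical.

```python
def fill_day(predicted_minutes, capacity):
    """Greedy stand-in for the optimizer: walk the waiting list in order and
    take every case whose predicted duration still fits in the day."""
    chosen, used = [], 0
    for index, minutes in enumerate(predicted_minutes):
        if used + minutes <= capacity:
            chosen.append(index)
            used += minutes
    return chosen


def overtime_and_underutilization(actual_minutes, capacity):
    """Score a realized day: minutes past capacity and minutes left idle."""
    total = sum(actual_minutes)
    return max(0, total - capacity), max(0, capacity - total)


capacity = 480                          # hypothetical 8-hour operating day
actual = [200, 190, 180, 70, 60]        # what each case really took
ml_predicted = [195, 185, 185, 75, 60]  # hypothetical per-patient predictions
mean_predicted = [140] * 5              # every case assumed to take the mean

for name, predicted in [("ML", ml_predicted), ("mean", mean_predicted)]:
    chosen = fill_day(predicted, capacity)
    realized = [actual[i] for i in chosen]
    print(name, chosen, overtime_and_underutilization(realized, capacity))
```

With these toy numbers the mean-based plan books three long cases and runs over, while the per-patient predictions lead to a day that fits, which is the mechanism behind the overtime reduction the abstract reports.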
Title: Machine Learning to Predict-Then-Optimize Elective Orthopaedic Surgery Scheduling Improves Operating Room Utilization
Journal: medRxiv
Pub Date: 2024-08-11 DOI: 10.1101/2024.08.10.24311795
Masab A. Mansoor, DBA
Background Improved survival rates in pediatric cancer have shifted focus to long-term effects of treatment, with cardiovascular complications emerging as a leading cause of morbidity and mortality. Understanding the patterns and predictors of cardiotoxicity is crucial for risk stratification, treatment optimization, and long-term care planning. Objective This study aimed to investigate the prevalence, incidence, and risk factors of cardiotoxicity in pediatric cancer survivors using data from the Childhood Cancer Survivor Study (CCSS). Methods We conducted a retrospective cohort study of 24,938 five-year survivors of childhood cancer diagnosed between 1970 and 1999. Cardiovascular complications, including cardiomyopathy, coronary artery disease, valvular heart disease, and arrhythmias, were assessed through self-reported questionnaires and medical record review. Cox proportional hazards models were used to evaluate risk factors, and a prediction model was developed using multivariable logistic regression. Results The cumulative incidence of any cardiovascular complication by 30 years post-diagnosis was 18.7% (95% CI: 17.9%-19.5%). Significant risk factors included anthracycline exposure (HR 2.31, 95% CI: 2.09-2.55 for doses ≥ 250 mg/m²), chest radiation (HR 1.84, 95% CI: 1.66-2.05 for doses ≥ 20 Gy), older age at diagnosis, male sex, and obesity. A risk prediction model demonstrated good discrimination (C-statistic: 0.78, 95% CI: 0.76-0.80). Survivors had a significantly higher risk of cardiovascular complications compared to sibling controls (OR 3.7, 95% CI: 3.2-4.2). Conclusions Childhood cancer survivors face a substantial and persistent risk of cardiovascular complications. The identified risk factors and prediction model can guide personalized follow-up strategies and interventions. These findings underscore the need for lifelong cardiovascular monitoring and care in this population.
Title: Cardiotoxicity in Pediatric Cancer Survivorship: Patterns, Predictors, and Implications for Long-term Care
Journal: medRxiv
Pub Date: 2024-08-11 DOI: 10.1101/2024.08.11.24311810
M. Omar, Benjamin S. Glicksberg, G. Nadkarni, E. Klang
Background and Aim: Large language models (LLMs) show promise in healthcare, but their self-assessment capabilities remain unclear. This study evaluates the confidence levels and performance of 12 LLMs across five medical specialties to assess their ability to accurately judge their responses. Methods: We used 1965 multiple-choice questions from internal medicine, obstetrics and gynecology, psychiatry, pediatrics, and general surgery. Models were prompted to provide answers and confidence scores. Performance and confidence were analyzed using chi-square tests and t-tests. Consistency across question versions was also evaluated. Results: All models displayed high confidence regardless of answer correctness. Higher-tier models showed slightly better calibration, with a mean confidence of 72.5% for correct answers versus 69.4% for incorrect ones, compared to lower-tier models (79.6% vs 79.5%). The mean confidence difference between correct and incorrect responses ranged from 0.6% to 5.4% across all models. Four models showed significantly higher confidence when correct (p<0.01), but the difference remained small. Most models demonstrated consistency across question versions. Conclusion: While newer LLMs show improved performance and consistency in medical knowledge tasks, their confidence levels remain poorly calibrated. The gap between performance and self-assessment poses risks in clinical applications. Until these models can reliably gauge their certainty, their use in healthcare should be limited and supervised by experts. Further research on human-AI collaboration and ensemble methods is needed for responsible implementation. Keywords: Large Language Models (LLMs), Safe AI, AI Reliability, Clinical knowledge.
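The calibration comparison described above reduces to one quantity per model: mean stated confidence on correct answers minus mean stated confidence on incorrect answers. A minimal sketch of that calculation follows; the record format and the sample values are hypothetical, and the study's actual analysis also involved significance testing that is not reproduced here.

```python
from statistics import mean


def confidence_gap(records):
    """Return (mean confidence when correct, mean confidence when incorrect,
    and their difference) for one model.

    records is a list of (is_correct, confidence_percent) pairs, one per question.
    """
    when_correct = [conf for ok, conf in records if ok]
    when_incorrect = [conf for ok, conf in records if not ok]
    if not when_correct or not when_incorrect:
        raise ValueError("need both correct and incorrect answers")
    mean_correct = mean(when_correct)
    mean_incorrect = mean(when_incorrect)
    return mean_correct, mean_incorrect, mean_correct - mean_incorrect


# Hypothetical responses from one model (illustration only).
sample = [(True, 80.0), (True, 70.0), (False, 75.0), (False, 65.0)]
print(confidence_gap(sample))
```

A well-calibrated model would show a large positive gap; the 0.6 to 5.4 percentage-point range reported in the abstract indicates that confidence barely tracks correctness.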
Title: Overconfident AI? Benchmarking LLM Self-Assessment in Clinical Scenarios
Journal: medRxiv
Pub Date: 2024-08-11 DOI: 10.1101/2024.08.11.24311807
M. Sugimoto, T. Takagi, T. Suzuki, H. Shimizu, G. Shibukawa, Y. Nakajima, Y. Takeda, Y. Noguchi, R. Kobayashi, H. Imamura, H. Asama, N. Konno, Y. Waragai, H. Akatsuka, R. Suzuki, T. Hikichi, H. Ohira
Background: Post-endoscopic retrograde cholangiopancreatography (ERCP) pancreatitis (PEP) is a severe and deadly adverse event following ERCP. The ideal method for predicting PEP risk before ERCP has yet to be identified. We aimed to establish a simple PEP risk score model (SuPER model: Support for PEP Reduction) that can be applied before ERCP. Methods: This multicenter study enrolled 2074 patients who underwent ERCP. Among them, 1037 patients each were randomly assigned to the development and validation cohorts. In the development cohort, the risk score model for predicting PEP was established by logistic regression analysis. In the validation cohort, the performance of the model was assessed. Results: In the development cohort, five PEP risk factors that could be identified before ERCP were extracted and assigned weights according to their respective regression coefficients: -2 points for pancreatic calcification, 1 point for female sex, and 2 points for intraductal papillary mucinous neoplasm, a native papilla of Vater, or the use of pancreatic duct procedures. The PEP occurrence rate was 0% among low-risk patients (≤ 0 points), 5.5% among moderate-risk patients (1 to 3 points), and 20.2% among high-risk patients (4 to 7 points). In the validation cohort, the C-statistic of the risk score model was 0.71 (95% CI 0.64-0.78), which was considered acceptable. The PEP risk classification (low, moderate, and high) was a significant predictive factor for PEP that was independent of intraprocedural PEP risk factors (precut sphincterotomy and inadvertent pancreatic duct cannulation) (OR 4.2, 95% CI 2.8-6.3, P < 0.01). Conclusions: The PEP risk score allows an estimation of the risk of PEP prior to ERCP, regardless of whether the patient has undergone pancreatic duct procedures. 
This simple risk model, consisting of only five items, may aid in predicting and explaining the risk of PEP before ERCP and in preventing PEP by allowing selection of the appropriate expert endoscopist and useful PEP prophylaxes.
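The point weights and risk bands stated in the abstract can be written out as a small calculator. The following is a minimal sketch rather than the authors' implementation: the class name, field names, and function names are hypothetical, and only the weights and the three score bands are taken from the abstract.

```python
from dataclasses import dataclass


@dataclass
class PreErcpFindings:
    """Five items known before ERCP (field names are illustrative)."""
    pancreatic_calcification: bool
    female: bool
    ipmn: bool  # intraductal papillary mucinous neoplasm
    native_papilla: bool
    pancreatic_duct_procedure: bool


def super_score(f: PreErcpFindings) -> int:
    """Sum the point weights reported in the abstract (range -2 to 7)."""
    return (
        -2 * f.pancreatic_calcification
        + 1 * f.female
        + 2 * f.ipmn
        + 2 * f.native_papilla
        + 2 * f.pancreatic_duct_procedure
    )


def risk_category(score: int) -> str:
    """Map a score to the bands in the abstract: <=0, 1 to 3, 4 to 7."""
    if score <= 0:
        return "low"
    if score <= 3:
        return "moderate"
    return "high"


# Hypothetical patient: female, native papilla, no other items present.
patient = PreErcpFindings(False, True, False, True, False)
print(super_score(patient), risk_category(super_score(patient)))
```

With all four positive-weight items present and no calcification, the score reaches its maximum of 7, which matches the upper bound of the high-risk band.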
{"title":"A new preprocedural predictive risk model for post-endoscopic retrograde cholangiopancreatography pancreatitis: The SuPER model","authors":"M. Sugimoto, T. Takagi, T. Suzuki, H. Shimizu, G. Shibukawa, Y. Nakajima, Y. Takeda, Y. Noguchi, R. Kobayashi, H. Imamura, H. Asama, N. Konno, Y. Waragai, H. Akatsuka, R. Suzuki, T. Hikichi, H. Ohira","doi":"10.1101/2024.08.11.24311807","DOIUrl":"https://doi.org/10.1101/2024.08.11.24311807","journal":{"name":"medRxiv","volume":"16 11","pages":""},"publicationDate":"2024-08-11","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"141919246"}
Pub Date : 2024-08-11 DOI: 10.1101/2024.08.11.24311810
M. Omar, Benjamin S. Glicksberg, G. Nadkarni, E. Klang
Background and Aim: Large language models (LLMs) show promise in healthcare, but their self-assessment capabilities remain unclear. This study evaluates the confidence levels and performance of 12 LLMs across five medical specialties to assess their ability to accurately judge their responses. Methods: We used 1965 multiple-choice questions from internal medicine, obstetrics and gynecology, psychiatry, pediatrics, and general surgery. Models were prompted to provide answers and confidence scores. Performance and confidence were analyzed using chi-square tests and t-tests. Consistency across question versions was also evaluated. Results: All models displayed high confidence regardless of answer correctness. Higher-tier models showed slightly better calibration, with a mean confidence of 72.5% for correct answers versus 69.4% for incorrect ones, compared to lower-tier models (79.6% vs 79.5%). The mean confidence difference between correct and incorrect responses ranged from 0.6% to 5.4% across all models. Four models showed significantly higher confidence when correct (p<0.01), but the difference remained small. Most models demonstrated consistency across question versions. Conclusion: While newer LLMs show improved performance and consistency in medical knowledge tasks, their confidence levels remain poorly calibrated. The gap between performance and self-assessment poses risks in clinical applications. Until these models can reliably gauge their certainty, their use in healthcare should be limited and supervised by experts. Further research on human-AI collaboration and ensemble methods is needed for responsible implementation. Keywords: Large Language Models (LLMs), Safe AI, AI Reliability, Clinical knowledge.
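The calibration comparison described in the results (mean confidence for correct versus incorrect answers, tested with a t-test) can be illustrated as follows. This is a sketch under stated assumptions: the records are made-up values, not study data, the function names are hypothetical, and Welch's unequal-variance form of the t statistic is an assumption, since the abstract does not say which t-test variant was used.

```python
from math import sqrt
from statistics import mean, variance


def split_by_correctness(records):
    """records: iterable of (confidence_percent, is_correct) pairs."""
    correct = [c for c, ok in records if ok]
    incorrect = [c for c, ok in records if not ok]
    return correct, incorrect


def confidence_gap(records):
    """Mean confidence when correct minus mean confidence when incorrect."""
    correct, incorrect = split_by_correctness(records)
    return mean(correct) - mean(incorrect)


def welch_t(a, b):
    """Welch's t statistic for two samples (assumed variant, see lead-in)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Hypothetical model output: (self-reported confidence %, answer correct?)
records = [(80, True), (75, True), (70, True),
           (78, False), (72, False), (66, False)]

correct, incorrect = split_by_correctness(records)
print(f"gap = {confidence_gap(records):.1f} points, "
      f"t = {welch_t(correct, incorrect):.2f}")
```

A well-calibrated model would show a large gap between the two means; a gap of a few points, as in the ranges the abstract reports, means confidence carries little information about correctness.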
{"title":"Overconfident AI? Benchmarking LLM Self-Assessment in Clinical Scenarios","authors":"M. Omar, Benjamin S. Glicksberg, G. Nadkarni, E. Klang","doi":"10.1101/2024.08.11.24311810","DOIUrl":"https://doi.org/10.1101/2024.08.11.24311810","journal":{"name":"medRxiv","volume":"16 4","pages":""},"publicationDate":"2024-08-11","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"141919253"}
Pub Date : 2024-08-11 DOI: 10.1101/2024.08.10.24311783
A. J. Morton, A. Roddy Mitchell, R. E. Melville, L. Hui, S. Y. Tong, S. J. Dunstan, J. T. Denholm
Pregnancy may be associated with risk of developing tuberculosis (TB) in those infected with Mycobacterium tuberculosis (Mtb). The perinatal period could provide opportunities for targeted screening and treatment. This study aims to synthesise published literature on Mtb infection in pregnancy, relating to prevalence, natural history, test performance, cascade of care, and treatment. We searched Ovid MEDLINE, Embase+Embase Classic, Web of Science, and Cochrane Central Register of Controlled Trials (CENTRAL) on October 3, 2023, and 47 studies met the inclusion criteria. The prevalence of Mtb infection was up to 57.0% in some populations, with rates increasing with maternal age and in women from high TB-incidence settings. Five studies quantified perinatal progression from Mtb infection to active TB disease, with two demonstrating increased risk compared to non-pregnant populations (IRR 1.3-1.4 during pregnancy and IRR 1.9-2 postpartum). Concordance between Tuberculin Skin Test (TST) and Interferon Gamma-Release Assay (IGRA) ranged from 49.4% to 96.3%, with κ values of 0.19-0.56. High screening adherence was reported, with 62.0-100.0% completing antenatal TST and 81.0-100.0% having a chest radiograph. Four studies of TB preventative treatment (TPT) did not find a significant association with serious adverse events. The antenatal period could provide opportunities for contextualised Mtb infection screening and treatment. As older women and women from high TB-incidence settings demonstrate higher prevalence and risk of disease, this cohort should be prioritised. TPT appears safe and feasible; however, further studies are needed to optimise algorithms, ensuring pregnant and postpartum women can make evidence-informed decisions for effective TB prevention.
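Concordance and chance-corrected agreement between two tests can both be computed from a 2x2 table of paired results. The sketch below assumes the agreement statistic reported alongside concordance is Cohen's kappa; the counts are hypothetical and the function name is illustrative, not taken from any of the reviewed studies.

```python
def tst_igra_agreement(both_pos, tst_only, igra_only, both_neg):
    """Return (concordance, Cohen's kappa) for paired TST/IGRA results.

    both_pos:  TST+ and IGRA+      tst_only:  TST+ and IGRA-
    igra_only: TST- and IGRA+      both_neg:  TST- and IGRA-
    """
    n = both_pos + tst_only + igra_only + both_neg
    observed = (both_pos + both_neg) / n  # raw concordance
    p_tst_pos = (both_pos + tst_only) / n
    p_igra_pos = (both_pos + igra_only) / n
    # Agreement expected by chance if the two tests were independent.
    expected = p_tst_pos * p_igra_pos + (1 - p_tst_pos) * (1 - p_igra_pos)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical cohort of 100 pregnant women tested with both assays.
concordance, kappa = tst_igra_agreement(both_pos=20, tst_only=10,
                                        igra_only=10, both_neg=60)
print(f"concordance = {concordance:.1%}, kappa = {kappa:.2f}")
```

The example shows why the two figures can diverge: when most results are negative, raw concordance is high even if the tests agree only modestly once chance agreement is removed.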
{"title":"Mycobacterium tuberculosis infection in pregnancy: a systematic review","authors":"A. J. Morton, A. Roddy Mitchell, R. E. Melville, L. Hui, S. Y. Tong, S. J. Dunstan, J. T. Denholm","doi":"10.1101/2024.08.10.24311783","DOIUrl":"https://doi.org/10.1101/2024.08.10.24311783","journal":{"name":"medRxiv","volume":"4 5","pages":""},"publicationDate":"2024-08-11","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"141920037"}