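Estimation of Age- and Sex-Specific Glomerular Filtration Rate in the Abu Dhabi Population and Its Association with Mortality and Atherosclerotic Cardiovascular Outcome: A Retrospective Cohort Study

medRxiv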

Pub Date : 2024-08-11 DOI: 10.1101/2024.08.10.24311788

L. Alketbi, Yousef Boobes, N. Nagelkerke, H. Aleissaee, N. AlShamsi, M. Almansoori, Ahmed Hemaid, M. AlDobaee, Noura AlAlawi, R. AlKetbi, T. Fahmawee, B. AlHashaikeh, A. AlAzeezi, F. Shuaib, J. Alnuaimi, E. Mahmoud, N. AlAhbabi, Bachar Afandi

Background: The impact of abnormal estimated glomerular filtration rate (eGFR) on various adverse outcomes has been well studied; however, the United Arab Emirates (UAE), like many other regions of the world, remains understudied in this area. Methods: This retrospective cohort study estimates age- and sex-specific eGFR in the Abu Dhabi population and its association with mortality and atherosclerotic cardiovascular disease (ASCVD) outcomes. The cohort comprised 8,699 participants in a national cardiovascular disease screening program conducted from 2011 to 2013 and was re-evaluated in 2023 for mortality and cardiovascular outcomes. Reference eGFR percentiles were estimated with the LMS method from subjects without comorbidities. Results: The reference percentiles of normal eGFR values decreased markedly with age, with small sex differences in the percentile distribution. Subjects in the 97th percentile had a significantly higher incidence of ASCVD, which suggests a prognostic definition of renal hyperfiltration (RH); the difference in mortality rate was not statistically significant. Older age, female sex, history of ASCVD, history of hypertension, treatment for hypertension, lower diastolic blood pressure, higher systolic blood pressure, lower HDL, higher HbA1c, and higher vitamin D were significantly associated with lower eGFR percentiles. Subjects in the two categories within the RH range (the 95th and 97th percentiles) had a significantly higher prevalence of diabetes; they were older, more often smokers, and had higher BMI, higher HbA1c, higher HDL, and lower vitamin D; they were also more likely to be male, more physically active, and had a lower prevalence of coronary heart disease (CHD). Conclusion: The distribution of eGFR by age and sex is valuable for clinical decision-making in Abu Dhabi and likely for the Arab population in general. Although the 95th eGFR percentile in this cohort showed a higher but nonsignificant risk, the 97th percentile was significantly associated with ASCVD, even more strongly than the group below the 10th eGFR percentile. This study provides important insights into the prevalence and risk factors associated with different eGFR percentiles in the Abu Dhabi population. The findings underscore the need for targeted interventions to address modifiable risk factors and prevent the progression of renal damage in this high-risk population.
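For readers unfamiliar with the LMS method: it summarizes the reference distribution at a given age (here, separately by sex) with three parameters, L (Box-Cox power), M (median), and S (coefficient of variation), and converts between measured values, z-scores, and centiles with Cole's standard formulas. The sketch below assumes those standard formulas; the parameter values are hypothetical placeholders, not the study's fitted curves, and the function names are illustrative.

```python
import math
from statistics import NormalDist

def lms_centile(p, L, M, S):
    """eGFR value at centile p (0 < p < 1) for one age/sex stratum."""
    z = NormalDist().inv_cdf(p)
    if L == 0:  # limiting case of the Box-Cox transform
        return M * math.exp(S * z)
    return M * (1 + L * S * z) ** (1 / L)

def lms_zscore(x, L, M, S):
    """z-score of an observed eGFR value x under the same LMS parameters."""
    if L == 0:
        return math.log(x / M) / S
    return ((x / M) ** L - 1) / (L * S)

def lms_percentile(x, L, M, S):
    """Percentile (0-100) of an observed eGFR value."""
    return 100 * NormalDist().cdf(lms_zscore(x, L, M, S))

# Hypothetical parameters for a single age/sex stratum (not from the study).
L_, M_, S_ = 1.2, 95.0, 0.15
p97 = lms_centile(0.97, L_, M_, S_)
print(f"97th centile: {p97:.1f} mL/min/1.73 m^2")
print(f"percentile of eGFR 120: {lms_percentile(120.0, L_, M_, S_):.1f}")
```

In practice L, M, and S are smooth functions of age fitted for each sex, so the same calls would use the parameters for a subject's age; a value in the 97th-percentile category would correspond to the RH range suggested in the abstract.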

Citations: 0

Predictive Modeling of Novel Somatic Mutation Impacts on Cancer Prognosis: A Machine Learning Approach Using the COSMIC Database

medRxiv

Pub Date : 2024-08-11 DOI: 10.1101/2024.08.10.24311796

Masab A. Mansoor, DBA

Background: Somatic mutations play a crucial role in cancer initiation, progression, and treatment response. While high-throughput sequencing has vastly expanded our understanding of cancer genomics, interpreting the functional impact of novel somatic mutations remains challenging. Machine learning approaches show promise in predicting mutation impacts, but robust models for accurate prognosis across different cancer types are still needed. Objective: This study aimed to develop and validate a machine learning model using the Catalogue of Somatic Mutations in Cancer (COSMIC) database to predict the functional impact of novel somatic mutations on cancer prognosis across various cancer types. Methods: We extracted data on 6,573,214 coding point mutations across 1,391 cancer types from COSMIC v95. We engineered 47 features for each mutation, including sequence context, protein domain information, evolutionary conservation scores, and frequency data. We developed and compared Random Forest, XGBoost, and Deep Neural Network models, selecting XGBoost based on performance. The model was evaluated using standard metrics and externally validated using data from The Cancer Genome Atlas (TCGA). Results: The XGBoost model achieved an area under the Receiver Operating Characteristic curve (AUC-ROC) of 0.89 on the test set and 0.86 on the TCGA validation set. The model demonstrated consistent performance across major cancer types (AUC-ROC range: 0.85-0.92). Key predictive features included evolutionary conservation score, protein domain disruption, and mutation frequency. The model correctly identified 87% of known driver mutations and predicted 3,241 potentially high-impact novel mutations. Model predictions significantly correlated with patient survival in the TCGA dataset (HR = 1.8, 95% CI: 1.6-2.0, p < 0.001). Conclusions: Our machine learning model shows strong predictive power in assessing the functional impact of somatic mutations on cancer prognosis across various cancer types. This approach has potential applications in research prioritization and clinical decision support, contributing to the advancement of precision oncology. Keywords: cancer genomics; somatic mutations; machine learning; prognosis prediction; COSMIC database; precision oncology
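The AUC-ROC values reported above have a simple interpretation: the probability that a randomly chosen positive example (here, a mutation labeled high-impact) receives a higher model score than a randomly chosen negative one, with ties counting half. The sketch below computes that quantity directly from scores and binary labels; the scores and labels are hypothetical, and this is not the study's evaluation code.

```python
def auc_roc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical model scores and labels (1 = high-impact mutation, 0 = not).
scores = [0.92, 0.40, 0.61, 0.15, 0.77, 0.33]
labels = [1, 1, 0, 0, 1, 0]
print(f"AUC-ROC: {auc_roc(scores, labels):.3f}")
```

The pairwise loop is O(n·m) and only suits small examples; for millions of mutations, a rank-based (Mann-Whitney U) computation or an existing implementation such as scikit-learn's `roc_auc_score` would be the practical choice.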

Citations: 0

Multimodal magnetic resonance imaging characterizes clinical outcome in chronic traumatic brain injury

medRxiv

Pub Date : 2024-08-11 DOI: 10.1101/2024.08.10.24305340

M. Pelegrini-Issac, A. Hezghia, E. Caron, S. Delphine, V. Battisti, D. Cassereau, C. Debarle, M. Lefort, B. Lesimple, G. Torkomian, V. Degos, R. Bernard, D. Galanaud, P. Pradat-Diehl, V. Perlbarg, É. Bayen, L. Puybasset

Moderate-to-severe traumatic brain injury (TBI) should be considered a chronic health condition. The corpus callosum is the brain region most affected by diffuse axonal injury, which leads to long-term functional deficits. Few studies have considered the relationships between inter- and intrahemispheric functional connectivity and structural damage to the corpus callosum in patients with chronic TBI. We examined how callosal functional connectivity and white matter alterations relate to clinical outcome using multimodal magnetic resonance imaging (MRI): structural MRI estimates callosal volume, diffusion-weighted MRI quantifies white matter integrity, and resting-state functional MRI assesses neural dysfunction. Seventy-four patients underwent a multimodal MRI session on average 5 years after a moderate-to-severe TBI. Multiple factorial analysis was used to examine the relationships between clinical outcome (from severe disability to good recovery, assessed with the Glasgow Outcome Scale-Extended, GOSE), callosal volume, diffusion metrics (fractional anisotropy and mean, axial, and radial diffusivity), and inter- and intrahemispheric functional connectivity. This analysis confirmed that patients with severe disability (GOSE 3-4) had more structural alterations in the corpus callosum than patients with a good recovery (GOSE 7-8). Most importantly, patients able to live independently but unable to work or study in a standard environment (GOSE 5-6) could not be described by structural features alone. They exhibited lower interhemispheric connectivity between cortical regions mediated by the corpus callosum than patients with a good recovery, and a tendency towards decreased intrahemispheric connectivity compared with severely disabled patients. These findings suggest a complex long-term functional impact of moderate-to-severe TBI.
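The four diffusion metrics named above are conventionally derived from the three eigenvalues of the fitted diffusion tensor. A minimal sketch of those standard definitions follows; the abstract does not describe the authors' processing pipeline, and the eigenvalues used here are illustrative values in units of 10^-3 mm^2/s.

```python
import math

def diffusion_metrics(l1, l2, l3):
    """FA, MD, AD and RD from the three eigenvalues of a diffusion tensor."""
    l1, l2, l3 = sorted((l1, l2, l3), reverse=True)  # enforce l1 >= l2 >= l3
    md = (l1 + l2 + l3) / 3          # mean diffusivity
    ad = l1                          # axial diffusivity: along the principal axis
    rd = (l2 + l3) / 2               # radial diffusivity: perpendicular to it
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    fa = math.sqrt(1.5 * num / den) if den > 0 else 0.0  # fractional anisotropy, 0..1
    return {"FA": fa, "MD": md, "AD": ad, "RD": rd}

# Illustrative eigenvalues (10^-3 mm^2/s) for a coherent white matter voxel.
print(diffusion_metrics(1.7, 0.3, 0.3))
```

FA is 0 for isotropic diffusion and approaches 1 when diffusion is confined to a single axis; axonal injury is typically discussed in terms of lower FA together with changes in AD and RD.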

Citations: 0

Machine Learning to Predict-Then-Optimize Elective Orthopaedic Surgery Scheduling Improves Operating Room Utilization

medRxiv

Pub Date : 2024-08-11 DOI: 10.1101/2024.08.10.24311370

Johnathan R. Lex MBChB MASc, Jacob Mosseri BASc MASc, Jay Toor MD MBA FRCSC, Aazad Abbas HBSc, Michael Simone BASc, Bheeshma Ravi, Cari M. Whyne, Elias B. Khalil

Objective: To determine the potential for improving elective surgery scheduling for total knee and hip arthroplasty (TKA and THA, respectively) with a two-stage approach that combines machine learning (ML) prediction of the duration of surgery (DOS) with scheduling optimization. Materials and Methods: Two ML models (one for TKA, one for THA) were trained to predict DOS from patient factors, using 302,490 and 196,942 examples, respectively, from a large international database. Three optimization formulations reflecting different degrees of surgeon flexibility were compared: Any (surgeons could operate in any operating room at any time), Split (a limit of two surgeons per operating room per day), and MSSP (a limit of one surgeon per operating room per day). Two years of daily scheduling simulations were performed for each optimization problem, using either ML-predicted or mean DOS over a range of schedule parameters. Constraints and resources were based on a high-volume arthroplasty hospital in Canada. Results: The Any formulation performed significantly worse than the Split and MSSP formulations with respect to overtime and underutilization (p<0.001). The latter two formulations performed similarly (p>0.05) over most schedule parameters. Schedules based on ML predictions outperformed those based on mean DOS across all schedule parameters, reducing overtime by 300 to 500 minutes per week on average. A 15-minute schedule granularity with a waiting-list pool of at least 1 month generated the best schedules. Conclusion: Assuming a full waiting list, optimizing each surgeon's elective operating room time with an ML-assisted predict-then-optimize scheduling system improves overall operating room efficiency and significantly decreases overtime.
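The predict-then-optimize idea can be illustrated with a deliberately simple stand-in: stage 1 supplies a predicted duration per case, and stage 2 packs cases into operating room days on a 15-minute grid, after which overtime and idle time are measured against the actual durations. The sketch below uses a first-fit-decreasing heuristic, which is not one of the paper's Any, Split, or MSSP formulations; the 480-minute room day, case IDs, and durations are hypothetical, and it ignores surgeon assignment and the waiting-list horizon.

```python
import math

SLOT_MIN = 15       # schedule granularity (minutes), as in the best-performing setting
ROOM_DAY_MIN = 480  # hypothetical length of one operating room day (minutes)

def round_up(minutes, slot=SLOT_MIN):
    """Round a duration up to the next whole schedule slot."""
    return math.ceil(minutes / slot) * slot

def schedule_ffd(predicted, capacity=ROOM_DAY_MIN):
    """Pack cases into room-days by first-fit decreasing on rounded predicted durations.

    predicted: dict of case id -> predicted duration of surgery (minutes).
    Returns a list of room-days, each a list of case ids.
    """
    rooms, loads = [], []
    for case, dur in sorted(predicted.items(), key=lambda kv: kv[1], reverse=True):
        need = round_up(dur)
        for i, load in enumerate(loads):
            if load + need <= capacity:
                rooms[i].append(case)
                loads[i] += need
                break
        else:  # no existing room-day has space: open a new one
            rooms.append([case])
            loads.append(need)
    return rooms

def evaluate(rooms, actual, capacity=ROOM_DAY_MIN):
    """Total overtime and idle minutes once actual durations are known."""
    overtime = idle = 0
    for room in rooms:
        used = sum(actual[c] for c in room)
        overtime += max(0, used - capacity)
        idle += max(0, capacity - used)
    return overtime, idle

# Hypothetical stage-1 predictions and realized durations (minutes).
predicted = {"tka1": 100, "tka2": 95, "tha1": 80, "tha2": 170}
actual = {"tka1": 110, "tka2": 100, "tha1": 90, "tha2": 200}
rooms = schedule_ffd(predicted)
print(rooms, evaluate(rooms, actual))
```

Because the schedule is built from predictions but judged on actual durations, better stage-1 predictions translate directly into less overtime and idle time, which is the mechanism behind the ML-versus-mean-DOS comparison in the abstract.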

Citations: 0

Cardiotoxicity in Pediatric Cancer Survivorship: Patterns, Predictors, and Implications for Long-term Care

medRxiv

Pub Date : 2024-08-11 DOI: 10.1101/2024.08.10.24311795

Masab A. Mansoor, DBA

Background: Improved survival rates in pediatric cancer have shifted the focus to long-term effects of treatment, with cardiovascular complications emerging as a leading cause of morbidity and mortality. Understanding the patterns and predictors of cardiotoxicity is crucial for risk stratification, treatment optimization, and long-term care planning. Objective: This study aimed to investigate the prevalence, incidence, and risk factors of cardiotoxicity in pediatric cancer survivors using data from the Childhood Cancer Survivor Study (CCSS). Methods: We conducted a retrospective cohort study of 24,938 five-year survivors of childhood cancer diagnosed between 1970 and 1999. Cardiovascular complications, including cardiomyopathy, coronary artery disease, valvular heart disease, and arrhythmias, were assessed through self-reported questionnaires and medical record review. Cox proportional hazards models were used to evaluate risk factors, and a prediction model was developed using multivariable logistic regression. Results: The cumulative incidence of any cardiovascular complication by 30 years post-diagnosis was 18.7% (95% CI: 17.9%-19.5%). Significant risk factors included anthracycline exposure (HR 2.31, 95% CI: 2.09-2.55 for doses ≥250 mg/m²), chest radiation (HR 1.84, 95% CI: 1.66-2.05 for doses ≥20 Gy), older age at diagnosis, male sex, and obesity. A risk prediction model demonstrated good discrimination (C-statistic: 0.78, 95% CI: 0.76-0.80). Survivors had a significantly higher risk of cardiovascular complications than sibling controls (OR 3.7, 95% CI: 3.2-4.2). Conclusions: Childhood cancer survivors face a substantial and persistent risk of cardiovascular complications. The identified risk factors and prediction model can guide personalized follow-up strategies and interventions. These findings underscore the need for lifelong cardiovascular monitoring and care in this population.
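The survivor-versus-sibling comparison is expressed as an odds ratio with a 95% confidence interval. For an unadjusted 2×2 table, the usual calculation (Woolf's log method) is sketched below; the abstract does not report the underlying counts or state whether its estimate was adjusted, so the counts here are hypothetical and the result is not meant to reproduce the reported OR of 3.7.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald (Woolf) confidence interval for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell count; a continuity correction would be needed")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper

# Hypothetical counts: survivors (exposed) vs siblings (unexposed).
or_, lo, hi = odds_ratio_ci(a=200, b=800, c=100, d=900)
print(f"OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The interval is symmetric on the log scale, which is why reported OR intervals such as 3.2-4.2 are not centered on the point estimate.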

Citations: 0

Overconfident AI? Benchmarking LLM Self-Assessment in Clinical Scenarios

medRxiv

Pub Date : 2024-08-11 DOI: 10.1101/2024.08.11.24311810

M. Omar, Benjamin S. Glicksberg, G. Nadkarni, E. Klang

Background and Aim: Large language models (LLMs) show promise in healthcare, but their self-assessment capabilities remain unclear. This study evaluates the confidence levels and performance of 12 LLMs across five medical specialties to assess their ability to accurately judge their own responses. Methods: We used 1,965 multiple-choice questions from internal medicine, obstetrics and gynecology, psychiatry, pediatrics, and general surgery. Models were prompted to provide answers and confidence scores. Performance and confidence were analyzed using chi-square tests and t-tests. Consistency across question versions was also evaluated. Results: All models displayed high confidence regardless of answer correctness. Higher-tier models showed slightly better calibration (mean confidence 72.5% for correct vs 69.4% for incorrect answers) than lower-tier models (79.6% vs 79.5%). The mean confidence difference between correct and incorrect responses ranged from 0.6% to 5.4% across all models. Four models showed significantly higher confidence when correct (p<0.01), but the difference remained small. Most models demonstrated consistency across question versions. Conclusion: While newer LLMs show improved performance and consistency in medical knowledge tasks, their confidence levels remain poorly calibrated. The gap between performance and self-assessment poses risks in clinical applications. Until these models can reliably gauge their certainty, their use in healthcare should be limited and supervised by experts. Further research on human-AI collaboration and ensemble methods is needed for responsible implementation. Keywords: Large Language Models (LLMs), Safe AI, AI Reliability, Clinical knowledge.
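The key calibration signal in this study is the gap between mean confidence on correct and on incorrect answers. A minimal sketch of that comparison, with a Welch two-sample t statistic, is shown below; the abstract says t-tests were used but not which variant, so Welch's unequal-variance form is an assumption, and the confidence scores are hypothetical.

```python
import math
from statistics import mean, variance

def confidence_gap(confidences, correct):
    """Return (mean confidence when correct - mean when incorrect, Welch t statistic)."""
    right = [c for c, ok in zip(confidences, correct) if ok]
    wrong = [c for c, ok in zip(confidences, correct) if not ok]
    if len(right) < 2 or len(wrong) < 2:
        raise ValueError("need at least two correct and two incorrect answers")
    gap = mean(right) - mean(wrong)
    # Welch standard error: each group keeps its own sample variance
    se = math.sqrt(variance(right) / len(right) + variance(wrong) / len(wrong))
    if se == 0:
        raise ValueError("zero variance in both groups; t is undefined")
    return gap, gap / se

# Hypothetical self-reported confidences (%) and correctness for one model.
conf = [80, 90, 85, 70, 75, 80]
ok = [True, True, True, False, False, False]
gap, t = confidence_gap(conf, ok)
print(f"confidence gap: {gap:.1f} points, Welch t = {t:.2f}")
```

A well-calibrated model would show a large positive gap; the 0.6% to 5.4% gaps reported above indicate confidence that barely tracks correctness. A p-value would additionally need the t distribution, for example `scipy.stats.ttest_ind` with `equal_var=False`.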

Citations: 0

A new preprocedural predictive risk model for post-endoscopic retrograde cholangiopancreatography pancreatitis: The SuPER model

medRxiv

Pub Date : 2024-08-11 DOI: 10.1101/2024.08.11.24311807

M. Sugimoto, T. Takagi, T. Suzuki, H. Shimizu, G. Shibukawa, Y. Nakajima, Y. Takeda, Y. Noguchi, R. Kobayashi, H. Imamura, H. Asama, N. Konno, Y. Waragai, H. Akatsuka, R. Suzuki, T. Hikichi, H. Ohira

Background: Post-endoscopic retrograde cholangiopancreatography (ERCP) pancreatitis (PEP) is a severe and potentially fatal adverse event following ERCP. The ideal method for predicting PEP risk before ERCP has yet to be identified. We aimed to establish a simple PEP risk score model (SuPER model: Support for PEP Reduction) that can be applied before ERCP.

Methods: This multicenter study enrolled 2074 patients who underwent ERCP, who were randomly assigned in equal numbers (1037 each) to a development cohort and a validation cohort. A risk score model for predicting PEP was established in the development cohort by logistic regression analysis, and its performance was assessed in the validation cohort.

Results: In the development cohort, five PEP risk factors identifiable before ERCP were extracted and weighted according to their regression coefficients: -2 points for pancreatic calcification, 1 point for female sex, and 2 points each for intraductal papillary mucinous neoplasm, a native papilla of Vater, and the use of pancreatic duct procedures. The PEP rate was 0% among low-risk patients (≤0 points), 5.5% among moderate-risk patients (1 to 3 points), and 20.2% among high-risk patients (4 to 7 points). In the validation cohort, the C-statistic of the risk score model was 0.71 (95% CI 0.64-0.78), which was considered acceptable. The PEP risk classification (low, moderate, or high) was a significant predictor of PEP, independent of intraprocedural PEP risk factors (precut sphincterotomy and inadvertent pancreatic duct cannulation) (OR 4.2, 95% CI 2.8-6.3, P < 0.01).

Conclusions: The PEP risk score allows the risk of PEP to be estimated before ERCP, regardless of whether the patient has undergone pancreatic duct procedures. This simple five-item model may help predict and explain PEP risk before ERCP, and may help prevent PEP by guiding the selection of an appropriate expert endoscopist and of effective prophylactic measures.
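
The scoring rule above is simple enough to express directly. The sketch below encodes the reported point values and risk bands, plus a pairwise C-statistic of the kind reported for the validation cohort. It is a minimal illustration assuming binary (present/absent) inputs; the function and argument names are hypothetical, and it is not the authors' implementation or a validated clinical tool.

```python
def super_score(pancreatic_calcification: bool, female: bool, ipmn: bool,
                native_papilla: bool, pancreatic_duct_procedure: bool) -> int:
    """Sum the SuPER points as reported in the abstract (range -2 to 7)."""
    score = 0
    if pancreatic_calcification:
        score -= 2  # pancreatic calcification lowers the score
    if female:
        score += 1
    # 2 points each for IPMN, native papilla of Vater, pancreatic duct procedures
    score += 2 * sum((ipmn, native_papilla, pancreatic_duct_procedure))
    return score


def risk_category(score: int) -> str:
    """Map a score to the reported bands: <=0 low, 1-3 moderate, 4-7 high."""
    if score <= 0:
        return "low"
    if score <= 3:
        return "moderate"
    return "high"


def c_statistic(scores, outcomes) -> float:
    """Probability that a random PEP case scores higher than a random non-case.

    Ties count as half-concordant; outcomes are truthy for PEP.
    """
    pairs = list(zip(scores, outcomes))
    cases = [s for s, y in pairs if y]
    non_cases = [s for s, y in pairs if not y]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


# Hypothetical patient: female, native papilla, pancreatic duct procedure.
s = super_score(False, True, False, True, True)
print(s, risk_category(s))
```

Because the weights are small integers, the score can be tallied by hand; the C-statistic function simply counts concordant case/non-case pairs, which for a binary outcome is equivalent to the area under the ROC curve.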

Citations: 0

Mycobacterium tuberculosis infection in pregnancy: a systematic review

medRxiv

Pub Date : 2024-08-11 DOI: 10.1101/2024.08.10.24311783

A. J. Morton, A. Roddy Mitchell, R. E. Melville, L. Hui, S. Y. Tong, S. J. Dunstan, J. T. Denholm

Pregnancy may be associated with an increased risk of developing tuberculosis (TB) in people infected with Mycobacterium tuberculosis (Mtb). The perinatal period could therefore provide opportunities for targeted screening and treatment. This study aims to synthesise published literature on Mtb infection in pregnancy, covering prevalence, natural history, test performance, cascade of care, and treatment. We searched Ovid MEDLINE, Embase+Embase Classic, Web of Science, and the Cochrane Central Register of Controlled Trials (CENTRAL) on October 3, 2023; 47 studies met the inclusion criteria. The prevalence of Mtb infection reached 57.0% in some populations and was higher with increasing maternal age and among women from high TB-incidence settings. Five studies quantified perinatal progression from Mtb infection to active TB disease, two of which demonstrated an increased risk compared with non-pregnant populations (IRR 1.3-1.4 during pregnancy and 1.9-2 postpartum). Concordance between the Tuberculin Skin Test (TST) and the Interferon-Gamma Release Assay (IGRA) ranged from 49.4% to 96.3%, with kappa (κ) values of 0.19-0.56. Screening adherence was high: 62.0-100.0% completed antenatal TST and 81.0-100.0% underwent chest radiography. Four studies of TB preventative treatment (TPT) found no significant association with serious adverse events. The antenatal period could provide opportunities for contextualised Mtb infection screening and treatment. Because older women and women from high TB-incidence settings have a higher prevalence of infection and a higher risk of disease, these groups should be prioritised. TPT appears safe and feasible; however, further studies are needed to optimise algorithms so that pregnant and postpartum women can make evidence-informed decisions about effective TB prevention.
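
To show how the concordance and kappa figures above relate, the sketch below computes raw percent agreement and Cohen's kappa from a 2x2 table of paired TST and IGRA results, together with a simple incidence rate ratio. It assumes the reported k-values are Cohen's kappa; the function names and the counts in the example are hypothetical and are not taken from any included study.

```python
def agreement_and_kappa(both_pos: int, tst_only: int, igra_only: int, both_neg: int):
    """Return (observed agreement, Cohen's kappa) for paired TST/IGRA results."""
    n = both_pos + tst_only + igra_only + both_neg
    if n == 0:
        raise ValueError("empty table")
    observed = (both_pos + both_neg) / n  # raw concordance
    p_tst = (both_pos + tst_only) / n  # proportion TST-positive
    p_igra = (both_pos + igra_only) / n  # proportion IGRA-positive
    # Agreement expected by chance if the two tests were independent.
    expected = p_tst * p_igra + (1 - p_tst) * (1 - p_igra)
    if expected == 1:
        raise ValueError("kappa undefined when chance agreement is 1")
    return observed, (observed - expected) / (1 - expected)


def incidence_rate_ratio(cases_a: int, person_time_a: float,
                         cases_b: int, person_time_b: float) -> float:
    """Rate in group a (e.g. pregnant) divided by rate in group b (reference)."""
    if person_time_a <= 0 or person_time_b <= 0 or cases_b == 0:
        raise ValueError("person-time must be positive and reference cases non-zero")
    return (cases_a / person_time_a) / (cases_b / person_time_b)


# Hypothetical 2x2 table of 100 women tested with both TST and IGRA.
print(agreement_and_kappa(both_pos=20, tst_only=10, igra_only=5, both_neg=65))
```

With these made-up counts the two tests agree for 85% of women, yet kappa is only 0.625, because 60% agreement would be expected by chance alone when most results are negative. This is why a study can report high percent concordance alongside a modest kappa.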

Citations: 0
