
medRxiv - Health Informatics: Latest Articles

Predicting mortality in critically ill patients with hypertension using machine learning and deep learning models
Pub Date : 2024-08-22 DOI: 10.1101/2024.08.21.24312399
Ziyang Zhang, Jiancheng Ye
Background: Accurate prediction of mortality in critically ill patients with hypertension admitted to the Intensive Care Unit (ICU) is essential for guiding clinical decision-making and improving patient outcomes. Traditional prognostic tools often fall short in capturing the complex interactions between clinical variables in this high-risk population. Recent advances in machine learning (ML) and deep learning (DL) offer the potential for developing more sophisticated and accurate predictive models.
Objective: This study aims to evaluate the performance of various ML and DL models in predicting mortality among critically ill patients with hypertension, with a particular focus on identifying key clinical predictors and assessing the comparative effectiveness of these models.
Methods: We conducted a retrospective analysis of 30,096 critically ill patients with hypertension admitted to the ICU. Various ML models, including logistic regression, decision trees, and support vector machines, were compared with advanced DL models, including 1D convolutional neural networks (CNNs) and long short-term memory (LSTM) networks. Model performance was evaluated using area under the receiver operating characteristic curve (AUC) and other performance metrics. SHapley Additive exPlanations (SHAP) values were used to interpret model outputs and identify key predictors of mortality.
Results: The 1D CNN model with an initial selection of predictors achieved the highest AUC (0.7744), outperforming both traditional ML models and other DL models. Key clinical predictors of mortality identified across models included the APS-III score, age, and length of ICU stay. The SHAP analysis revealed that these predictors had a substantial influence on model predictions, underscoring their importance in assessing mortality risk in this patient population.
Conclusion: Deep learning models, particularly the 1D CNN, demonstrated superior predictive accuracy compared to traditional ML models in predicting mortality among critically ill patients with hypertension. The integration of these models into clinical workflows could enhance the early identification of high-risk patients, enabling more targeted interventions and improving patient outcomes. Future research should focus on the prospective validation of these models and the ethical considerations associated with their implementation in clinical practice.
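The abstract reports model quality as AUC. As a reading aid, here is a minimal sketch of what that number measures: the probability that a randomly chosen patient who died receives a higher risk score than a randomly chosen survivor. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not the authors' code or data.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison: P(score of a positive > score of a negative).

    Ties count as half a win. `labels` holds 1 (died) or 0 (survived);
    `scores` holds the model's predicted risk for the same patients.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration only).
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
print(round(roc_auc(toy_labels, toy_scores), 4))
```

On this scale 0.5 is chance-level ranking and 1.0 is perfect ranking, which is the context for reading the reported 0.7744. The pairwise loop is quadratic, so a rank-based routine from a statistics library would be the practical choice on 30,096 patients.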
Citations: 0
Future Pandemics: AI-Designed Assays for Detecting Mpox, General and Clade 1b Specific
Pub Date : 2024-08-22 DOI: 10.1101/2024.08.22.24312441
Lucero Mendoza-Maldonado, John MacSharry, Johan Garssen, Aletta D. Kraneveld, Alberto Tonda, Alejandro Lopez-Rincon
The global outbreak of human monkeypox (mpox) in 2022, declared a Public Health Emergency of International Concern by the WHO, underscored the urgent need for effective diagnostic tools. In August 2024, the WHO again declared mpox a Public Health Emergency of International Concern. This study presents an approach that uses artificial intelligence (AI) to design primers for the rapid and accurate detection of mpox. Leveraging evolutionary algorithms, we developed primer sets with high specificity and sensitivity, validated in silico for the main mpox lineage and for Clade 1b. These primers are crucial for distinguishing mpox from other viruses, enabling precise diagnosis and timely public health responses. Our findings highlight the potential of AI-driven methodologies to enhance surveillance, vaccination strategies, and outbreak management, particularly for emerging zoonotic diseases. The emergence of new mpox clades with higher mortality rates, such as Clade 1b, further emphasizes the necessity of continuous monitoring and preparedness for future pandemics. This study advocates for the integration of AI in molecular diagnostics to improve public health outcomes.
Citations: 0
Multidimensional assessment of adverse events of finasteride: a real-world pharmacovigilance analysis based on FDA Adverse Event Reporting System (FAERS) from 2004 to April 2024
Pub Date : 2024-08-22 DOI: 10.1101/2024.08.21.24312383
Xiaoling Zhong, Yihan Yang, Sheng Wei, Yuchen Liu
Background: Finasteride is commonly used in clinical practice to treat androgenetic alopecia, but real-world data on its long-term safety remain incomplete and need ongoing supplementation. This study evaluates the adverse events (AEs) associated with finasteride use, based on data from the US Food and Drug Administration Adverse Event Reporting System (FAERS), to contribute to its safety assessment.
Methods: We reviewed adverse event reports associated with finasteride from the FAERS database, covering the period from the first quarter of 2004 to the first quarter of 2024. We applied four disproportionality analyses, namely the reporting odds ratio (ROR), the proportional reporting ratio (PRR), the Bayesian Confidence Propagation Neural Network (BCPNN), and the Multi-item Gamma Poisson Shrinker (MGPS), to evaluate whether there is a significant association between finasteride use and AEs. To investigate potential safety issues related to drug use, we further analyzed similarities and differences in onset time and AEs by gender, and in AEs by age.
Results: Among the 11,557 adverse event reports in which finasteride was the primary suspected drug, most affected patients were male (86.04%), and a substantial proportion were young adults aged 18-45 years (27.22%). We categorized 73 AEs into 7 system organ classes (SOCs), which included common AEs such as erectile dysfunction and sexual dysfunction. Notably, Peyronie's disease and post-5α-reductase inhibitor syndrome were AEs not listed in the package insert. We identified 102 AEs in men and 7 in women. Depression and anxiety were notable AEs in both men and women. Additionally, we examined 17 AEs in patients under 18 years old, 157 in patients aged 18 to 65 years, and 133 in patients aged 65 years and older. Each age group exhibited unique AEs, although erectile dysfunction, decreased libido, depression, suicidal ideation, psychotic disorder, and attention disturbance were common across age brackets. The median onset time across all cases was 61 days. Onset occurred mainly within one month of starting finasteride; notably, the second-largest group of cases involved adverse drug reactions that persisted beyond one year of treatment.
Conclusion: Our study uncovered both known and novel AEs associated with finasteride. Some matched those listed in the package insert, while others were signals not described there. In addition, some AEs varied by gender and age. Consequently, our findings offer valuable insights for future research on the safety of finasteride and are anticipated to enhance its safe use in clinical practice.
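Two of the four disproportionality measures named in the Methods, the reporting odds ratio and the proportional reporting ratio, reduce to simple arithmetic on a 2x2 table of report counts. The sketch below shows that arithmetic under stated assumptions: the cell names `a`, `b`, `c`, `d`, the helper names, and the counts are invented for illustration, and the 95% interval uses the usual log-normal approximation. It is not the authors' pipeline, and signal thresholds vary between studies.

```python
import math


def reporting_odds_ratio(a, b, c, d):
    """ROR with an approximate 95% confidence interval.

    a: reports with the drug and the event of interest
    b: reports with the drug and any other event
    c: reports with other drugs and the event of interest
    d: reports with other drugs and any other event
    """
    estimate = (a * d) / (b * c)
    # Standard error of ln(ROR) from the four cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se)
    upper = math.exp(math.log(estimate) + 1.96 * se)
    return estimate, lower, upper


def proportional_reporting_ratio(a, b, c, d):
    """PRR: share of the drug's reports mentioning the event,
    divided by the same share among all other drugs."""
    return (a / (a + b)) / (c / (c + d))


# Invented counts, chosen only to make the arithmetic easy to follow.
ror, ror_low, ror_high = reporting_odds_ratio(20, 80, 10, 890)
prr = proportional_reporting_ratio(20, 80, 10, 890)
print(f"ROR = {ror:.2f} (95% CI {ror_low:.2f} to {ror_high:.2f}), PRR = {prr:.2f}")
```

A common convention treats a lower ROR bound above 1 as a signal of disproportionate reporting; BCPNN and MGPS add Bayesian shrinkage so that small counts do not produce extreme estimates.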
Citations: 0
Evaluating Text-to-Image Generated Photorealistic Images of Human Anatomy
Pub Date : 2024-08-21 DOI: 10.1101/2024.08.21.24312353
Paula Muhr, Yating Pan, Charlotte Tumescheit, Ann-Kathrin Kuebler, Hatice Kuebra Parmaksiz, Cheng Chen, Pablo Sebastian Bolanos Orozco, Soeren S. Lienkamp, Janna Hastings
Background: Generative AI models that can produce photorealistic images from text descriptions have many applications in medicine, including medical education and synthetic data. However, it can be challenging to evaluate and compare their range of heterogeneous outputs, and thus there is a need for a systematic approach enabling image and model comparisons.
Methods: We develop an error classification system for annotating errors in AI-generated photorealistic images of humans and apply our method to a corpus of 240 images generated with three different models (DALL-E 3, Stable Diffusion XL and Stable Cascade) using 10 prompts with 8 images per prompt. The error classification system identifies five different error types with three different severities across five anatomical regions and specifies an associated quantitative scoring method based on aggregated proportions of errors per expected count of anatomical components for the generated image. We assess inter-rater agreement by double-annotating 25% of the images and calculating Krippendorff's alpha, and we compare results across the three models and ten prompts quantitatively using a cumulative score per image.
Findings: The error classification system, accompanying training manual, generated image collection, annotations, and all associated scripts are available from our GitHub repository at https://github.com/hastingslab-org/ai-human-images. Inter-rater agreement was relatively poor, reflecting the subjectivity of the error classification task. Model comparisons revealed that DALL-E 3 performed consistently better than Stable Diffusion; however, the latter generated images reflecting more diversity in personal attributes. Images with groups of people were more challenging for all the models than individuals or pairs; some prompts were challenging for all models.
Interpretation: Our method enables systematic comparison of AI-generated photorealistic images of humans; our results can serve to catalyse improvements in these models for medical applications.
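Inter-rater agreement in this study is summarized with Krippendorff's alpha. For readers unfamiliar with the statistic, the following is a minimal sketch for the simplest case only: two raters, nominal labels, and no missing annotations. The function name and the toy annotations are assumptions made for illustration; the authors' actual scripts are in the repository linked in the abstract and may handle more raters, missing data, or other distance metrics.

```python
from collections import Counter


def krippendorff_alpha_nominal(rater_a, rater_b):
    """Krippendorff's alpha for two raters, nominal labels, complete data.

    alpha = 1 - D_o / D_e, where D_o is the observed disagreement and
    D_e is the disagreement expected if labels were paired at random.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n_values = 2 * len(rater_a)  # every item contributes two labels
    counts = Counter(rater_a) + Counter(rater_b)
    mismatches = sum(1 for x, y in zip(rater_a, rater_b) if x != y)
    # Each mismatching item adds two off-diagonal entries to the coincidence matrix.
    observed = 2 * mismatches / n_values
    expected = sum(
        counts[c] * counts[k] for c in counts for k in counts if c != k
    ) / (n_values * (n_values - 1))
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is ever used")
    return 1 - observed / expected


# Toy annotations: 1 = "error present", 0 = "no error" (invented data).
annotator_1 = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
annotator_2 = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
print(round(krippendorff_alpha_nominal(annotator_1, annotator_2), 3))
```

An alpha of 1 means perfect agreement and 0 means agreement no better than chance, so values near 0, as in this toy example, correspond to the kind of poor agreement the Findings describe.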
Citations: 0
Automatic diagnostic support for diagnosis of pulmonary fibrosis
Pub Date : 2024-08-20 DOI: 10.1101/2024.08.14.24312012
Ravi Pal, Anna Barney, Giacomo Sgalla, Simon L. F. Walsh, Nicola Sverzellati, Sophie Fletcher, Stefania Cerri, Maxime Cannesson, Luca Richeldi
Patients with pulmonary fibrosis (PF) often experience long waits before getting a correct diagnosis, and this delay in reaching specialized care is associated with increased mortality, regardless of the severity of the disease. Early diagnosis and timely treatment of PF can potentially extend life expectancy and maintain a better quality of life. Crackles present in recorded lung sounds may be crucial for the early diagnosis of PF. This paper describes an automated system for differentiating lung sounds related to PF from those of other pathological lung conditions using the average number of crackles per breath cycle (NOC/BC). The system is divided into four main parts: (1) preprocessing, (2) separation of crackles from normal breath sounds, (3) crackle verification and counting, and (4) estimating NOC/BC. The system was tested on a dataset of 48 subjects (24 fibrotic and 24 non-fibrotic), and the results were compared with an assessment by two expert respiratory physicians. The set of HRCT images, reviewed by two expert radiologists for the presence or absence of pulmonary fibrosis, was used as the ground truth for evaluating the PF versus non-PF classification performance of the system. The overall performance of the automatic classifier, based on a receiver operating characteristic (ROC) curve-derived cut-off value of 18.65 for average NOC/BC (AUC=0.845, 95% CI 0.739-0.952, p<0.001; sensitivity=91.7%; specificity=59.3%), compares favorably with the averaged performance of the physicians (sensitivity=83.3%; specificity=56.25%). Although radiological assessment should remain the gold standard for diagnosis of fibrotic interstitial lung disease, the automatic classification system has strong potential for diagnostic support, especially in assisting general practitioners in the auscultatory assessment of lung sounds to prompt further diagnostic work-up of patients with suspected interstitial lung disease.
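The final stage of the pipeline described above is a threshold rule on the average number of crackles per breath cycle. The sketch below illustrates only that last step and how sensitivity and specificity follow from it. The cut-off of 18.65 comes from the abstract; the function names, the choice to flag values at or above the cut-off, and the toy recordings are assumptions for illustration and not the authors' implementation.

```python
CUTOFF_NOC_PER_BC = 18.65  # ROC-derived cut-off reported in the abstract


def average_noc_per_bc(crackles_per_cycle):
    """Mean crackle count over the breath cycles of one recording."""
    return sum(crackles_per_cycle) / len(crackles_per_cycle)


def flag_fibrotic(noc_per_bc, cutoff=CUTOFF_NOC_PER_BC):
    """Assumption: recordings at or above the cut-off are flagged as PF."""
    return noc_per_bc >= cutoff


def sensitivity_specificity(noc_values, is_fibrotic, cutoff=CUTOFF_NOC_PER_BC):
    """Compare threshold decisions against ground-truth labels."""
    tp = fn = tn = fp = 0
    for value, truth in zip(noc_values, is_fibrotic):
        predicted = flag_fibrotic(value, cutoff)
        if truth and predicted:
            tp += 1
        elif truth:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Invented example: per-subject average NOC/BC and HRCT-based ground truth.
toy_noc = [25.0, 19.0, 10.2, 30.1, 5.0, 20.3]
toy_truth = [True, True, True, True, False, False]
sens, spec = sensitivity_specificity(toy_noc, toy_truth)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Lowering the cut-off trades specificity for sensitivity, which is the trade-off an ROC-derived threshold is meant to balance.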
Citations: 0
A Scalable Framework for Benchmarking Embedding Models for Semantic Medical Tasks
Pub Date : 2024-08-20 DOI: 10.1101/2024.08.14.24312010
Shelly Soffer, Benjamin S Glicksberg, Patricia Kovatch, Orly Efros, Robert Freeman, Alexander Charney, Girish Nadkarni, Eyal Klang
Text embeddings convert textual information into numerical representations, enabling machines to perform semantic tasks like information retrieval. Despite their potential, the application of text embeddings in healthcare is underexplored, in part due to a lack of benchmarking studies using biomedical data. This study provides a flexible framework for benchmarking embedding models to identify those most effective for healthcare-related semantic tasks. We selected thirty embedding models of various parameter sizes and architectures from the Massive Text Embedding Benchmark (MTEB) Hugging Face resource. Models were tested with real-world semantic retrieval medical tasks on (1) PubMed abstracts, (2) synthetic Electronic Health Records (EHRs) generated by the Llama-3-70b model, (3) real-world patient data from the Mount Sinai Health System, and (4) the MIMIC IV database. Tasks were split into Short Tasks, involving brief text pair interactions such as triage notes and chief complaints, and Long Tasks, which required processing extended documentation such as progress notes and history & physical notes. We assessed models by correlating their performance with data integrity levels, ranging from 0% (fully mismatched pairs) to 100% (perfectly matched pairs), using Spearman correlation. Additionally, we examined correlations between the average Spearman scores across tasks and two MTEB leaderboard benchmarks: the overall recorded average and the average Semantic Textual Similarity (STS) score. We evaluated 30 embedding models across seven clinical tasks (each involving 2,000 text pairs) and five levels of data integrity, totaling 2.1 million comparisons. Some models performed consistently well, while models based on Mistral-7b excelled in long-context tasks. NV-Embed-v1, despite being the top performer in short tasks, did not perform as well in long tasks. Our average task performance score (ATPS) correlated better with the MTEB STS score (0.73) than with the MTEB average score (0.67). The suggested framework is flexible, scalable, and resistant to the risk of models overfitting on published benchmarks. Adopting this method can improve embedding technologies in healthcare.
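The evaluation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the toy vectors stand in for real model embeddings, cosine similarity is assumed as the pair score, the five integrity levels are assumed to be evenly spaced, and all function names are hypothetical.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def mean_similarity(left, right, integrity, rng):
    """Mean pair score when only a fraction `integrity` of pairs is correctly matched."""
    n = len(left)
    n_matched = round(n * integrity)
    total = 0.0
    for i in range(n):
        # Matched pairs keep their partner; the rest get a different, random partner.
        j = i if i < n_matched else (i + rng.randrange(1, n)) % n
        total += cosine(left[i], right[j])
    return total / n

rng = random.Random(0)
# Toy "embeddings" (assumption): each right-hand text is a noisy copy of its left-hand partner.
left = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(200)]
right = [[x + rng.gauss(0, 0.1) for x in vec] for vec in left]

levels = [0.0, 0.25, 0.5, 0.75, 1.0]  # assumed spacing of the five integrity levels
scores = [mean_similarity(left, right, p, rng) for p in levels]
rho = spearman(levels, scores)  # close to 1 when scores track data integrity

# Comparison count reported in the abstract: models x tasks x pairs x levels.
total_comparisons = 30 * 7 * 2000 * 5
```

Under this reading, an embedding model that separates matched from mismatched pairs well yields a Spearman coefficient near 1, while a model whose scores ignore pairing quality drifts toward 0.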
Citations: 0
A novel fusion method of 3D MRI and test results through deep learning for the early detection of Alzheimer’s disease
Pub Date : 2024-08-20 DOI: 10.1101/2024.08.15.24312032
Arman Atalar, Nihat Adar, Savaş Okyay
Alzheimer’s disease (AD) is a prevalent form of dementia that impacts brain cells. Although its likelihood increases with age, there is no transitional period between its stages. To enhance diagnostic precision, physicians rely on clinical judgments derived from interpreting health data, considering demographics, clinical history, and laboratory results to detect AD at an early stage. While patient cognitive tests and demographic information are primarily presented as text, brain scan images are presented in graphic formats. Researchers typically use a different classifier for each data format and then merge the classifier outcomes to maximize classification accuracy and use all patient-related data for the final decision. However, this approach leads to low performance, diminishing predictive abilities and model effectiveness. We propose an innovative approach that combines diverse textual health records (HR) with three-dimensional structural magnetic resonance imaging (3D sMRI) to achieve a similar objective in computer-aided diagnosis, utilizing a novel deep learning technique. Health records, encompassing demographic features like age, gender, apolipoprotein gene, and mini-mental state examination score, are fused with 3D sMRI, enabling a graphic-based deep learning strategy for early AD detection. The fusion of data is accomplished by representing textual information as graphic pipes and integrating them into 3D sMRI, a method referred to as the “pipe-laying” method. Experimental results from over 4000 sMRI scans of 780 patients in the AD Neuroimaging Initiative (ADNI) dataset demonstrate that the pipe-laying method enhances recognition accuracy rates for Early and Late Mild Cognitive Impairment (MCI) patients, accurately classifying all AD patients. In a 4-class AD diagnosis scenario, accuracy improved from 86.87% when only 3D images were used to 90.00% when 3D sMRI and patient health records were included. Thus, the positive impact of combining 3D sMRI with HR on 4-class AD diagnosis was established.
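The abstract does not spell out how the pipes are drawn, so the following is only one possible reading of the pipe-laying idea: each health-record value is scaled to [0, 1] and written as a constant-intensity line of voxels running through the volume. The pipe positions, the value ranges, the field names, and the helpers `lay_pipes` and `normalise` are all assumptions made for illustration, not the authors' implementation.

```python
def make_volume(depth, height, width, fill=0.0):
    """Create a depth x height x width volume as nested lists."""
    return [[[fill] * width for _ in range(height)] for _ in range(depth)]

def normalise(value, low, high):
    """Scale a value into [0, 1], clipping anything outside [low, high]."""
    return min(1.0, max(0.0, (value - low) / (high - low)))

def lay_pipes(volume, features, ranges):
    """Return a copy of `volume` with one pipe per feature.

    Pipe k runs along the depth axis at row k, column 0 (an assumed layout).
    """
    fused = [[row[:] for row in plane] for plane in volume]
    for k, (name, value) in enumerate(features.items()):
        low, high = ranges[name]
        intensity = normalise(value, low, high)
        for plane in fused:
            plane[k][0] = intensity
    return fused

# Hypothetical patient record and value ranges (not taken from ADNI).
record = {"age": 72, "sex": 1, "apoe4_alleles": 1, "mmse": 24}
ranges = {"age": (50, 100), "sex": (0, 1), "apoe4_alleles": (0, 2), "mmse": (0, 30)}

scan = make_volume(8, 8, 8)  # stand-in for a preprocessed 3D sMRI volume
fused = lay_pipes(scan, record, ranges)
```

The point of such an encoding is that a single 3D image model can consume the fused volume directly, rather than merging the outputs of separate image and text classifiers.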
Citations: 0
ChatGPT as a bioinformatic partner.
Pub Date : 2024-08-20 DOI: 10.1101/2024.08.20.24312291
Gianluca Mondillo, Alessandra Perrotta, Simone Colosimo, Vittoria Frattolillo
The advanced Large Language Model ChatGPT4o, developed by OpenAI, can be used in the field of bioinformatics to analyze and understand cross-reactive allergic reactions. This study explores the use of ChatGPT4o to support research on allergens, particularly in the cross-reactivity syndrome between cat and pork. Using a hypothetical clinical case of a child with a confirmed allergy to Fel d 2 (cat albumin) and Sus s 1 (pork albumin), the model guided data collection, protein sequence analysis, and three-dimensional structure visualization. Using bioinformatics tools such as SDAP 2.0 and BepiPRED, the epitope regions of the allergenic proteins were predicted, confirming their accessibility to immunoglobulin E (IgE) and probability of cross-reactivity. The results show that regions with high epitope probability exhibit high surface accessibility and predominantly coil and helical structures. The construction of a phylogenetic tree further supported the evolutionary relationships among the studied allergens. ChatGPT4o has demonstrated its usefulness in guiding non-specialist researchers through complex bioinformatics processes, making advanced science accessible and improving analytical and innovation capabilities.
Citations: 0
Using Meta-Transformers for Multimodal Clinical Decision Support and Evidence-Based Medicine
Pub Date : 2024-08-20 DOI: 10.1101/2024.08.14.24312001
Sabah Mohammed, Jinan Fiaidhi, Abel Serracin Martinez
Advances in computer vision and natural language processing are key to thriving modern healthcare systems and their applications. Nonetheless, the two fields have been researched and used as separate technical entities, without integrating the predictive knowledge discovery that becomes possible when they are combined. Such integration will benefit every clinical/medical problem, as these problems are inherently multimodal: they involve several distinct forms of data, such as images and text. Recent advances in machine learning have brought these fields closer through the notion of meta-transformers. At the core of this synergy is building models that can process and relate information from multiple modalities, where the raw input data from the various modalities are mapped into a shared token space, allowing an encoder to extract high-level semantic features of the input data. Nevertheless, the task of automatically identifying arguments in a clinical/medical text and finding their multimodal relationships remains challenging, as it relies not only on relevancy measures (e.g. how close that text is to other modalities such as an image) but also on the evidence supporting that relevancy. Relevancy based on evidence is normal practice in medicine, as every practice is evidence-based. In this article, we experiment with a variety of fine-tuned medical meta-transformers, such as PubmedCLIP, CLIPMD, BiomedCLIP-PubMedBERT and BioCLIP, to see which one provides evidence-based relevant multimodal information and can thereby benefit evidence-based predictions. Our experimentation uses the TTi-Eval open-source platform to accommodate multimodal data embeddings. This platform simplifies the integration and evaluation of different meta-transformer models as well as a variety of datasets for testing and fine-tuning. Additionally, we are conducting experiments to test how relevant any multimodal prediction is to the published medical literature, especially literature indexed in PubMed. Our experiments revealed that the BiomedCLIP-PubMedBERT model provides more reliable evidence-based relevance than the other models, based on randomized samples from the ROCO V2 dataset or other multimodal datasets such as MedCat. In the next stage of this research, we are extending the use of the winning evidence-based multimodal learning model by adding components that enable medical practitioners to use it to predict answers to clinical questions based on a sound medical questioning protocol such as PICO and on standardized medical terminologies such as UMLS.
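To make the relevancy measure concrete, here is a minimal sketch of how a CLIP-style model scores text against an image once both have been mapped into a shared embedding space. The vectors and captions below are made-up stand-ins for encoder outputs, `rank_captions` is a hypothetical helper, and the temperature value is an assumption; no real model and no TTi-Eval API is called.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def rank_captions(image_vec, caption_vecs, temperature=0.07):
    """Rank captions by similarity to the image; returns (caption, probability) pairs."""
    names = list(caption_vecs)
    sims = [cosine(image_vec, caption_vecs[name]) for name in names]
    # Dividing by a small temperature sharpens the distribution (assumed value).
    probs = softmax([s / temperature for s in sims])
    return sorted(zip(names, probs), key=lambda item: item[1], reverse=True)

# Made-up 3-dimensional embeddings; real encoders emit hundreds of dimensions.
image_embedding = [0.9, 0.1, 0.2]
caption_embeddings = {
    "chest X-ray showing pneumonia": [0.8, 0.2, 0.1],
    "MRI of the knee": [0.1, 0.9, 0.3],
    "abdominal ultrasound": [0.2, 0.3, 0.9],
}
ranking = rank_captions(image_embedding, caption_embeddings)
best_caption, best_prob = ranking[0]
```

A score of this kind captures only closeness in the shared space; the evidence-based relevance the abstract argues for would need a further check of the top-ranked text against the published literature.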
Citations: 0
Replicating a COVID-19 study in a national England database to assess the generalisability of research with regional electronic health record data
Pub Date : 2024-08-08 DOI: 10.1101/2024.08.06.24311538
Richard Williams, David Jenkins, Thomas Bolton, Adrian Heald, Mehrdad A Mizani, Matthew Sperrin, Niels Peek, CVD-COVID-UK/COVID-IMPACT Consortium
Introduction: The replication of observational studies using electronic health record data is critical for the evidence base of epidemiology. We have previously performed a study using linked primary and secondary care data in a large, urbanised region (Greater Manchester Care Record, Greater Manchester, UK) to compare the hospitalization rates of patients with diabetes (type 1 or type 2) after contracting COVID-19 with those of matched controls. Methods: In this study we repeated the analysis using a national database covering the whole of England, UK (NHS England's Secure Data Environment service for England, accessed via the BHF Data Science Centre's CVD-COVID-UK/COVID-IMPACT Consortium). Results: We found that many of the effect sizes did not show a statistically significant difference. Where effect sizes were statistically significant in the regional study, they remained significant in the national study, and the effect size was in the same direction and of similar magnitude. Conclusion: There is some evidence that the findings from studies in smaller regional datasets can be extrapolated to a larger, national setting. However, there were some significant differences and therefore replication studies remain an essential part of healthcare research.
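One common way to check whether a regional and a national estimate differ is a z-test on the difference of their log effect sizes. The abstract does not say which test the authors used, so this sketch is an assumption, and the hazard ratios in it are invented purely for illustration. It also treats the two estimates as independent, which is a simplification when the regional population is nested inside the national one.

```python
import math

def log_se_from_ci(lower, upper, z_crit=1.96):
    """Standard error of a log ratio, recovered from its 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)

def compare_ratios(ratio_a, ci_a, ratio_b, ci_b):
    """Two-sided z-test for a difference between two ratio estimates.

    Returns (z, p_value, same_direction), where same_direction says whether
    both ratios fall on the same side of 1.
    """
    se_a = log_se_from_ci(*ci_a)
    se_b = log_se_from_ci(*ci_b)
    z = (math.log(ratio_a) - math.log(ratio_b)) / math.sqrt(se_a ** 2 + se_b ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    same_direction = (ratio_a > 1) == (ratio_b > 1)
    return z, p_value, same_direction

# Invented numbers: a regional and a national hazard ratio with 95% CIs.
z, p_value, same_direction = compare_ratios(1.50, (1.20, 1.88), 1.40, (1.30, 1.51))
```

With these invented inputs the p-value is well above 0.05, which corresponds to the case the abstract describes as effect sizes that do not differ significantly and point in the same direction.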
Citations: 0