Pub Date: 2024-08-14 | DOI: 10.1016/j.artmed.2024.102950
Machine learning applications in preventive healthcare: A systematic literature review on predictive analytics of disease comorbidity from multiple perspectives
Duo Xu, Zeshui Xu

Artificial intelligence is steadily transforming biomedical research and healthcare management. Disease comorbidity is a major threat to quality of life for susceptible groups, especially middle-aged and elderly patients. The presence of multiple chronic diseases makes precision diagnosis difficult to realize and imposes a heavy burden on healthcare systems and economies. Given the enormous amount of accumulated health data, machine learning techniques are well suited to this problem. This study reviews current research on applying these methods to understanding comorbidity mechanisms and to making clinical predictions that account for these complex patterns. A descriptive metadata analysis of 791 unique publications captures the overall research progression between January 2012 and June 2023, and 61 of these papers with a specific comorbidity focus are systematically assessed in depth. Four predictive analytics tasks are identified: disease comorbidity data extraction, clustering, network analysis, and risk prediction. Some machine learning-driven applications address inherent data deficiencies in healthcare datasets and provide model interpretations that identify significant risk factors for comorbidity development. Drawing on both technical and practical insights from the literature, this study aims to guide future interest in comorbidity research and draws conclusions about chronic disease prevention and diagnosis with managerial implications.
Pub Date: 2024-08-14 | DOI: 10.1016/j.artmed.2024.102948
Enhancing metagenomic classification with compression-based features
Jorge Miguel Silva, João Rafael Almeida

Metagenomics is a rapidly expanding field that uses next-generation sequencing technology to analyze the genetic makeup of environmental samples. However, accurately identifying the organisms in a metagenomic sample can be complex, and traditional reference-based methods can fall short in some instances.
In this study, we present a novel approach to metagenomic identification that uses data compressors as features for taxonomic classification. By evaluating a comprehensive set of compressors, both general-purpose and genomic-specific, we demonstrate the effectiveness of this method in accurately identifying organisms in metagenomic samples. The results indicate that combining features from multiple compressors aids taxonomic identification: an overall accuracy of 95% was achieved on an imbalanced dataset containing classes with few samples. The study also found that the correlation between compression and classification is insignificant, highlighting the need for a multi-faceted approach to metagenomic identification.
This approach offers a significant advancement in metagenomics, providing a reference-less method for taxonomic identification that is both effective and efficient while revealing insights into the statistical and algorithmic nature of genomic data. The code to validate this study is publicly available at https://github.com/ieeta-pt/xgTaxonomy.
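The core idea above, turning compressor output into classification features, can be sketched with standard-library compressors. This is not the authors' xgTaxonomy pipeline: the feature definition, the Normalized Compression Distance (NCD) nearest-class rule, and the toy sequences below are all illustrative assumptions.

```python
import bz2
import gzip
import lzma

def compression_features(seq: str) -> list[float]:
    """Compressed-size ratios from several general-purpose compressors.

    Each ratio approximates the sequence's information density under a
    different model, so together they form a small feature vector."""
    data = seq.encode()
    sizes = [len(gzip.compress(data)), len(bz2.compress(data)), len(lzma.compress(data))]
    return [s / len(data) for s in sizes]

def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance, a reference-free similarity measure:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx = len(gzip.compress(x.encode()))
    cy = len(gzip.compress(y.encode()))
    cxy = len(gzip.compress((x + y).encode()))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Toy "taxa" with different repeat structure (hypothetical data).
taxon_a = "ACGT" * 200
taxon_b = "AATTCCGGAACCTTGG" * 50
query = "ACGT" * 180 + "AATT"

# Classify the query by nearest NCD to each candidate taxon.
label = min(("A", ncd(query, taxon_a)), ("B", ncd(query, taxon_b)),
            key=lambda t: t[1])[0]
```

In practice such compression features (or distances) would feed a standard classifier rather than a one-nearest-neighbor rule, but the reference-free character of the approach is the same.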
Pub Date: 2024-08-13 | DOI: 10.1016/j.artmed.2024.102945
Walking representation and simulation based on multi-source image fusion and multi-agent reinforcement learning for gait rehabilitation
Yean Zhu, Meirong Xiao, Dan Robbins, Xiaoying Wu, Wei Lu, Wensheng Hou

In formulating strategies for walking rehabilitation, precisely identifying the current state and rationally predicting the future state are crucial but often unrealized. To tackle this challenge, our study introduces a unified framework that integrates a novel 3D walking motion capture method using multi-source image fusion with a walking rehabilitation simulation approach based on multi-agent reinforcement learning. We found that (i) the proposed method achieves accurate 3D walking motion capture and outperforms other advanced methods. Compared to similar visual skeleton tracking methods, it yields higher Pearson correlation (r = 0.93), a higher intra-class correlation coefficient (ICC(2,1) = 0.91), and narrower confidence intervals ([0.90, 0.95] for r; [0.88, 0.94] for ICC(2,1)) against standard results. Its outcomes also exhibit commendable correlation and concurrence with those of the IMU-based skeleton tracking method in the assessment of gait parameters ([0.85, 0.89] for r; [0.75, 0.81] for ICC(2,1)). (ii) Multi-agent reinforcement learning has the potential to solve the simulation task of gait rehabilitation. In the mimicry experiment, our proposed simulation method not only enables the intelligent agent to converge from the initial state to the target state, but also reproduces, through motor state resolution, evolutionary patterns similar to those observed in clinical practice. This study offers valuable contributions to walking rehabilitation, enabling precise assessment and simulation-based interventions, with potential implications for clinical practice and patient outcomes.
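The agreement statistics reported above, Pearson r and ICC(2,1), can be computed from paired measurements as follows. This sketch uses synthetic data (the gait values and noise levels are invented) and the Shrout & Fleiss two-way random-effects, absolute-agreement, single-rater form of the ICC.

```python
import numpy as np

def icc_2_1(Y: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    Y has shape (n_subjects, k_raters)."""
    n, k = Y.shape
    grand = Y.mean()
    row_means = Y.mean(axis=1)
    col_means = Y.mean(axis=0)
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)   # between-subject mean square
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)   # between-rater mean square
    sse = np.sum((Y - grand) ** 2) - (n - 1) * msr - (k - 1) * msc
    mse = sse / ((n - 1) * (k - 1))                        # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Synthetic example: two measurement systems observing the same gait parameter.
rng = np.random.default_rng(0)
truth = rng.normal(1.2, 0.2, size=30)            # e.g. step length in meters
system_a = truth + rng.normal(0, 0.02, size=30)  # visual tracking estimate
system_b = truth + rng.normal(0, 0.02, size=30)  # IMU-based estimate

r = np.corrcoef(system_a, system_b)[0, 1]
icc = icc_2_1(np.column_stack([system_a, system_b]))
```

Unlike Pearson r, ICC(2,1) penalizes systematic offsets between the two systems, which is why both are usually reported together in agreement studies.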
Pub Date: 2024-08-09 | DOI: 10.1016/j.artmed.2024.102937
Bayesian network analysis of risk classification strategies in the regulation of cellular products
Guoshu Jia, Lixia Fu, Likun Wang, Dongning Yao, Yimin Cui

Cell therapy, a burgeoning therapeutic strategy, requires a scientific regulatory framework but faces challenges in risk-based regulation because there is no global consensus on risk classification. This study applies Bayesian network analysis to compare and evaluate the risk classification strategies for cellular products proposed by the US Food and Drug Administration (FDA), Japan's Ministry of Health, Labour and Welfare (MHLW), and the World Health Organization (WHO), using real-world data to validate the models. The appropriateness of key risk factors is assessed within the three regulatory frameworks, along with their implications for clinical safety, and the results indicate several directions for refining risk classification approaches. Additionally, a substudy focuses on a specific type of cell and gene therapy (CGT), chimeric antigen receptor (CAR) T cell therapy, and underscores the importance of considering CAR targets, tumor types, and costimulatory domains when assessing the safety risks of CAR T cell products. Overall, cellular products currently lack both a regulatory framework grounded in real-world data and risk-based classification review methods. This study aims to improve the regulatory system for cellular products, emphasizing risk-based classification. Furthermore, it advocates leveraging machine learning in regulatory science to enhance the assessment of cellular product safety, illustrating the role of Bayesian networks in supporting regulatory decision-making on the risk classification of cellular products.
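To make the Bayesian-network idea concrete, here is a minimal two-factor example of probabilistic risk reasoning by enumeration. The factor names echo common criteria in cell-therapy regulation, but the network structure and every probability below are invented for illustration; they are not from the study.

```python
from itertools import product

# Illustrative (invented) priors and conditional probability table.
p_manip = {True: 0.4, False: 0.6}    # P(more-than-minimal manipulation)
p_nonhom = {True: 0.3, False: 0.7}   # P(non-homologous use)
p_high_risk = {                      # P(high risk | manipulation, non-homologous use)
    (True, True): 0.95, (True, False): 0.70,
    (False, True): 0.60, (False, False): 0.10,
}

def posterior_manip_given_high_risk() -> float:
    """P(manipulation | high risk), by enumerating the joint distribution.

    This is exact inference over a tiny network; real tools scale the same
    computation to many interacting risk factors."""
    num = den = 0.0
    for m, h in product([True, False], repeat=2):
        joint = p_manip[m] * p_nonhom[h] * p_high_risk[(m, h)]
        den += joint
        if m:
            num += joint
    return num / den

posterior = posterior_manip_given_high_risk()
```

The same enumeration run in the other direction (factors given observed risk class) is what lets a Bayesian network test how much each factor actually drives a framework's classification decisions.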
Pub Date: 2024-08-01 | DOI: 10.1016/j.artmed.2024.102930
Probing perfection: The relentless art of meddling for pulmonary airway segmentation from HRCT via a human-AI collaboration based active learning method
Shiyi Wang, Yang Nan, Sheng Zhang, Federico Felder, Xiaodan Xing, Yingying Fang, Javier Del Ser, Simon L.F. Walsh, Guang Yang

In pulmonary tracheal segmentation, the scarcity of annotated data is a prevalent pain point, as in most medical segmentation endeavors. Concurrently, most Deep Learning (DL) methodologies in this domain grapple with two further challenges: the inherent opacity of 'black box' models and the ongoing pursuit of performance enhancement. In response to these intertwined challenges, the core concept of our Human-Computer Interaction (HCI) based learning models (RS_UNet, LC_UNet, UUNet and WD_UNet) hinges on the versatile combination of diverse query strategies with an array of deep learning models. We train four HCI models on the initial training dataset and sequentially repeat the following steps: (1) Query strategy: in each iteration, the HCI model selects the unlabeled samples that would contribute the most additional representative information if labeled (displaying the names and sequence numbers of the samples to be annotated), choosing those with the greatest predictive disparity as measured by Wasserstein Distance, Least Confidence, Entropy Sampling, or Random Sampling. (2) Central line correction: domain experts correct the system-generated tracheal central lines of the selected samples in each training round. (3) Training dataset update: because domain experts are involved in every training iteration, the training dataset is refined after each epoch, enhancing the trustworthiness of the 'black box' DL model and improving model performance. (4) Model training: the HCI model is retrained on the updated dataset using an enhanced version of the existing UNet.
Experimental results validate the effectiveness of these Human-Computer Interaction-based approaches, demonstrating that the proposed WD-UNet, LC-UNet, UUNet and RS-UNet achieve performance comparable or even superior to state-of-the-art DL models; WD-UNet, for example, does so with only 15%–35% of the training data, yielding substantial reductions (65%–85%) in physician annotation effort.
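Two of the query strategies named in step (1), Least Confidence and Entropy Sampling, are standard acquisition functions over model softmax outputs. The sketch below shows their generic form on toy per-sample class probabilities; it is a simplification of (not code from) the authors' UNet-based pipeline, where the same scores are aggregated over voxels.

```python
import numpy as np

def least_confidence(probs: np.ndarray) -> np.ndarray:
    """Uncertainty = 1 - max class probability, per sample."""
    return 1.0 - probs.max(axis=1)

def entropy_sampling(probs: np.ndarray) -> np.ndarray:
    """Uncertainty = predictive entropy, per sample."""
    return -(probs * np.log(np.clip(probs, 1e-12, None))).sum(axis=1)

def select_for_annotation(probs: np.ndarray, k: int, strategy) -> np.ndarray:
    """Indices of the k most uncertain unlabeled samples to send to the expert."""
    return np.argsort(strategy(probs))[::-1][:k]

# Toy softmax outputs for 4 unlabeled samples (each row sums to 1).
probs = np.array([
    [0.98, 0.01, 0.01],   # confident: low priority for annotation
    [0.34, 0.33, 0.33],   # near-uniform: maximally uncertain
    [0.70, 0.20, 0.10],
    [0.55, 0.40, 0.05],
])
picked = select_for_annotation(probs, k=2, strategy=entropy_sampling)
```

Both strategies rank the near-uniform sample first, which is the point of active learning: expert time goes to the cases the model understands least.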
Pub Date: 2024-07-31 | DOI: 10.1016/j.artmed.2024.102938
MedExpQA: Multilingual benchmarking of Large Language Models for Medical Question Answering
Iñigo Alonso, Maite Oronoz, Rodrigo Agerri

Large Language Models (LLMs) have the potential to facilitate the development of Artificial Intelligence technology that assists medical experts with interactive decision support. This potential is illustrated by the state-of-the-art performance LLMs have obtained in Medical Question Answering, with striking results such as passing marks in medical licensing exams. However, while impressive, the quality bar required for medical applications remains far from being reached. LLMs are still challenged by outdated knowledge and by their tendency to generate hallucinated content. Furthermore, most benchmarks for assessing medical knowledge lack reference gold explanations, which makes it impossible to evaluate the reasoning behind LLM predictions. Finally, the situation is particularly grim for languages other than English, whose benchmarking remains, as far as we know, a totally neglected topic. To address these shortcomings, this paper presents MedExpQA, the first multilingual benchmark based on medical exams for evaluating LLMs in Medical Question Answering. To the best of our knowledge, MedExpQA is the first such benchmark to include reference gold explanations, written by medical doctors, of the correct and incorrect exam options. Comprehensive multilingual experimentation using both the gold reference explanations and Retrieval Augmented Generation (RAG) approaches shows that LLM performance, with best results of around 75% accuracy for English, still has large room for improvement, especially for languages other than English, where accuracy drops by 10 points. Despite using state-of-the-art RAG methods, our results therefore also demonstrate the difficulty of obtaining and integrating readily available medical knowledge that positively impacts downstream Medical Question Answering evaluations. Data, code, and fine-tuned models will be made publicly available.
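The RAG setup referenced above follows a retrieve-then-prompt pattern. The sketch below shows its skeleton with a deliberately crude word-overlap retriever and hypothetical medical snippets; real systems (including, presumably, the paper's) use trained dense or sparse retrievers over large corpora instead.

```python
def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Rank passages by word overlap with the query (a toy stand-in for
    the dense/sparse retrievers used in real RAG systems)."""
    q = set(query.lower().split())
    scored = sorted(corpus, key=lambda p: len(q & set(p.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(question: str, options: list[str], corpus: list[str]) -> str:
    """Prepend retrieved evidence to a multiple-choice question before it
    is handed to the language model."""
    evidence = "\n".join(retrieve(question, corpus))
    opts = "\n".join(f"{chr(65 + i)}. {o}" for i, o in enumerate(options))
    return f"Context:\n{evidence}\n\nQuestion: {question}\n{opts}\nAnswer:"

# Hypothetical knowledge snippets, not from MedExpQA.
corpus = [
    "Metformin is a first-line oral agent for type 2 diabetes.",
    "Warfarin requires INR monitoring.",
]
prompt = build_prompt(
    "Which drug is first-line for type 2 diabetes?",
    ["Warfarin", "Metformin"],
    corpus,
)
```

The benchmark's finding is precisely that this augmentation step only helps when the retrieved passages actually carry the missing knowledge, which is harder to guarantee than the pattern suggests.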
Pub Date: 2024-07-25 | DOI: 10.1016/j.artmed.2024.102934
Deep learning algorithms for melanoma detection using dermoscopic images: A systematic review and meta-analysis
Zichen Ye, Daqian Zhang, Yuankai Zhao, Mingyang Chen, Huike Wang, Samuel Seery, Yimin Qu, Peng Xue, Yu Jiang

Background
Melanoma is a serious risk to human health, and early identification is vital for treatment success. Deep learning (DL) has the potential to detect cancer using imaging technologies, and many studies provide evidence that DL algorithms can achieve high accuracy in melanoma diagnostics.
Objectives
To critically assess the performance of different DL models in diagnosing melanoma from dermatoscopic images, and to discuss the relationship between dermatologists and DL.
Methods
Ovid-Medline, Embase, IEEE Xplore, and the Cochrane Library were systematically searched from inception until 7th December 2021. Studies reporting the performance of diagnostic DL models for detecting melanoma in dermatoscopic images were included if they had specific outcomes and histopathologic confirmation. Binary diagnostic accuracy data and contingency tables were extracted to analyze the outcomes of interest: sensitivity (SEN), specificity (SPE), and area under the curve (AUC). Subgroup analyses were performed according to human-machine comparison and cooperation. The study was registered in PROSPERO (CRD42022367824).
Results
2309 records were initially retrieved, of which 37 studies met the inclusion criteria and 27 provided sufficient data for meta-analytical synthesis. The pooled SEN was 82% (range 77–86) and the pooled SPE was 87% (range 84–90), with an AUC of 0.92 (range 0.89–0.94). In the human-machine comparison, pooled AUCs were 0.87 (0.84–0.90) for DL and 0.83 (0.79–0.86) for dermatologists overall; broken down by experience, they were 0.90 (0.87–0.93) for DL, 0.80 (0.76–0.83) for junior dermatologists, and 0.88 (0.85–0.91) for senior dermatologists. In the human-machine cooperation analysis, AUCs were 0.88 (0.85–0.91) for DL, 0.76 (0.72–0.79) for unassisted dermatologists, and 0.87 (0.84–0.90) for DL-assisted dermatologists.
Conclusions
Evidence suggests that DL algorithms are as accurate as senior dermatologists in melanoma diagnostics, so DL could be used to support dermatologists in diagnostic decision-making. However, further high-quality, large-scale multicenter studies are required to address the specific challenges associated with medical AI-based diagnostics.
Pub Date : 2024-07-25DOI: 10.1016/j.artmed.2024.102935
Laith Alzubaidi , Khamael AL-Dulaimi , Asma Salhi , Zaenab Alammar , Mohammed A. Fadhel , A.S. Albahri , A.H. Alamoodi , O.S. Albahri , Amjad F. Hasan , Jinshuai Bai , Luke Gilliland , Jing Peng , Marco Branni , Tristan Shuker , Kenneth Cutbush , Jose Santamaría , Catarina Moreira , Chun Ouyang , Ye Duan , Mohamed Manoufali , Yuantong Gu
Deep learning (DL) in orthopaedics has gained significant attention in recent years. Previous studies have shown that DL can be applied to a wide variety of orthopaedic tasks, including fracture detection, bone tumour diagnosis, implant recognition, and evaluation of osteoarthritis severity. The utilisation of DL is expected to increase, owing to its ability to present accurate diagnoses more efficiently than traditional methods in many scenarios. This reduces the time and cost of diagnosis for patients and orthopaedic surgeons. To our knowledge, no single study has comprehensively reviewed all aspects of DL currently used in orthopaedic practice. This review addresses this knowledge gap using articles from Science Direct, Scopus, IEEE Xplore, and Web of Science published between 2017 and 2023. We begin with the motivation for using DL in orthopaedics, including its ability to enhance diagnosis and treatment planning. The review then covers various applications of DL in orthopaedics, including fracture detection, detection of supraspinatus tears using MRI, osteoarthritis assessment, prediction of arthroplasty implant types, bone age assessment, and detection of joint-specific soft tissue disease. We also examine the challenges of implementing DL in orthopaedics, including the scarcity of data for training DL models and the lack of interpretability, as well as possible solutions to these common pitfalls. Our work highlights the requirements for achieving trustworthiness in the outcomes generated by DL, including the need for accuracy, explainability, and fairness in DL models. We pay particular attention to fusion techniques as one way to increase trustworthiness; such techniques have also been used to address the multimodal data common in orthopaedics. Finally, we review the approval requirements set forth by the US Food and Drug Administration to enable the use of DL applications.
As such, we intend this review to serve as a guide for researchers developing reliable DL applications for orthopaedic tasks, from initial design through to market use.
{"title":"Comprehensive review of deep learning in orthopaedics: Applications, challenges, trustworthiness, and fusion","authors":"Laith Alzubaidi , Khamael AL-Dulaimi , Asma Salhi , Zaenab Alammar , Mohammed A. Fadhel , A.S. Albahri , A.H. Alamoodi , O.S. Albahri , Amjad F. Hasan , Jinshuai Bai , Luke Gilliland , Jing Peng , Marco Branni , Tristan Shuker , Kenneth Cutbush , Jose Santamaría , Catarina Moreira , Chun Ouyang , Ye Duan , Mohamed Manoufali , Yuantong Gu","doi":"10.1016/j.artmed.2024.102935","DOIUrl":"10.1016/j.artmed.2024.102935","url":null,"abstract":"<div><p>Deep learning (DL) in orthopaedics has gained significant attention in recent years. Previous studies have shown that DL can be applied to a wide variety of orthopaedic tasks, including fracture detection, bone tumour diagnosis, implant recognition, and evaluation of osteoarthritis severity. The utilisation of DL is expected to increase, owing to its ability to present accurate diagnoses more efficiently than traditional methods in many scenarios. This reduces the time and cost of diagnosis for patients and orthopaedic surgeons. To our knowledge, no exclusive study has comprehensively reviewed all aspects of DL currently used in orthopaedic practice. This review addresses this knowledge gap using articles from Science Direct, Scopus, IEEE Xplore, and Web of Science between 2017 and 2023. The authors begin with the motivation for using DL in orthopaedics, including its ability to enhance diagnosis and treatment planning. The review then covers various applications of DL in orthopaedics, including fracture detection, detection of supraspinatus tears using MRI, osteoarthritis, prediction of types of arthroplasty implants, bone age assessment, and detection of joint-specific soft tissue disease. We also examine the challenges for implementing DL in orthopaedics, including the scarcity of data to train DL and the lack of interpretability, as well as possible solutions to these common pitfalls. 
Our work highlights the requirements to achieve trustworthiness in the outcomes generated by DL, including the need for accuracy, explainability, and fairness in the DL models. We pay particular attention to fusion techniques as one of the ways to increase trustworthiness, which have also been used to address the common multimodality in orthopaedics. Finally, we have reviewed the approval requirements set forth by the US Food and Drug Administration to enable the use of DL applications. As such, we aim to have this review function as a guide for researchers to develop a reliable DL application for orthopaedic tasks from scratch for use in the market.</p></div>","PeriodicalId":55458,"journal":{"name":"Artificial Intelligence in Medicine","volume":"155 ","pages":"Article 102935"},"PeriodicalIF":6.1,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0933365724001775/pdfft?md5=fe495472b9c9c72428f1196589e2e990&pid=1-s2.0-S0933365724001775-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141839273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-25DOI: 10.1016/j.artmed.2024.102936
Erfan Darzi , Yiqing Shen , Yangming Ou , Nanna M. Sijtsema , P.M.A van Ooijen
Federated learning enables training models on distributed, privacy-sensitive medical imaging data. However, data heterogeneity across participating institutions leads to reduced model performance and fairness issues, especially for underrepresented datasets. To address these challenges, we propose leveraging the multi-head attention mechanism in Vision Transformers to align the representations of heterogeneous data across clients. By focusing on the attention mechanism as the alignment objective, our approach aims to improve both the accuracy and fairness of federated learning models in medical imaging applications. We evaluate our method on the IQ-OTH/NCCD Lung Cancer dataset, simulating various levels of data heterogeneity using Latent Dirichlet Allocation (LDA). Our results demonstrate that our approach achieves competitive performance compared to state-of-the-art federated learning methods across different heterogeneity levels and improves the performance of models for underrepresented clients, promoting fairness in the federated learning setting. These findings highlight the potential of leveraging the multi-head attention mechanism to address the challenges of data heterogeneity in medical federated learning.
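The heterogeneity simulation the abstract mentions — allocating class proportions across clients with Dirichlet-distributed weights — is a common protocol in federated learning experiments. A minimal sketch follows; the function name, parameters, and label data are illustrative, not from the paper, and smaller `alpha` yields more non-IID splits:

```python
import random

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Split sample indices across clients using per-class Dirichlet
    proportions. Smaller alpha concentrates each class on fewer clients,
    simulating stronger data heterogeneity."""
    rng = random.Random(seed)
    clients = [[] for _ in range(n_clients)]
    for c in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == c]
        rng.shuffle(idx)
        # Dirichlet(alpha) draw via normalized Gamma samples
        weights = [rng.gammavariate(alpha, 1.0) for _ in range(n_clients)]
        total = sum(weights)
        props = [w / total for w in weights]
        # slice this class's indices at cumulative cut points
        start, cum = 0, 0.0
        for k in range(n_clients):
            cum += props[k]
            end = round(cum * len(idx)) if k < n_clients - 1 else len(idx)
            clients[k].extend(idx[start:end])
            start = end
    return clients

# toy binary labels standing in for per-image class labels
labels = [0] * 50 + [1] * 50
parts = dirichlet_partition(labels, n_clients=4, alpha=0.5)
```

Each client then trains locally on its slice, so evaluating a federated method across several `alpha` values probes robustness to the degree of heterogeneity.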
{"title":"Tackling heterogeneity in medical federated learning via aligning vision transformers","authors":"Erfan Darzi , Yiqing Shen , Yangming Ou , Nanna M. Sijtsema , P.M.A van Ooijen","doi":"10.1016/j.artmed.2024.102936","DOIUrl":"10.1016/j.artmed.2024.102936","url":null,"abstract":"<div><p>Federated learning enables training models on distributed, privacy-sensitive medical imaging data. However, data heterogeneity across participating institutions leads to reduced model performance and fairness issues, especially for underrepresented datasets. To address these challenges, we propose leveraging the multi-head attention mechanism in Vision Transformers to align the representations of heterogeneous data across clients. By focusing on the attention mechanism as the alignment objective, our approach aims to improve both the accuracy and fairness of federated learning models in medical imaging applications. We evaluate our method on the IQ-OTH/NCCD Lung Cancer dataset, simulating various levels of data heterogeneity using Latent Dirichlet Allocation (LDA). Our results demonstrate that our approach achieves competitive performance compared to state-of-the-art federated learning methods across different heterogeneity levels and improves the performance of models for underrepresented clients, promoting fairness in the federated learning setting. 
These findings highlight the potential of leveraging the multi-head attention mechanism to address the challenges of data heterogeneity in medical federated learning.</p></div>","PeriodicalId":55458,"journal":{"name":"Artificial Intelligence in Medicine","volume":"155 ","pages":"Article 102936"},"PeriodicalIF":6.1,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141846044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-22DOI: 10.1016/j.artmed.2024.102933
Giuseppe Desolda , Giovanni Dimauro , Andrea Esposito , Rosa Lanzilotti , Maristella Matera , Massimo Zancanaro
This article explores Human-Centered Artificial Intelligence (HCAI) in medical cytology, with a focus on enhancing the interaction with AI. It presents a Human–AI interaction paradigm that emphasizes explainability and user control of AI systems. The paradigm is an iterative negotiation process based on three interaction strategies that aim to (i) elaborate the system outcomes through iterative steps (Iterative Exploration), (ii) explain the AI system’s behavior or decisions (Clarification), and (iii) allow non-expert users to trigger simple retraining of the AI model (Reconfiguration). This interaction paradigm is exploited in the redesign of an existing AI-based tool for microscopic analysis of the nasal mucosa, and the resulting tool is tested with rhinocytologists. The article discusses the results of the conducted evaluation and outlines lessons learned that are relevant for AI in medicine.
{"title":"A Human–AI interaction paradigm and its application to rhinocytology","authors":"Giuseppe Desolda , Giovanni Dimauro , Andrea Esposito , Rosa Lanzilotti , Maristella Matera , Massimo Zancanaro","doi":"10.1016/j.artmed.2024.102933","DOIUrl":"10.1016/j.artmed.2024.102933","url":null,"abstract":"<div><p>This article explores Human-Centered Artificial Intelligence (HCAI) in medical cytology, with a focus on enhancing the interaction with AI. It presents a Human–AI interaction paradigm that emphasizes explainability and user control of AI systems. It is an iterative negotiation process based on three interaction strategies aimed to (i) elaborate the system outcomes through iterative steps (<em>Iterative Exploration</em>), (ii) explain the AI system’s behavior or decisions (<em>Clarification</em>), and (iii) allow non-expert users to trigger simple retraining of the AI model (<em>Reconfiguration</em>). This interaction paradigm is exploited in the redesign of an existing AI-based tool for microscopic analysis of the nasal mucosa. The resulting tool is tested with rhinocytologists. 
The article discusses the analysis of the results of the conducted evaluation and outlines lessons learned that are relevant for AI in medicine.</p></div>","PeriodicalId":55458,"journal":{"name":"Artificial Intelligence in Medicine","volume":"155 ","pages":"Article 102933"},"PeriodicalIF":6.1,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0933365724001751/pdfft?md5=b223655b36875d5261335f6e3519aed6&pid=1-s2.0-S0933365724001751-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141840791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}