ChatGPT and Vaccine Hesitancy: A Comparison of English, Spanish, and French Responses Using a Validated Scale.
Saubhagya Joshi, Eunbin Ha, Yonaira Rivera, Vivek K Singh
ChatGPT is a popular information system (over 1 billion visits in August 2023) that can generate natural language responses to user queries. It is important to study the quality and equity of its responses on health-related topics, such as vaccination, as they may influence public health decision-making. We use the Vaccine Hesitancy Scale (VHS) proposed by Shapiro et al. [1] to measure the hesitancy of ChatGPT responses in English, Spanish, and French. We find that: (a) ChatGPT responses indicate less hesitancy than those reported for human respondents in past literature; (b) ChatGPT responses vary significantly across languages, with English responses being the most hesitant on average and Spanish being the least; (c) ChatGPT responses are largely consistent across different model parameters but show some variations across the scale factors (vaccine competency, risk). Results have implications for researchers interested in evaluating and improving the quality and equity of health-related web information.
AMIA Joint Summits on Translational Science Proceedings, 2024, pp. 266-275. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141820/pdf/
A Roadmap for Improving Telemedicine Support Operations.
Andrew R Ahn, Emmanuel Edu, Christina J O'Malley, Laura Kavanaugh, Alex Leiser, Linh Palcher, Christopher Erickson, Marissa Marchese, C William Hanson
AMIA Joint Summits on Translational Science Proceedings, 2024, pp. 63-64. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141798/pdf/
Comparison of Prompt Engineering and Fine-Tuning Strategies in Large Language Models in the Classification of Clinical Notes.
Xiaodan Zhang, Nabasmita Talukdar, Sandeep Vemulapalli, Sumyeong Ahn, Jiankun Wang, Han Meng, Sardar Mehtab Bin Murtaza, Dmitry Leshchiner, Aakash Ajay Dave, Dimitri F Joseph, Martin Witteveen-Lane, Dave Chesla, Jiayu Zhou, Bin Chen
Emerging large language models (LLMs) are being actively evaluated in various fields, including healthcare. Most studies have focused on established benchmarks and standard parameters; however, the variation and impact of prompt engineering and fine-tuning strategies have not been fully explored. This study benchmarks GPT-3.5 Turbo, GPT-4, and Llama-7B against BERT models and medical fellows' annotations in identifying patients with metastatic cancer from discharge summaries. Results revealed that clear, concise prompts incorporating reasoning steps significantly enhanced performance. GPT-4 exhibited superior performance among all models. Notably, one-shot learning and fine-tuning provided no incremental benefit. The model's accuracy was sustained even when keywords for metastatic cancer were removed or when half of the input tokens were randomly discarded. These findings underscore GPT-4's potential to substitute for specialized models such as PubMedBERT through strategic prompt engineering, and suggest opportunities to improve open-source models, which are better suited to use in clinical settings.
AMIA Joint Summits on Translational Science Proceedings, 2024, pp. 478-487. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141826/pdf/
Compulsory Indications in Hospital Prescribing Software Tested with Antibacterial Prescriptions.
Lorna Pairman, Paul Chin, Sharon J Gardiner, Matthew Doogue
The aim was to assess how making the indication field compulsory in our electronic prescribing system influenced free text documentation and to visualise prescriber behaviour. The indication field was made compulsory for seven antibacterial medicines. Text recorded in the indication field was manually classified as 'indication present', 'other text', 'rubbish text', or 'blank'. The proportion of prescriptions with an indication was compared for four weeks before and after the intervention. Indication provision increased from 10.6% to 72.4% (p<0.01) post-intervention. 'Other text' increased from 7.6% to 25.1% (p<0.01), and 'rubbish text' from 0.0% to 0.6% (p<0.01). Introducing the compulsory indication field increased indication documentation substantially with only a small increase in 'rubbish text'. An interactive report was developed using a live data extract to illustrate indication provision for all medicines prescribed at our tertiary hospital. The interactive report was validated and locally published to support audit and quality improvement projects.
AMIA Joint Summits on Translational Science Proceedings, 2024, pp. 632-641. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141823/pdf/
Assessing the Barriers and Facilitators to Pulmonary Rehabilitation Referrals Using the Consolidated Framework for Implementation Research (CFIR).
Aileen S Gabriel, Joseph Finkelstein
Chronic obstructive pulmonary disease (COPD) is a global health issue causing significant illness and death. Pulmonary rehabilitation (PR) offers non-pharmacological treatment, including education, exercise, and psychological support, which has been shown to improve clinical outcomes. In both stable COPD and after an acute exacerbation, PR has been demonstrated to increase exercise capacity, decrease dyspnea, and enhance quality of life. Despite these benefits, referrals to PR for COPD treatment remain low. This study aims to evaluate healthcare providers' perceptions of referring COPD patients to PR. Semi-structured qualitative interviews were conducted with pulmonary specialists, hospitalists, and emergency department physicians. Domains and constructs from the Consolidated Framework for Implementation Research (CFIR) were applied to the qualitative data to organize, analyze, and identify the barriers and facilitators to referring COPD patients. The findings from this study will help guide strategies to improve the referral process for PR.
AMIA Joint Summits on Translational Science Proceedings, 2024, pp. 172-181. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141829/pdf/
Cluster Analysis of Cortical Amyloid Burden for Identifying Imaging-driven Subtypes in Mild Cognitive Impairment.
Ruiming Wu, Bing He, Bojian Hou, Andrew J Saykin, Jingwen Yan, Li Shen
Over the past decade, Alzheimer's disease (AD) has become increasingly severe and gained greater attention. Mild Cognitive Impairment (MCI) serves as an important prodromal stage of AD, highlighting the urgency of early diagnosis for timely treatment and control of the condition. Identifying subtypes of MCI patients is important for dissecting the heterogeneity of this complex disorder and facilitating more effective target discovery and therapeutic development. The conventional approach uses clinical measurements, such as cognitive scores and neuropsychological assessments, to stratify MCI patients into early MCI (EMCI) and late MCI (LMCI) groups that reflect progressive stages. However, such a clinical grouping is not designed to deconvolute the heterogeneity of the disorder. This study uses a data-driven approach to divide MCI patients into a novel grouping of two subtypes based on an amyloid dataset of 68 cortical features from positron emission tomography (PET), where each subtype has a homogeneous cortical amyloid burden pattern. Experimental evaluation, including visual two-dimensional cluster distribution, Kaplan-Meier plots, genetic association studies, and biomarker distribution analysis, demonstrates that the identified subtypes perform better across all metrics than the conventional EMCI and LMCI grouping.
AMIA Joint Summits on Translational Science Proceedings, 2024, pp. 439-448. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141862/pdf/
VisualSphere: a Web-based Interactive Visualization System for Clinical Research Data.
Shiwei Lin, Shiqiang Tao, Wei-Chun Chou, Guo-Qiang Zhang, Xiaojin Li
Clinical research data visualization is integral to making sense of biomedical research and healthcare data. The complexity and diversity of data, along with the need for solid programming skills, can hinder advances in clinical research data visualization. To overcome these challenges, we introduce VisualSphere, a web-based interactive visualization system that directly interfaces with clinical research data repositories, streamlining and simplifying the visualization workflow. VisualSphere is founded on three primary component modules: Connection, Configuration, and Visualization. An end-user can set up connections to the data repositories, create charts by selecting the desired tables and variables, and render visualization dashboards generated by Plotly and R/Shiny. We performed a preliminary evaluation of VisualSphere, which achieved high user satisfaction. VisualSphere has the potential to serve as a versatile tool for various clinical research data repositories, enabling researchers to explore and interact with clinical research data efficiently and effectively.
AMIA Joint Summits on Translational Science Proceedings, 2024, pp. 603-612. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141841/pdf/
A Comparison of Google and ChatGPT for Automatic Generation of Health-related Multiple-choice Questions.
Vivien Song, David Kauchak, John Hamre, Nick Morgenstein, Gondy Leroy
Critical to producing accessible content is an understanding of which characteristics of a text affect comprehension. To answer this question, we are producing a large corpus of health-related texts with associated questions that can be read or listened to by study participants to measure the difficulty of the underlying content, which can later be used to better understand text difficulty and user comprehension. In this paper, we examine methods for automatically generating multiple-choice questions using Google's related questions and ChatGPT. Overall, we find that both approaches generate reasonable, complementary questions; ChatGPT questions are more similar to the source snippet, while Google related-search questions show more lexical variation.
AMIA Joint Summits on Translational Science Proceedings, 2024, p. 679. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141817/pdf/
FHIRing up OpenMRS: Architecture, Implementation and Real-World Use-Cases in Global Health.
I Bacher, M Goodrich, A Kimaina, M Seaton, G Faulkenberry, S Vaish, J Flowers, H S Fraser
HL7 FHIR was created almost a decade ago and is seeing increasingly wide use in high-income settings. Although some initial work was carried out in low- and middle-income country (LMIC) settings, there has been little impact until recently. The need for reliable, easy-to-implement interoperability between health information systems (HIS) in LMICs is growing with large-scale deployments of EHRs, national reporting systems, and mHealth applications. The OpenMRS open-source EHR has been deployed in more than 44 LMICs, with increasing needs for interoperability with other HIS. We describe here the development and deployment of a new FHIR module supporting the latest standards and its use in interoperability with laboratory systems, mHealth applications, and pharmacy dispensing systems, and as a tool for supporting advanced user interface designs. We also show how it facilitates data science projects and the deployment of machine learning-based clinical decision support systems (CDSS) and precision medicine in LMICs.
AMIA Joint Summits on Translational Science Proceedings, 2024, pp. 162-171. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141833/pdf/
Comparison of Three Deep Learning Models in Accurate Classification of 770 Dermoscopy Skin Lesion Images.
Abdulmateen Adebiyi, Praveen Rao, Jesse Hirner, Anya Anokhin, Emily Hoffman Smith, Eduardo J Simoes, Mirna Becevic
Accurately determining and classifying different types of skin cancers is critical for early diagnosis. In this work, we propose a novel use of deep learning for classification of benign and malignant skin lesions using dermoscopy images. We obtained 770 de-identified dermoscopy images from the University of Missouri (MU) Healthcare. We created three unique image datasets that contained the original images and images obtained after applying a hair removal algorithm. We trained three popular deep learning models, namely, ResNet50, DenseNet121, and Inception-V3. We evaluated the accuracy and the area under the receiver operating characteristic curve (AUC-ROC) for each model and dataset. DenseNet121 achieved the best accuracy (80.52%) and AUC-ROC score (0.81) on the third dataset. For this dataset, the sensitivity and specificity were 0.80 and 0.81, respectively. We also present the SHAP (SHapley Additive exPlanations) values for the predictions made by different models to understand their interpretability.
AMIA Joint Summits on Translational Science Proceedings, 2024, pp. 46-53. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141796/pdf/