Amir I Mina, Jessi U Espino, Allison M Bradley, Parthasarathy D Thirumala, Kayhan Batmanghelich, Shyam Visweswaran
Monitoring cerebral neuronal activity via electroencephalography (EEG) during surgery can detect ischemia, a precursor to stroke. However, current neurophysiologist-based monitoring is prone to error. In this study, we evaluated machine learning (ML) for efficient and accurate ischemia detection. We trained supervised ML models on a dataset of 802 patients with intraoperative ischemia labels and evaluated them on an independent validation dataset of 30 patients with refined labels from five neurophysiologists. Our results show moderate-to-substantial agreement between neurophysiologists, with Cohen's kappa values between 0.59 and 0.74. Neurophysiologist performance ranged from 58-93% for sensitivity and 83-96% for specificity, while ML models demonstrated comparable ranges of 63-89% and 85-96%. Random Forest (RF), LightGBM (LGBM), and XGBoost RF achieved area under the receiver operating characteristic curve (AUROC) values of 0.92-0.93 and area under the precision-recall curve (AUPRC) values of 0.79-0.83. ML has the potential to improve intraoperative monitoring, enhancing patient safety and reducing costs.
{"title":"Detecting Cerebral Ischemia From Electroencephalography During Carotid Endarterectomy Using Machine Learning.","authors":"Amir I Mina, Jessi U Espino, Allison M Bradley, Parthasarathy D Thirumala, Kayhan Batmanghelich, Shyam Visweswaran","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Monitoring cerebral neuronal activity via electroencephalography (EEG) during surgery can detect ischemia, a precursor to stroke. However, current neurophysiologist-based monitoring is prone to error. In this study, we evaluated machine learning (ML) for efficient and accurate ischemia detection. We trained supervised ML models on a dataset of 802 patients with intraoperative ischemia labels and evaluated them on an independent validation dataset of 30 patients with refined labels from five neurophysiologists. Our results show moderate-to-substantial agreement between neurophysiologists, with Cohen's kappa values between 0.59 and 0.74. Neurophysiologist performance ranged from 58-93% for sensitivity and 83-96% for specificity, while ML models demonstrated comparable ranges of 63-89% and 85-96%. Random Forest (RF), LightGBM (LGBM), and XGBoost RF achieved area under the receiver operating characteristic curve (AUROC) values of 0.92-0.93 and area under the precision-recall curve (AUPRC) values of 0.79-0.83. ML has the potential to improve intraoperative monitoring, enhancing patient safety and reducing costs.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. 
AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141851/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141200166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
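The study's headline numbers (AUROC, AUPRC, Cohen's kappa) can be reproduced in miniature. This is a minimal sketch assuming synthetic data and hypothetical EEG-derived features, not the paper's actual pipeline or dataset:

```python
# Sketch: scoring a Random Forest ischemia detector with AUROC/AUPRC and
# Cohen's kappa, as reported in the study. Data is synthetic; the features
# stand in for (hypothetical) EEG band-power measurements.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, average_precision_score, cohen_kappa_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 8))                       # hypothetical EEG features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 1.2).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]

auroc = roc_auc_score(y_te, scores)               # threshold-free ranking quality
auprc = average_precision_score(y_te, scores)     # area under precision-recall
kappa = cohen_kappa_score(y_te, clf.predict(X_te))  # chance-corrected agreement
print(auroc, auprc, kappa)
```

AUPRC is the more informative of the two when ischemia events are rare, since it ignores the large pool of true negatives that inflates AUROC.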
Acronyms, abbreviations, and symbols play a significant role in clinical notes. Acronym and symbol sense disambiguation are crucial natural language processing (NLP) tasks that ensure the clarity and consistency of clinical notes and downstream NLP processing. Previous studies using traditional machine learning methods have been relatively successful in tackling this issue. In our research, we evaluated large language models (LLMs), including ChatGPT 3.5 and 4, as well as other open LLMs and BERT-based models, across three NLP tasks: acronym and symbol sense disambiguation, semantic similarity, and relatedness. Our findings emphasize ChatGPT's remarkable ability to distinguish between senses with minimal or zero-shot training. Additionally, the open-source LLM Mixtral-8x7B exhibited high accuracy for acronyms with fewer senses and moderate accuracy for symbol senses. BERT-based models outperformed previous machine learning approaches, achieving an accuracy rate of over 95%, showcasing their effectiveness in addressing the challenge of acronym and symbol sense disambiguation. Furthermore, ChatGPT exhibited a strong correlation, surpassing 70%, with human gold standards when evaluating similarity and relatedness.
{"title":"Exploring Large Language Models for Acronym, Symbol Sense Disambiguation, and Semantic Similarity and Relatedness Assessment.","authors":"Ying Liu, Genevieve B Melton, Rui Zhang","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Acronyms, abbreviations, and symbols play a significant role in clinical notes. Acronym and symbol sense disambiguation are crucial natural language processing (NLP) tasks that ensure the clarity and consistency of clinical notes and downstream NLP processing. Previous studies using traditional machine learning methods have been relatively successful in tackling this issue. In our research, we conducted an evaluation of large language models (LLMs), including ChatGPT 3.5 and 4, as well as other open LLMs, and BERT-based models, across three NLP tasks: acronym and symbol sense disambiguation, semantic similarity, and relatedness. Our findings emphasize ChatGPT's remarkable ability to distinguish between senses with minimal or zero-shot training. Additionally, open source LLM Mixtrial-8x7B exhibited high accuracy for acronyms with fewer senses, and moderate accuracy for symbol sense accuracy. BERT-based models outperformed previous machine learning approaches, achieving an impressive accuracy rate of over 95%, showcasing their effectiveness in addressing the challenge of acronym and symbol sense disambiguation. Furthermore, ChatGPT exhibited a strong correlation, surpassing 70%, with human gold standards when evaluating similarity and relatedness.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. 
AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141821/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141200809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
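Zero-shot sense disambiguation of the kind evaluated here amounts to asking the model to pick from a sense inventory given context. A minimal sketch of that prompt framing, with a hypothetical sense inventory and no real API call:

```python
# Sketch: framing acronym sense disambiguation as a zero-shot prompt.
# The sense inventory and prompt wording are illustrative assumptions,
# not the paper's actual evaluation protocol.
SENSES = {"RA": ["rheumatoid arthritis", "right atrium", "room air"]}

def build_prompt(acronym: str, context: str) -> str:
    """Build a single zero-shot disambiguation prompt for one acronym mention."""
    options = "; ".join(SENSES[acronym])
    return (
        f"In the clinical note excerpt below, which sense does '{acronym}' have?\n"
        f"Options: {options}\n"
        f"Excerpt: {context}\n"
        "Answer with exactly one option."
    )

prompt = build_prompt("RA", "Patient saturating well on RA, no supplemental O2.")
print(prompt)
```

The returned string would be sent to the LLM; accuracy is then the fraction of mentions where the model's choice matches the annotated sense.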
I Bacher, M Goodrich, A Kimaina, M Seaton, G Faulkenberry, S Vaish, J Flowers, H S Fraser
HL7 FHIR was created almost a decade ago and is seeing increasingly wide use in high-income settings. Although some initial work was carried out in low- and middle-income country (LMIC) settings, there had been little impact until recently. The need for reliable, easy-to-implement interoperability between health information systems in LMICs is growing with large-scale deployments of EHRs, national reporting systems, and mHealth applications. The OpenMRS open-source EHR has been deployed in more than 44 LMICs, with increasing needs for interoperability with other health information systems. We describe here the development and deployment of a new FHIR module supporting the latest standards, and its use in interoperability with laboratory systems, mHealth applications, and pharmacy dispensing systems, as well as a tool for supporting advanced user interface designs. We also show how it facilitates data science projects and the deployment of machine learning-based CDSS and precision medicine in LMICs.
{"title":"FHIRing up OpenMRS: Architecture, Implementation and Real-World Use-Cases in Global Health.","authors":"I Bacher, M Goodrich, A Kimaina, M Seaton, G Faulkenberry, S Vaish, J Flowers, H S Fraser","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>HL7 FHIR was created almost a decade ago and is seeing increasingly wide use in high income settings. Although some initial work was carried out in low and middle income (LMIC) settings there has been little impact until recently. The need for reliable and easy to implement interoperability between health information systems in LMICs is growing with large scale deployments of EHRs, national reporting systems and mHealth applications. The OpenMRS open source EHR has been deployed in more than 44 LMIC with increasing needs for interoperability with other HIS. We describe here the development and deployment of a new FHIR module supporting the latest standards and its use in interoperability with laboratory systems, mHealth applications, pharmacy dispensing system and as a tool for supporting advanced user interface designs. We also show how it facilitates date science projects and deployment of machine leaning based CDSS and precision medicine in LMICs.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141833/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141200861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shizhuo Mu, Jingxuan Bao, Hanxiang Xu, Manu Shivakumar, Shu Yang, Xia Ning, Dokyoon Kim, Christos Davatzikos, Haochang Shou, Li Shen
Neurodegenerative processes are increasingly recognized as potential causative factors in Alzheimer's disease (AD) pathogenesis. While many studies have leveraged mediation analysis models to elucidate the underlying mechanisms linking genetic variants to AD diagnostic outcomes, the majority have predominantly focused on regional brain measures as mediators, thereby compromising the granularity of the imaging data. In our investigation, using imaging genetics data from a landmark AD cohort, we contrasted region-based and voxel-based brain measurements as imaging endophenotypes and examined their roles in mediating genetic effects on AD outcomes. Our findings underscored that using voxel-based morphometry offers enhanced statistical power. Moreover, we delineated specific mediation pathways between SNPs, brain volume, and AD outcomes, shedding light on the intricate relationships among these variables.
{"title":"Multivariate mediation analysis with voxel-based morphometry revealed the neurodegeneration pathways from genetic variants to Alzheimer's Disease.","authors":"Shizhuo Mu, Jingxuan Bao, Hanxiang Xu, Manu Shivakumar, Shu Yang, Xia Ning, Dokyoon Kim, Christos Davatzikos, Haochang Shou, Li Shen","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Neurodegenerative processes are increasingly recognized as potential causative factors in Alzheimer's disease (AD) pathogenesis. While many studies have leveraged mediation analysis models to elucidate the underlying mechanisms linking genetic variants to AD diagnostic outcomes, the majority have predominantly focused on regional brain measure as a mediator, thereby compromising the granularity of the imaging data. In our investigation, using the imaging genetics data from a landmark AD cohort, we contrasted both region-based and voxel-based brain measurements as imaging endophenotypes, and examined their roles in mediating genetic effects on AD outcomes. Our findings underscored that using voxel-based morphometry offers enhanced statistical power. Moreover, we delineated specific mediation pathways between SNP, brain volume, and AD outcomes, shedding light on the intricate relationship among these variables.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141831/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141201193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gondy Leroy, David Kauchak, Philip Harber, Ankit Pal, Akash Shukla
Text and audio simplification to increase information comprehension are important in healthcare. With the introduction of ChatGPT, evaluation of its simplification performance is needed. We provide a systematic comparison of human- and ChatGPT-simplified texts using fourteen metrics indicative of text difficulty. We briefly introduce our online editor where these simplification tools, including ChatGPT, are available. We scored twelve corpora using our metrics: six text, one audio, and five ChatGPT-simplified corpora (using five different prompts). We then compare these corpora with texts simplified and verified in a prior user study. Finally, a medical domain expert evaluated the user study texts and five new ChatGPT-simplified versions. We found that the simplified corpora show higher similarity with the human-simplified texts. ChatGPT simplification moves metrics in the right direction. The medical domain expert's evaluation showed a preference for the ChatGPT style, but the text itself was rated lower for content retention.
{"title":"Text and Audio Simplification: Human vs. ChatGPT.","authors":"Gondy Leroy, David Kauchak, Philip Harber, Ankit Pal, Akash Shukla","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Text and audio simplification to increase information comprehension are important in healthcare. With the introduction of ChatGPT, evaluation of its simplification performance is needed. We provide a systematic comparison of human and ChatGPT simplified texts using fourteen metrics indicative of text difficulty. We briefly introduce our online editor where these simplification tools, including ChatGPT, are available. We scored twelve corpora using our metrics: six text, one audio, and five ChatGPT simplified corpora (using five different prompts). We then compare these corpora with texts simplified and verified in a prior user study. Finally, a medical domain expert evaluated the user study texts and five, new ChatGPT simplified versions. We found that simple corpora show higher similarity with the human simplified texts. ChatGPT simplification moves metrics in the right direction. The medical domain expert's evaluation showed a preference for the ChatGPT style, but the text itself was rated lower for content retention.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141852/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141201281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alzheimer's disease (AD) is the most prevalent neurodegenerative disease worldwide, with one in nine people over the age of 65 living with the disease in 2023. In this study, we used a phenome-wide association study (PheWAS) approach to identify cross-phenotype associations between previously identified genetic associations for AD and electronic health record (EHR) diagnoses from the UK Biobank (UKBB) (n=361,194 of European ancestry) and the eMERGE Network (n=105,108 of diverse ancestry). Based on 497 previously identified AD-associated variants from the Alzheimer's Disease Variant Portal (ADVP), we found significant associations primarily in immune- and cardiac-related diseases in our PheWAS. Replicating variants have widespread impacts on immune genes in diverse tissue types. This study demonstrates the potential of using the PheWAS strategy to improve our understanding of AD progression as well as identify potential drug repurposing opportunities for new treatment and disease prevention strategies.
{"title":"Cross-phenotype associations between Alzheimer's Disease and its comorbidities may provide clues to progression.","authors":"Anni Moore, Marylyn D Ritchie","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Alzheimer's disease (AD) is the most prevalent neurodegenerative disease worldwide, with one in nine people over the age of 65 living with the disease in 2023. In this study, we used a phenome wide association study (PheWAS) approach to identify cross-phenotype between previously identified genetic associations for AD and electronic health record (EHR) diagnoses from the UK Biobank (UKBB) (n=361,194 of European ancestry) and the eMERGE Network (n=105,108 of diverse ancestry). Based on 497 previously identified AD-associated variants from the Alzheimer's Disease Variant Portal (ADVP), we found significant associations primarily in immune and cardiac related diseases in our PheWAS. Replicating variants have widespread impacts on immune genes in diverse tissue types. This study demonstrates the potential of using the PheWAS strategy to improve our understanding of AD progression as well as identify potential drug repurposing opportunities for new treatment and disease prevention strategies.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141840/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141200174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fangyi Chen, Gongbo Zhang, Si Chen, Tiffany Callahan, Chunhua Weng
Clinical notes are full of ambiguous medical abbreviations. Contextual knowledge has been leveraged by recent learning-based approaches for sense disambiguation. Previous findings indicated that structural elements of clinical notes entail useful characteristics for informing different interpretations of abbreviations, yet they have remained underutilized and have not been fully investigated. To the best of our knowledge, the only study exploring note structures simply enumerated the headers in the notes, and such representations are not semantically meaningful. This paper describes a learning-based approach using the note structure represented by the semantic types predefined in the Unified Medical Language System (UMLS). We evaluated the representation, in addition to the widely used N-gram, with three learning models on two different datasets. Experiments indicate that our feature augmentation consistently improved model performance for abbreviation disambiguation, with an optimal F1 score of 0.93.
{"title":"Clinical Note Structural Knowledge Improves Word Sense Disambiguation.","authors":"Fangyi Chen, Gongbo Zhang, Si Chen, Tiffany Callahan, Chunhua Weng","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Clinical notes are full of ambiguous medical abbreviations. Contextual knowledge has been leveraged by recent learning-based approaches for sense disambiguation. Previous findings indicated that structural elements of clinical notes entail useful characteristics for informing different interpretations of abbreviations, yet they have remained underutilized and have not been fully investigated. To our best knowledge, the only study exploring note structures simply enumerated the headers in the notes, where such representations are not semantically meaningful. This paper describes a learning-based approach using the note structure represented by the semantic types predefined in Unified Medical Language System (UMLS). We evaluated the representation in addition to the widely used N-gram with three learning models on two different datasets. Experiments indicate that our feature augmentation consistently improved model performance for abbreviation disambiguation, with the optimal F1 score of 0.93.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141859/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141198959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Topic modeling performs poorly on short phrases or sentences and ever-changing slang, which are common in social media, such as X, formerly known as Twitter. This study investigates whether concept annotation tools such as MetaMap can enable topic modeling at the semantic level. Using tweets mentioning "hydroxychloroquine" as a case study, we extracted 56,017 tweets posted between 03/01/2020 and 12/31/2021. The tweets were run through MetaMap to encode concepts with UMLS Concept Unique Identifiers (CUIs), and then we used Latent Dirichlet Allocation (LDA) to identify the optimal model for two datasets: 1) tweets with the original text and 2) tweets with the replaced CUIs. We found that the MetaMap LDA models outperformed the non-MetaMap models in terms of coherence and representativeness and identified topics relevant to contemporaneous social and political discussions. We concluded that integrating MetaMap to standardize tweets through UMLS concepts improved semantic topic modeling performance amidst noise in the text.
{"title":"Enabling Semantic Topic Modeling on Twitter Using MetaMap.","authors":"Rebecca Shyu, Chunhua Weng","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Topic modeling performs poorly on short phrases or sentences and ever-changing slang, which are common in social media, such as X, formerly known as Twitter. This study investigates whether concept annotation tools such as MetaMap can enable topic modeling at the semantic level. Using tweets mentioning \"hydroxychloroquine\" for a case study, we extracted 56,017 posted between 03/01/2020-12/31/2021. The tweets were run through MetaMap to encode concepts with UMLS Concept Unique Identifiers (CUIs) and then we used Latent Dirichlet Allocation (LDA) to identify the optimal model for two datasets: 1) tweets with the original text and 2) tweets with the replaced CUIs. We found that the MetaMap LDA models outperformed the non-MetaMap models in terms of coherence and representativeness and identified topics timely relevant to social and political discussions. We concluded that integrating MetaMap to standardize tweets through UMLS concepts improved semantic topic modeling performance amidst noise in the text.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141808/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141201742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Prosanta Barai, Gondy Leroy, Prakash Bisht, Joshua M Rothman, Sumi Lee, Jennifer Andrews, Sydney A Rice, Arif Ahmed
Large Language Models (LLMs) have demonstrated immense potential in artificial intelligence across various domains, including healthcare. However, their efficacy is hindered by the need for high-quality labeled data, which is often expensive and time-consuming to create, particularly in low-resource domains like healthcare. To address these challenges, we propose a crowdsourcing (CS) framework enriched with quality control measures at the pre-, real-time-, and post-data-gathering stages. Our study evaluated the effectiveness of these data quality enhancements through their impact on an LLM (Bio-BERT) for predicting autism-related symptoms. The results show that real-time quality control improves data quality by 19% compared to pre-quality control. Fine-tuning Bio-BERT using crowdsourced data generally increased recall compared to the Bio-BERT baseline but lowered precision. Our findings highlight the potential of crowdsourcing and quality control in resource-constrained environments and offer insights into optimizing healthcare LLMs for informed decision-making and improved patient care.
{"title":"Crowdsourcing with Enhanced Data Quality Assurance: An Efficient Approach to Mitigate Resource Scarcity Challenges in Training Large Language Models for Healthcare.","authors":"Prosanta Barai, Gondy Leroy, Prakash Bisht, Joshua M Rothman, Sumi Lee, Jennifer Andrews, Sydney A Rice, Arif Ahmed","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Large Language Models (LLMs) have demonstrated immense potential in artificial intelligence across various domains, including healthcare. However, their efficacy is hindered by the need for high-quality labeled data, which is often expensive and time-consuming to create, particularly in low-resource domains like healthcare. To address these challenges, we propose a crowdsourcing (CS) framework enriched with quality control measures at the pre-, real-time-, and post-data gathering stages. Our study evaluated the effectiveness of enhancing data quality through its impact on LLMs (Bio-BERT) for predicting autism-related symptoms. The results show that real-time quality control improves data quality by 19% compared to pre-quality control. Fine-tuning Bio-BERT using crowdsourced data generally increased recall compared to the Bio-BERT baseline but lowered precision. Our findings highlighted the potential of crowdsourcing and quality control in resource-constrained environments and offered insights into optimizing healthcare LLMs for informed decision-making and improved patient care.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. 
AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141838/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141200175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
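One common real-time quality-control step in crowdsourcing is to accept a label only when enough workers agree on it. A sketch of that idea; the agreement threshold and example labels are illustrative assumptions, not the paper's actual pipeline:

```python
# Sketch: majority-agreement filtering of crowdsourced labels, one possible
# real-time quality-control mechanism. Threshold and data are hypothetical.
from collections import Counter

def majority_filter(annotations, min_agreement=0.6):
    """annotations: {item_id: [worker labels]} -> {item_id: accepted label}."""
    accepted = {}
    for item, labels in annotations.items():
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_agreement:   # enough workers agree
            accepted[item] = label
    return accepted

votes = {"sent-1": ["symptom", "symptom", "other"],
         "sent-2": ["symptom", "other"]}
filtered = majority_filter(votes)
print(filtered)   # sent-2 is dropped: only 50% agreement
```

Items that fail the threshold can be routed back to additional workers instead of being discarded, trading cost for coverage.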
V K Cody Bumgardner, Aaron Mullen, Samuel E Armstrong, Caylin Hickey, Victor Marek, Jeff Talbert
This paper introduces an approach that combines the language reasoning capabilities of large language models (LLMs) with the benefits of local training to tackle complex language tasks. The authors demonstrate their approach by extracting structured condition codes from pathology reports. The proposed approach utilizes local, fine-tuned LLMs to respond to specific generative instructions and provide structured outputs. Over 150k uncurated surgical pathology reports containing gross descriptions, final diagnoses, and condition codes were used. Different model architectures were trained and evaluated, including LLaMA, BERT, and LongFormer. The results show that the LLaMA-based models significantly outperform BERT-style models across all evaluated metrics. LLaMA models performed especially well with large datasets, demonstrating their ability to handle complex, multi-label tasks. Overall, this work presents an effective approach for utilizing LLMs to perform structured generative tasks on domain-specific language in the medical domain.
{"title":"Local Large Language Models for Complex Structured Tasks.","authors":"V K Cody Bumgardner, Aaron Mullen, Samuel E Armstrong, Caylin Hickey, Victor Marek, Jeff Talbert","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>This paper introduces an approach that combines the language reasoning capabilities of large language models (LLMs) with the benefits of local training to tackle complex language tasks. The authors demonstrate their approach by extracting structured condition codes from pathology reports. The proposed approach utilizes local, fine-tuned LLMs to respond to specific generative instructions and provide structured outputs. Over 150k uncurated surgical pathology reports containing gross descriptions, final diagnoses, and condition codes were used. Different model architectures were trained and evaluated, including LLaMA, BERT, and LongFormer. The results show that the LLaMA-based models significantly outperform BERT-style models across all evaluated metrics. LLaMA models performed especially well with large datasets, demonstrating their ability to handle complex, multi-label tasks. Overall, this work presents an effective approach for utilizing LLMs to perform structured generative tasks on domain-specific language in the medical domain.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141822/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141201185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}